
Private AI for Legal Work

4 min read · by Vache Sarkissian · Updated June 3, 2026 · Reviewed March 30, 2026
Tags: ai, local-gpu, ollama, privacy, legal

Written by Claude (Opus 4.6). Vache prompted, reviewed, and published. The data and benchmarks are real; the prose is AI-generated.

Running large language models locally eliminates the three core privacy risks of cloud-based AI in legal work: vendor data logging, subpoena exposure through vendor records, and lack of control over data retention and jurisdiction.

Cloud AI services create confidentiality liabilities for lawyers. When you send client communications through the OpenAI, Claude, or Gemini APIs, the vendor's terms may allow it to retain conversation history, train on your data, and produce your account records in response to a subpoena. In discovery, opposing counsel can also demand your interactions with the vendor as business records. Local deployment (Ollama on your own GPU) takes the vendor out of the picture: no API keys, no cloud logs, no third-party access, and no subpoena exposure through vendor records.

This guide covers setup, hardware, and model selection for running private AI on AMD or NVIDIA GPUs. Everything runs offline after initial model download, with open-source tools (Ollama, Open WebUI) and no licensing restrictions.

What It Is

Private AI for legal work means running language models locally on your machine instead of sending client data through cloud APIs. This setup keeps confidential information entirely under your control.

How it works: Ollama runs AI models directly on your GPU. Open WebUI provides a ChatGPT-like interface in your browser, bound to localhost only. After the one-time model download, the system works completely offline.
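To make the local round trip concrete, here is a minimal sketch of sending a prompt to Ollama over its HTTP API on the loopback interface, without going through Open WebUI. It assumes Ollama is listening on its default port (11434) and that the model has already been downloaded. The helper names `build_generate_request` and `ask_local_model` are hypothetical and are not taken from the setup scripts.

```python
import json
import urllib.request

# Assumption: Ollama's default local endpoint; adjust if your install differs.
OLLAMA_URL = "http://127.0.0.1:11434/api/generate"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request aimed at the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local_model(model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send the prompt to the local server and return the generated text."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["response"]
```

With Ollama running, a call such as `ask_local_model("gemma3:12b", "Summarize this clause in plain language: ...")` returns the generated text as a string. The request never leaves 127.0.0.1, so the same call works with the network cable unplugged.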

Why local AI for lawyers: Cloud-based AI services create three risks that local models eliminate. First, they log conversation history and may use your data to train future models. Second, your confidential client communications could be subpoenaed as business records of the AI vendor. Third, you have no control over data retention, jurisdiction, or security practices. Local deployment removes all three: no API keys, no cloud logs, no third-party access.

Setup scripts for Linux and Windows are on GitHub.

Hardware

  • Any GPU with 12GB+ VRAM (tested on AMD RX 9070 XT, 16GB)
  • 32GB RAM recommended
  • ~35GB disk space for models
  • Linux or Windows 10/11

Models

Three models, each filling a different role:

  • gemma3:12b (8GB VRAM, ~50 tok/s) — Fast. Summaries, first drafts, contract extraction, plain language translation. Handles 80% of daily work.
  • qwen3:14b (9GB VRAM, ~40 tok/s) — Structured output. Contract comparisons, argument outlines, checklists, fact pattern analysis.
  • mistral-small:24b (14GB VRAM, ~25 tok/s) — Strongest reasoning. Complex analysis, detailed memos, evaluating legal positions. Use when the smaller models aren't enough.
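If you are unsure which of these your card can run, the figures above can be turned into a quick lookup. This is a rough sketch that uses only the approximate VRAM and speed numbers from the list. It ignores the extra memory that long prompts consume, and the names `LocalModel`, `models_that_fit`, and `estimated_seconds` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalModel:
    name: str
    vram_gb: int         # approximate VRAM needed, from the list above
    tokens_per_sec: int  # approximate speed reported in the list above


# Figures copied from the model list; treat them as estimates.
MODELS = [
    LocalModel("gemma3:12b", 8, 50),
    LocalModel("qwen3:14b", 9, 40),
    LocalModel("mistral-small:24b", 14, 25),
]


def models_that_fit(vram_gb: int) -> list[str]:
    """Return names of models whose approximate VRAM need fits on the card."""
    return [m.name for m in MODELS if m.vram_gb <= vram_gb]


def estimated_seconds(model_name: str, output_tokens: int) -> float:
    """Rough generation time: output tokens divided by the model's tokens/sec."""
    model = next(m for m in MODELS if m.name == model_name)
    return output_tokens / model.tokens_per_sec
```

On a 12GB card, `models_that_fit(12)` leaves out mistral-small:24b. A 500-token answer from that model works out to about 20 seconds at the listed speed, against 10 seconds for gemma3:12b.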

Use Cases

  • Summarizing depositions and transcripts
  • Drafting memos and correspondence
  • Extracting key terms, dates, and obligations from contracts
  • Comparing contract versions side-by-side
  • Building argument outlines
  • Generating due diligence and compliance checklists
  • Translating legalese to plain language for clients
  • First-pass review to flag issues for deeper analysis

Limitations

They fabricate citations. Local models will confidently cite cases that don't exist — complete with fake volume numbers, page numbers, and holdings. Never include an AI-generated citation in any filing without verifying it in Westlaw or Lexis.
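One way to make that verification step harder to skip is to pull every citation-like string out of a draft into a checklist. The sketch below is a triage aid under loose assumptions, not a validator: the pattern only looks for "volume Reporter page" shapes, it will miss some formats and flag some non-citations, and it says nothing about whether a case exists. The function name is hypothetical.

```python
import re

# Assumption: a deliberately loose pattern for "volume Reporter page" strings
# such as "410 U.S. 113" or "550 F. Supp. 2d 100". It cannot confirm that a
# cited case is real; it only lists strings a human must look up.
CITATION_PATTERN = re.compile(
    r"\b\d{1,4}\s+[A-Z][A-Za-z0-9.]*(?:\s[A-Z0-9][A-Za-z0-9.]*){0,3}\s+\d{1,5}\b"
)


def find_citation_candidates(text: str) -> list[str]:
    """Return unique citation-like strings in order of first appearance."""
    seen: list[str] = []
    for match in CITATION_PATTERN.finditer(text):
        candidate = match.group(0)
        if candidate not in seen:
            seen.append(candidate)
    return seen
```

Run it over any model output before it goes near a filing, then check each returned string by hand in Westlaw or Lexis. An empty list does not mean the draft is clean, because case names cited without a reporter will not match.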

They misstate legal standards. A model may describe a legal test incorrectly or mix elements from different tests. The output reads fluently but can be wrong. Always verify against authoritative sources.

Context window is limited. These models handle roughly 12,000-16,000 words at a time. A 5-page contract fits easily. A 50-page deposition needs to be broken into sections.
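Breaking a long transcript into sections can be scripted. The sketch below splits on whitespace and caps each chunk at a word count. The defaults are assumptions: 10,000 words stays under the lower end of the range above to leave room for your instructions and the model's answer, and the 200-word overlap is an arbitrary choice so a question and its answer are less likely to be separated. Splitting on whitespace discards paragraph breaks, so treat the output as model input only.

```python
def split_into_word_chunks(text: str, max_words: int = 10_000, overlap: int = 200) -> list[str]:
    """Split text into chunks of at most max_words words, repeating `overlap`
    words between consecutive chunks so context carries across the boundary."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be non-negative and smaller than max_words")

    words = text.split()
    chunks: list[str] = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the final chunk already reaches the end of the text
    return chunks
```

Summarize each chunk separately, then ask the model to combine the summaries. Anything that depends on testimony from two distant parts of the transcript still needs a human read.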

They can't do legal research. No access to case databases. They will fabricate sources if asked. Use them for structure and drafting, not for legal authority.

They're not GPT-4 or Claude. Expect the quality of a sharp paralegal, not a senior partner. Final-quality legal writing still needs significant editing. Don't trust the math on damages calculations.

Privacy

Both services (Ollama and Open WebUI) bind to 127.0.0.1 (localhost only), so they are not accessible from other machines on your network. After the initial model download, you can disconnect from the internet entirely and everything continues to work.
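If you ever change a bind address, for example through Ollama's `OLLAMA_HOST` setting, it is worth confirming the new value is still loopback-only. Here is a small sketch using Python's standard `ipaddress` module. The function name is hypothetical, and it only inspects the string you give it; it does not probe what a running service is actually listening on.

```python
import ipaddress


def is_loopback_bind(bind_address: str) -> bool:
    """Return True if a "host:port" or bare host string refers to loopback only."""
    host = bind_address.strip()
    # Drop a URL scheme if one is present, e.g. "http://127.0.0.1:11434".
    if "://" in host:
        host = host.split("://", 1)[1]
    if host.startswith("["):
        # Bracketed IPv6 such as "[::1]:11434".
        host = host[1:].split("]", 1)[0]
    elif host.count(":") == 1:
        # Plain "host:port"; keep only the host part.
        host = host.rsplit(":", 1)[0]
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # unknown hostnames are treated as not provably local
```

A value such as `0.0.0.0:11434` returns False, which is the case to watch for: it exposes the service to every machine that can reach yours.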

The setup scripts, model recommendations, and full documentation are on GitHub.

About the Author

Vache Sarkissian

Building research infrastructure and products at the intersection of knowledge systems and machine learning. Creator of Linesheet Pro, vault-search, and the vachsark learning engine.

© 2026 Vache Sarkissian · Built with Claude Code
vachsark.com