What are the minimum hardware requirements?

You need a GPU with at least 12GB of VRAM (tested on a 16GB AMD RX 9070 XT), approximately 35GB of disk space for models, and Linux or Windows 10/11. 32GB of system RAM is recommended.

Which models should I use for legal work?

Three models are recommended: gemma3:12b (8GB VRAM, ~50 tok/s) for fast work such as summaries and contract extraction; qwen3:14b (9GB VRAM, ~40 tok/s) for structured output such as comparisons and outlines; and mistral-small:24b (14GB VRAM, ~25 tok/s) for the strongest reasoning on complex analysis.

What happens if I disconnect the internet after setup?

After the initial model download, you can disconnect from the internet entirely and everything continues to work. Both Ollama and Open WebUI bind to 127.0.0.1 (localhost only) and require no external connections to function.

What is the expected quality of AI-generated legal work?

Expect the quality of work from a sharp paralegal, not a senior partner. Final-quality legal writing still requires significant editing. Don't trust the math on damages calculations, and verify every legal statement against authoritative sources.

Private AI for Legal Work

Do local AI models fabricate citations?

Yes, local models will confidently cite cases that don't exist with fake volume numbers, page numbers, and holdings. Never include AI-generated citations in any filing without verifying them in Westlaw or Lexis.

Can local models handle large documents?

Local models handle roughly 12,000-16,000 words at a time. A 5-page contract fits easily. A 50-page deposition needs to be broken into sections.

Running large language models locally eliminates the three core privacy risks of cloud-based AI in legal work: vendor data logging, subpoena exposure through vendor records, and lack of control over data retention and jurisdiction.

Cloud AI services create confidentiality liabilities for lawyers. When you send client communications through a cloud API such as those for OpenAI's models, Claude, or Gemini, the vendor may retain conversation history, may train on your data, and can be compelled by subpoena to produce your account records. In discovery, opposing counsel can demand all your interactions with the vendor as business records. Local deployment (Ollama on your own GPU) eliminates these risks: no API keys, no cloud logs, no third-party access, and no subpoena exposure through a vendor's records.

This guide covers setup, hardware, and model selection for running private AI on AMD or NVIDIA GPUs. Everything runs offline after initial model download, with open-source tools (Ollama, Open WebUI) and no licensing restrictions.

What It Is

Private AI for legal work means running language models locally on your machine instead of sending client data through cloud APIs. This setup keeps confidential information entirely under your control.

How it works: Ollama runs AI models directly on your GPU. Open WebUI provides a ChatGPT-like interface in your browser, bound to localhost only. After you download the models once, the system works completely offline.
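For readers who want to script against the same local setup rather than use the browser interface, here is a minimal sketch of a request to Ollama's HTTP API. It assumes Ollama is listening on its default port, 11434, on 127.0.0.1; the function names build_request and ask are illustrative and are not part of the setup scripts.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://127.0.0.1:11434/api/generate"


def build_request(model, prompt):
    """Build a POST request for a single, non-streaming completion."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model, prompt, timeout=300):
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With Ollama running and the model already downloaded, ask("gemma3:12b", "Summarize this clause: ...") should return the model's reply as a string. The request goes to 127.0.0.1, so it never leaves the machine.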

Why local AI for lawyers: Cloud-based AI services create three risks that local models eliminate. First, they log conversation history and may use your data to train future models. Second, your confidential client communications could be subpoenaed as business records of the AI vendor. Third, you have no control over data retention, jurisdiction, or security practices. Local deployment removes all three: no API keys, no cloud logs, no third-party access.

Setup scripts for Linux and Windows are on GitHub.

Hardware

Any GPU with 12GB+ VRAM (tested on AMD RX 9070 XT, 16GB)
32GB RAM recommended
~35GB disk space for models
Linux or Windows 10/11

Models

Three models, each filling a different role:

gemma3:12b (8GB VRAM, ~50 tok/s) — Fast. Summaries, first drafts, contract extraction, plain language translation. Handles 80% of daily work.
qwen3:14b (9GB VRAM, ~40 tok/s) — Structured output. Contract comparisons, argument outlines, checklists, fact pattern analysis.
mistral-small:24b (14GB VRAM, ~25 tok/s) — Strongest reasoning. Complex analysis, detailed memos, evaluating legal positions. Use when the smaller models aren't enough.
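The VRAM figures above decide which of the three models a given card can hold. A minimal sketch of that check follows; the VRAM numbers are the ones listed in this article, while the 1GB headroom default and the function name models_that_fit are assumptions made for illustration.

```python
# (model tag, approximate VRAM in GB, role) as listed in this article.
MODELS = [
    ("gemma3:12b", 8, "fast drafting and summaries"),
    ("qwen3:14b", 9, "structured output"),
    ("mistral-small:24b", 14, "strongest reasoning"),
]


def models_that_fit(vram_gb, headroom_gb=1.0):
    """Return the model tags whose VRAM need plus headroom fits on the card.

    headroom_gb is an assumed safety margin for the desktop and other
    GPU users; it is not a measured figure.
    """
    return [name for name, need_gb, _role in MODELS if need_gb + headroom_gb <= vram_gb]
```

Under these assumptions a 12GB card holds the two smaller models, and mistral-small:24b needs a 16GB card like the one tested.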

Use Cases

Summarizing depositions and transcripts
Drafting memos and correspondence
Extracting key terms, dates, and obligations from contracts
Comparing contract versions side-by-side
Building argument outlines
Generating due diligence and compliance checklists
Translating legalese to plain language for clients
First-pass review to flag issues for deeper analysis

Limitations

They fabricate citations. Local models will confidently cite cases that don't exist — complete with fake volume numbers, page numbers, and holdings. Never include an AI-generated citation in any filing without verifying it in Westlaw or Lexis.
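One way to make that verification step harder to skip is to pull every citation-shaped string out of a draft and check the list by hand. The sketch below is a rough aid, not a citation parser: the reporter list is a small illustrative sample chosen for this example, and anything it misses still has to be caught by reading the draft.

```python
import re

# Assumption: a small sample of federal reporter abbreviations, for illustration only.
REPORTERS = [
    "U.S.", "S. Ct.", "F.2d", "F.3d", "F.4th",
    "F. Supp.", "F. Supp. 2d", "F. Supp. 3d",
]

# Longest abbreviations first so "F. Supp. 2d" is preferred over "F. Supp.".
_alternatives = "|".join(
    re.escape(r) for r in sorted(REPORTERS, key=len, reverse=True)
)
# Shape: volume number, reporter abbreviation, first page number.
CITATION_RE = re.compile(rf"\b\d{{1,4}}\s+(?:{_alternatives})\s+\d{{1,5}}\b")


def find_citations(text):
    """Return each distinct citation-shaped string, in order of first appearance."""
    found = []
    for match in CITATION_RE.finditer(text):
        citation = match.group(0)
        if citation not in found:
            found.append(citation)
    return found
```

Every string the function returns is a candidate to look up in Westlaw or Lexis. A match only means the text has the shape of a citation; it says nothing about whether the case exists.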

They misstate legal standards. A model may describe a legal test incorrectly or mix elements from different tests. The output reads fluently but can be wrong. Always verify against authoritative sources.

Context window is limited. These models handle roughly 12,000-16,000 words at a time. A 5-page contract fits easily. A 50-page deposition needs to be broken into sections.
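Breaking a long transcript into sections can be done by word count. The following is a minimal sketch that uses the lower end of the range above (12,000 words) as the default limit; the 200-word overlap between sections is an assumption, added so a passage cut at a boundary appears in both neighboring sections.

```python
def split_into_chunks(text, max_words=12_000, overlap=200):
    """Split text into sections of at most max_words words.

    Consecutive sections share `overlap` words. Whitespace is normalized
    to single spaces, so page and line layout is not preserved.
    """
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be at least 0 and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this section already reaches the end of the text
    return chunks
```

Each section can then be summarized separately and the summaries combined. Splitting on word count ignores the document's structure, so for depositions it may work better to split at page or witness boundaries first.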

They can't do legal research. No access to case databases. They will fabricate sources if asked. Use them for structure and drafting, not for legal authority.

They're not GPT-4 or Claude. Expect the quality of a sharp paralegal, not a senior partner. Final-quality legal writing still needs significant editing. Don't trust the math on damages calculations.

Privacy

Both services bind to 127.0.0.1 (localhost only) — not accessible from other machines on your network. After the initial model download, you can disconnect from the internet entirely and everything continues to work.
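If you change either service's configuration, it is worth confirming that the bind address is still loopback-only. Here is a small sketch of that check on an address string copied from your configuration. The function names are illustrative, the ports in the usage note are examples only, and treating the name "localhost" as loopback is an assumption that holds on a normally configured machine.

```python
import ipaddress


def host_of(bind_address):
    """Extract the host part from 'host', 'host:port', or '[ipv6]:port'."""
    addr = bind_address.strip()
    if addr.startswith("["):      # bracketed IPv6, e.g. [::1]:11434
        return addr[1:addr.index("]")]
    if addr.count(":") == 1:      # host:port
        return addr.split(":")[0]
    return addr                   # bare host name or bare IPv6 address


def is_loopback_only(bind_address):
    """True if the address is reachable only from this machine."""
    host = host_of(bind_address)
    if host == "localhost":       # assumption: localhost resolves to loopback
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False              # unknown host names are treated as unsafe
```

For example, is_loopback_only("127.0.0.1:11434") returns True, while is_loopback_only("0.0.0.0:8080") returns False because 0.0.0.0 listens on every network interface.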

The setup scripts, model recommendations, and full documentation are on GitHub.

About the Author

Vache Sarkissian