On February 9, HHS released the most granular public Medicaid spending dataset ever: 227 million rows of provider-level claims data from 2018-2024. Every outpatient and professional claim, every state, every provider, every procedure code. A 2.9 GB Parquet file covering $1.09 trillion in public spending.
Nobody had built a good public interface for it. So I did.
medicare.vachsark.com — 5,500+ static pages, 54 state dashboards, 5,000 provider profiles, 500 procedure breakdowns. Zero hosting cost (pre-computed JSON on Vercel CDN).
But the interesting part isn't the website. It's what the data shows.
The Biggest Line Item Isn't What You'd Expect
Medicaid's largest expense is not hospitals, emergency rooms, or doctor visits. It's home-based personal care — aides helping people with daily living activities, billed in 15-minute increments.
A single procedure code — T1019 — accounts for $122.7 billion. That's 11.2% of all Medicaid spending on one line item.
| Category | Total (2018-2024) | Share |
|---|---|---|
| Home/community care | $230.3B | 21.1% |
| Office visits | $71.3B | 6.5% |
| Emergency department | $55.4B | 5.1% |
| Mental health/therapy | $27.8B | 2.5% |
Home care is 4x larger than ER spending. This is the result of 25 years of deliberate policy to move care out of institutions and into homes. But the system delivering this care has structural problems the data makes visible.
Spending Concentration Is Extreme
The Gini coefficient across Medicaid providers is 0.928 — higher than global wealth inequality (~0.85).
| Tier | Providers | Share of Spending |
|---|---|---|
| Top 1% | 6,175 | 53.2% |
| Top 5% | 30,875 | 80.1% |
| Top 10% | 61,750 | 89.7% |
| Bottom 80% | 494,003 | 3.8% |
The mean provider receives $1.77M. The median receives $47,823. A 37x gap.
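For readers who want to reproduce concentration figures like these, here is a minimal sketch of the two calculations involved: a Gini coefficient and a top-share tier. The function names and the toy payment list are illustrative assumptions, not the code or data behind the numbers above.

```python
from statistics import mean, median


def gini(values):
    """Gini coefficient of non-negative values (0 = equal, near 1 = concentrated)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Sorted-rank formula: G = 2 * sum(i * x_i) / (n * total) - (n + 1) / n, i = 1..n
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def top_share(values, fraction):
    """Share of the total received by the top `fraction` of entries."""
    xs = sorted(values, reverse=True)
    k = max(1, round(len(xs) * fraction))
    return sum(xs[:k]) / sum(xs)


# Toy data (not from the HHS file): nine small providers and one very large one.
payments = [10_000] * 9 + [910_000]

print(f"Gini: {gini(payments):.2f}")                      # Gini: 0.81
print(f"Top 10% share: {top_share(payments, 0.10):.0%}")  # Top 10% share: 91%
print(f"Mean/median: {mean(payments) / median(payments):.0f}x")  # Mean/median: 10x
```

A large mean-to-median ratio and a high Gini are two views of the same skew: a handful of very large recipients pull the mean far above the typical provider.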
The top five recipients aren't hospitals. They're fiscal intermediaries and government agencies:
| Provider | State | Total Paid |
|---|---|---|
| Public Partnerships LLC | NY | $7.18B |
| LA County Dept of Mental Health | CA | $6.78B |
| Tempus Unlimited | MA | $5.57B |
| Modivcare Solutions | CO | $3.09B |
| Freedom Care LLC | NY | $3.03B |
These entities process billions in public funds. The money flows through them to caregivers and services. The question is how much stays as overhead versus reaching the workforce.
The 2024 Cliff
2024 spending dropped 6.9% as Medicaid redeterminations resumed after the pandemic pause ("the unwinding"). But the decline wasn't uniform:
| Category | 2024 YoY Change |
|---|---|
| Home/community care | +1.7% |
| Habilitation/rehab | -4.0% |
| Mental health | -9.2% |
| Office visits | -14.2% |
| ER visits | -18.5% |
In 2024, claims fell 14% while spending fell only 7%, so the average cost per claim went up. Over the full period, cost per claim rose from $50.91 (2018) to $66.62 (2024), a 31% increase. The system is getting more expensive per unit of service even as it covers fewer people.
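The arithmetic behind those two comparisons can be checked in a few lines. This sketch assumes the 14% and 6.9% declines are both 2024 year-over-year figures; because the inputs are rounded, the one-year result is approximate.

```python
def pct_change(old, new):
    """Percent change from old to new."""
    return (new - old) / old * 100


def implied_unit_cost_change(spending_change_pct, claims_change_pct):
    """Percent change in cost per claim implied by changes in spending and claim volume."""
    spending_ratio = 1 + spending_change_pct / 100
    claims_ratio = 1 + claims_change_pct / 100
    return (spending_ratio / claims_ratio - 1) * 100


long_run = pct_change(50.91, 66.62)                # 2018 -> 2024 cost per claim
one_year = implied_unit_cost_change(-6.9, -14.0)   # assumed 2024 year-over-year inputs

print(f"2018-2024 change in cost per claim: {long_run:.1f}%")  # 30.9%
print(f"Implied 2024 change in cost per claim: {one_year:.1f}%")  # 8.3%
```

Spending falling more slowly than claim volume is exactly what a rising cost per claim looks like: the ratio of the two shrinkage factors is greater than one.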
Home care was insulated from the unwinding. The pandemic accelerated a structural shift toward home-based care that persisted.
The Fee-for-Service Caveat
Here's the critical context most analyses miss: this dataset covers fee-for-service (FFS) claims only. About 85% of Medicaid enrollees nationally are in managed care — their claims go through MCOs, not the state FFS system.
This means the $1.09 trillion figure drastically understates total Medicaid spending. States like Tennessee (100% managed care) and Texas (~90%) show artificially low numbers. New York's $144.8B figure is closer to reality because it maintains a larger FFS population.
I integrated supplementary data (CMS enrollment figures, managed care penetration rates, and ACA expansion status) into each state page so users can contextualize the FFS numbers. California's page, for example, shows that 94% of its enrollees are in managed care, so the FFS data reflects only about 6% of the state's enrollees. Enrollment share is not spending share, so read that as a rough signal of how much is missing, not an exact fraction of dollars.
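As a rough illustration of that kind of enrichment, the sketch below joins dataset totals with managed care penetration by state and produces a one-line caveat. The `StateContext` class, its field names, and `merge_context` are hypothetical; the dollar totals and percentages are the ones quoted in this post.

```python
from dataclasses import dataclass


@dataclass
class StateContext:
    state: str
    dataset_spending_billions: float  # total in the HHS file for this state
    managed_care_pct: float           # share of enrollees in managed care

    @property
    def ffs_enrollee_pct(self):
        """Share of enrollees outside managed care."""
        return 100 - self.managed_care_pct

    def caveat(self):
        return (
            f"{self.state}: ${self.dataset_spending_billions:.1f}B in this dataset; "
            f"about {self.ffs_enrollee_pct:.0f}% of enrollees are outside managed care."
        )


def merge_context(spending, managed_care):
    """Join two dicts keyed by state; states missing from either input are skipped."""
    return [
        StateContext(state, spending[state], managed_care[state])
        for state in sorted(spending)
        if state in managed_care
    ]


spending = {"NY": 144.8, "CA": 129.4, "TX": 56.2, "FL": 34.0}
managed_care = {"NY": 76, "CA": 94, "TX": 90, "FL": 77}

for ctx in merge_context(spending, managed_care):
    print(ctx.caveat())
```

Keeping the caveat next to the dollar figure means a reader never sees a state total without the context needed to interpret it.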
The Technical Build
The architecture is intentionally simple: pre-computed JSON + static Next.js on Vercel.
The data changes annually — there's no need for a server. A Python pipeline (DuckDB + Parquet) pre-computes ~22,000 JSON files in 15 minutes:
[ DuckDB + 2.9 GB Parquet ]
↓ scripts/build_web_data.py
[ 22,000 JSON files, 114 MB ]
↓ Next.js static export
[ 5,566 HTML pages ]
↓ Vercel CDN
[ Zero hosting cost ]
DuckDB reads the Parquet file natively — no import step, no database server. Sub-second queries on 227 million rows on a laptop. Enrichment data (NPPES for provider names, HCPCS for procedure descriptions, NUCC for taxonomy) joins in as separate Parquet files.
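To make that concrete, here is a minimal sketch of querying a Parquet file in place with DuckDB's Python package. The column names `state` and `paid_amount` and the file path are assumptions for illustration; check the real schema of the HHS file before adapting it.

```python
def state_totals_sql(parquet_path):
    """Build a query that aggregates paid amounts by state straight from a Parquet file."""
    safe_path = parquet_path.replace("'", "''")  # escape single quotes for the SQL literal
    return (
        "SELECT state, SUM(paid_amount) AS total_paid, COUNT(*) AS row_count "
        f"FROM read_parquet('{safe_path}') "
        "GROUP BY state ORDER BY total_paid DESC"
    )


def run_state_totals(parquet_path):
    """Run the query in an in-memory DuckDB session; no server or import step needed."""
    import duckdb  # third-party package: pip install duckdb

    return duckdb.sql(state_totals_sql(parquet_path)).fetchall()


# Hypothetical path; `state` and `paid_amount` are assumed column names.
print(state_totals_sql("data/medicaid_claims.parquet"))
```

Because `read_parquet` scans only the columns a query touches, an aggregate like this reads a small slice of the file instead of all 2.9 GB.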
Next.js 16 with static export generates all 5,566 pages at build time. Each state, provider, and procedure page is pure HTML + JSON — no API calls at runtime.
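The pre-compute step that feeds those static pages might look roughly like this: aggregate rows, then write one JSON file per state for the site build to read. This is a sketch under assumed names (`write_state_json`, the payload keys, the row shape), not the contents of the actual build script.

```python
import json
from collections import defaultdict
from pathlib import Path


def write_state_json(rows, out_dir):
    """Aggregate (state, year, paid) rows and write one JSON file per state."""
    totals = defaultdict(lambda: defaultdict(float))
    for state, year, paid in rows:
        totals[state][year] += paid

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    written = []
    for state, by_year in sorted(totals.items()):
        payload = {
            "state": state,
            "total_paid": sum(by_year.values()),
            # JSON object keys must be strings, so years are stringified.
            "by_year": {str(year): by_year[year] for year in sorted(by_year)},
        }
        path = out / f"{state}.json"
        path.write_text(json.dumps(payload, indent=2))
        written.append(path)
    return written
```

Calling `write_state_json(rows, "public/data/states")` with rows from the query step would leave one small file per state, which a static page can load without any runtime API.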
Recharts for interactive charts, react-simple-maps for the state choropleth, Fuse.js for client-side provider search across 617K providers (sharded by state to keep bundles reasonable).
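Sharding the search index is a build-time grouping step. A minimal sketch, assuming each provider record is a dict with `npi`, `name`, and `state` keys (hypothetical field names, and placeholder NPIs):

```python
from collections import defaultdict


def shard_providers_by_state(providers):
    """Group provider records into per-state search shards, sorted by name."""
    shards = defaultdict(list)
    for provider in providers:
        # Keep only the fields the client-side search needs, to limit shard size.
        shards[provider["state"]].append(
            {"npi": provider["npi"], "name": provider["name"]}
        )
    return {
        state: sorted(items, key=lambda record: record["name"])
        for state, items in sorted(shards.items())
    }


# Provider names come from the table earlier in this post; NPIs are placeholders.
providers = [
    {"npi": "0000000001", "name": "Public Partnerships LLC", "state": "NY"},
    {"npi": "0000000002", "name": "Tempus Unlimited", "state": "MA"},
    {"npi": "0000000003", "name": "Freedom Care LLC", "state": "NY"},
]

shards = shard_providers_by_state(providers)
print({state: len(items) for state, items in shards.items()})  # {'MA': 1, 'NY': 2}
```

Each shard can then be written out as its own JSON file, so the browser downloads only the index for the state a user is searching.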
The whole thing deploys with `npx vercel --prod --archive=tgz` (the `--archive` flag is required because 51K+ files exceed Vercel's default limit).
State-Level Variation
Same federal program, fundamentally different implementations:
| State | Total Spending | Cost/Claim | Managed Care % |
|---|---|---|---|
| New York | $144.8B | $86.04 | 76% |
| California | $129.4B | $50.83 | 94% |
| Texas | $56.2B | $50.50 | 90% |
| Florida | $34.0B | $36.18 | 77% |
New York spends 2.4x as much per claim as Florida ($86.04 vs. $36.18). Each of the 54 state pages on the site shows this comparison in context: enrollment trends, managed care penetration, ACA expansion status, and how the state ranks nationally.
What Could Be Done
The data suggests a few structural interventions:
Cap intermediary overhead. $230B flows through home care, but caregivers earn $14-17/hour with 40-60% annual turnover. Require fiscal intermediaries to disclose their administrative take-rate. Mandate that 80%+ of payments reach direct care workers — modeled on the ACA's medical loss ratio for insurers.
Make this data quarterly. This was a one-time release. Medicare publishes equivalent provider-level data regularly, enabling thousands of research papers and policy reforms. Medicaid, which covers more people, has had no equivalent until now.
Address the 2.4x cost-per-claim gap. A federal matching incentive for states below a cost-of-living-adjusted reimbursement threshold would reduce access deserts without unfunded mandates. The ACA expansion used the same mechanism successfully.
Explore It Yourself
The full platform is live at medicare.vachsark.com. Every chart is interactive, every table is sortable, every provider links to their profile.
The raw HHS dataset is free to download from opendata.hhs.gov. With Parquet and DuckDB, queries over all 227 million rows finish in seconds on any modern laptop.
If you're a researcher, journalist, or policymaker working on Medicaid, this is the most comprehensive public view of where the money goes at the provider level. Use it.