
Show HN: Turn a Folder of PDFs Into a Live JSON API

Yonas Yeneneh · May 29, 2026 · 5 min read

Hi HN — I'm Yonas. I built ParseApi solo over the last few months and shipped it a few days ago.

The problem I kept hitting: I had a pile of PDFs — invoices, receipts, contracts — and I wanted the data in them as structured JSON. Every time, the same pipeline: OCR or LLM call, define the schema, handle retries, store the result, build an endpoint to serve it. Different shape, same plumbing. So I tried to collapse the whole thing into one step.

The folder is the unit, not the schema or the document.

Create a folder (e.g. invoices)
Drag PDFs into it — they parse in real time
That folder is now a live endpoint:

GET https://api.parseapi.dev/v1/{username}/invoices

Paste the URL into your app. That's the whole onboarding.
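For illustration, here is roughly what "paste the URL into your app" could look like from Python, using only the standard library. It builds the request without sending it. The username `yonas` and the token are placeholders, `include_raw=true` is the query flag mentioned further down, and the `X-Api-Key` header name is a hypothetical stand-in: the folder's generated OpenAPI spec is the place to check the real auth scheme.

```python
import base64
import urllib.parse
import urllib.request

BASE_URL = "https://api.parseapi.dev/v1"


def build_folder_request(username, folder, *, include_raw=False,
                         api_key=None, jwt=None, basic=None):
    """Build (but do not send) a GET request for one folder endpoint."""
    url = (f"{BASE_URL}/{urllib.parse.quote(username)}"
           f"/{urllib.parse.quote(folder)}")
    if include_raw:
        # Ask for the original model output alongside corrected values.
        url += "?" + urllib.parse.urlencode({"include_raw": "true"})

    headers = {"Accept": "application/json"}
    if api_key is not None:
        headers["X-Api-Key"] = api_key  # hypothetical header name
    elif jwt is not None:
        headers["Authorization"] = f"Bearer {jwt}"
    elif basic is not None:
        user, password = basic
        token = base64.b64encode(f"{user}:{password}".encode()).decode()
        headers["Authorization"] = f"Basic {token}"
    # No credentials at all corresponds to a public folder.
    return urllib.request.Request(url, headers=headers, method="GET")


req = build_folder_request("yonas", "invoices", include_raw=True, jwt="my-token")
print(req.full_url)  # https://api.parseapi.dev/v1/yonas/invoices?include_raw=true
```

Sending it is one more call, `urllib.request.urlopen(req)`, followed by `json.load` on the response. The body's shape depends on the folder's inferred schema, so the sketch doesn't assume one.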

No schema definition step. When you drop the first few documents in a folder, it infers a schema from them and then conforms every later upload to that schema. You can edit the schema afterward, and optionally re-run past extractions against the new version.
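A minimal sketch of infer-then-conform, assuming extractions arrive as flat JSON objects. The coarse type names, the widen-to-string rule for conflicting types, and the None-on-mismatch rule are illustrative choices, not ParseApi's actual inference algorithm.

```python
from __future__ import annotations

from typing import Any


def type_name(value: Any) -> str:
    """Map a JSON-style Python value to a coarse schema type."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check before int: bool is an int subclass
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    return "string"


def infer_schema(samples: list[dict[str, Any]]) -> dict[str, str]:
    """Infer field -> type from the first few extracted documents."""
    schema: dict[str, str] = {}
    for doc in samples:
        for key, value in doc.items():
            seen, current = schema.get(key), type_name(value)
            if seen is None or seen == "null":
                schema[key] = current          # first non-null sighting wins
            elif current not in ("null", seen):
                schema[key] = "string"         # conflicting types: widen
    return schema


def conform(doc: dict[str, Any], schema: dict[str, str]) -> dict[str, Any]:
    """Shape a later extraction to the schema: known fields only, missing -> None."""
    out: dict[str, Any] = {}
    for key, expected in schema.items():
        value = doc.get(key)
        if value is None or expected == "null":
            out[key] = value
        elif type_name(value) == expected:
            out[key] = value
        elif expected == "string":
            out[key] = str(value)              # widened field: stringify
        else:
            out[key] = None                    # mismatch: leave for a human fix
    return out
```

Fields outside the schema are dropped, and a value of the wrong type becomes an empty field that a user can correct rather than a silently coerced guess.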

The piece I care most about is editable extractions. AI extraction is never perfect on the long tail of real-world documents. So the original model output is stored immutably, and corrections are tracked separately as field-level diffs: who changed what, when, and why. The API returns the corrected value by default; ?include_raw=true returns both. A user can fix a wrong extraction in one click, and over time those corrections become a real feedback signal. This part felt missing from every existing document-AI tool I tried.
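To make the raw-plus-diffs idea concrete, here is a minimal in-memory sketch, assuming one flat dict per document. The `Extraction` and `Correction` names and the `data`/`raw`/`corrections` response keys are hypothetical; the point is that the model output is never overwritten and the corrected view is derived by replaying diffs.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from types import MappingProxyType
from typing import Any


@dataclass(frozen=True)
class Correction:
    """One field-level diff: who changed what, when, and why."""
    key: str
    old: Any
    new: Any
    who: str
    why: str
    when: datetime


class Extraction:
    def __init__(self, raw: dict[str, Any]) -> None:
        # Copy, then expose a read-only (shallow) view of the model output.
        self._raw = MappingProxyType(dict(raw))
        self._corrections: list[Correction] = []

    def correct(self, key: str, new: Any, *, who: str, why: str) -> Correction:
        """Record a correction without touching the raw output."""
        c = Correction(key, self.corrected().get(key), new, who, why,
                       datetime.now(timezone.utc))
        self._corrections.append(c)
        return c

    def corrected(self) -> dict[str, Any]:
        """Replay the diffs, oldest first, on top of a copy of the raw output."""
        view = dict(self._raw)
        for c in self._corrections:
            view[c.key] = c.new
        return view

    def to_response(self, include_raw: bool = False) -> dict[str, Any]:
        """Corrected values by default; raw and corrected when include_raw is set."""
        if not include_raw:
            return self.corrected()
        return {"data": self.corrected(), "raw": dict(self._raw),
                "corrections": [asdict(c) for c in self._corrections]}
```

Because each `Correction` keeps both `old` and `new`, the same log doubles as an audit trail and as labeled examples of where the model went wrong.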

A few other things that are in:

Per-folder auth: public, API key, JWT, or HTTP Basic, chosen per folder rather than platform-wide
Pluggable AI providers: Anthropic, OpenAI, Gemini, Ollama, anything OpenAI-compatible, with a fallback chain and a per-request cost cap
Auto-generated OpenAPI spec per folder, with browsable docs
Webhooks for extraction events, plus live status via SignalR
Source highlighting: click a JSON field, see the bounding box in the doc
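The fallback chain with a per-request cost cap can be sketched as follows. This is an illustration under stated assumptions, not ParseApi's router: the `Provider` shape, the per-page cost estimates, and the rule that a failed attempt still counts against the budget are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Provider:
    name: str
    estimate_cost: Callable[[int], float]        # pages -> estimated USD
    extract: Callable[[bytes], dict[str, Any]]   # raises on failure


class AllProvidersFailed(Exception):
    pass


def extract_with_fallback(doc: bytes, pages: int, chain: list[Provider],
                          cost_cap: float) -> tuple[str, dict[str, Any], float]:
    """Walk the chain in order, never letting estimated spend exceed cost_cap."""
    spent = 0.0
    errors: list[str] = []
    for provider in chain:
        estimate = provider.estimate_cost(pages)
        if spent + estimate > cost_cap:
            errors.append(f"{provider.name}: skipped, would exceed cap")
            continue
        spent += estimate  # assume a failed call still costs money
        try:
            return provider.name, provider.extract(doc), spent
        except Exception as exc:
            errors.append(f"{provider.name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```

Charging failed attempts against the same budget keeps a run of retries from quietly multiplying the cost of a single document; an expensive provider later in the chain is simply skipped once the remaining budget can't cover it.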

Stack, for the curious — and because the choices were deliberate:

Pure .NET 9. No Node, no React, no Next.js. Razor Pages + HTMX + Alpine.js for the UI. Tailwind via the standalone CLI, no npm in the build chain.
Modular monolith, single deployable on Render. Subdomains (app. / api. / admin.) route to areas within one app.
Postgres (schema-isolated so it can share a database with other projects), Hangfire for background jobs (in-process), Cloudflare R2 for object storage, Stripe for billing.

Going pure .NET was contrarian on purpose. The default playbook for an AI-heavy SaaS is Next.js + Python + a managed-everything stack. I wanted to see whether a one-person shop could ship something competitive without the JavaScript ecosystem on the server. Three months in, I'm convinced the answer is yes, and I'll probably write that up properly once the product has more reps on it.

I also spent the two weeks before launch building a kill-switch system into the app: feature flags, automated tripwires that cap AI spend, cooldowns, and fail-closed defaults. If you're interested in that piece specifically, I wrote about it in "Building a Kill Switch Before Letting Anyone Use My SaaS".
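A minimal sketch of two of those ideas, not the implementation from that write-up: a spend tripwire over a rolling window that stays tripped for a cooldown, and a feature-flag check that fails closed. Every name here (`SpendTripwire`, `ai_enabled`, the `ai_extraction` flag) and the dollar and time parameters are hypothetical.

```python
from __future__ import annotations

import time
from typing import Callable


class SpendTripwire:
    """Trip when AI spend in a rolling window exceeds a cap, then cool down."""

    def __init__(self, cap_usd: float, window_s: float, cooldown_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.cap_usd = cap_usd
        self.window_s = window_s
        self.cooldown_s = cooldown_s
        self.clock = clock
        self._events: list[tuple[float, float]] = []  # (timestamp, usd)
        self._tripped_until: float | None = None

    def record(self, usd: float) -> None:
        now = self.clock()
        cutoff = now - self.window_s
        # Keep only spend inside the rolling window, then add this charge.
        self._events = [(t, u) for t, u in self._events if t > cutoff]
        self._events.append((now, usd))
        if sum(u for _, u in self._events) > self.cap_usd:
            self._tripped_until = now + self.cooldown_s

    def allows_ai_calls(self) -> bool:
        return self._tripped_until is None or self.clock() >= self._tripped_until


def ai_enabled(get_flag: Callable[[str], bool | None],
               tripwire: SpendTripwire) -> bool:
    """Fail closed: an unreachable flag store or a missing flag means off."""
    try:
        flag = get_flag("ai_extraction")  # hypothetical flag name
    except Exception:
        return False
    return flag is True and tripwire.allows_ai_calls()
```

A caller would check `ai_enabled(...)` before each model call and `record(...)` the actual cost afterward. Injecting `clock` keeps the cooldown testable without sleeping.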

Honest limitations

It's early. Genuinely. I have a handful of signups so far and no one has uploaded a real document yet.
Extraction quality depends on which model the router picks, and the routing logic is conservative right now (Claude Haiku or GPT-4o mini by default, to keep costs sane).
Bounding-box accuracy varies per provider — it's good on cleanly-laid-out PDFs and weaker on multi-column or scanned documents.
Schema inference works well on the document types I've tested most (invoices, receipts, basic forms). Long-tail document types will surface bugs.

What I'd love feedback on: your weirdest extraction cases, the documents that broke other tools. That's the stuff I learn from.

Free tier is 100 pages/month, 2 folders, no credit card. Built solo, run solo, and I'll be in the thread answering everything.

Try it at app.parseapi.dev
