Lemma Case Study | Multi-Agent Architecture

Architecture

Three runtime planes

API routes stay deliberately fast; all LLM work happens in a durable background pipeline; progress reaches the browser through real-time events with polling as a fallback. Each plane can fail independently without taking down the others.

① Synchronous Plane: Next.js 14 App Router API routes · Vercel

POST /api/upload: PDF → Cloudflare R2

POST /…/analyze: sets status = PROCESSING, fires Inngest event · rate-limited 5/user/hr (Upstash Redis)

GET /…/status: lightweight polling endpoint

POST /…/export: render deck → PDF / PPTX / DOCX → R2

event: paper/uploaded

▼

② Asynchronous Plane (Durable Execution): Inngest · one function, each agent in its own step.run()

Agent Pipeline: 5 agents + 3 critique passes, sequential steps with 3× retry; a failed step retries without re-running earlier (expensive) agents

Gemini (LLM): direct REST generateContent · per-agent model + thinking config with same-provider fallback

Tavily (search): live web retrieval for market evidence, behind a swappable search interface

Neon Postgres: Prisma 7 · one schema-validated model per agent output

channel: project-{id}

▼

③ Notification Plane: either path works alone; Pusher is optional at runtime

Pusher events: real-time stage updates pushed to the browser

Status polling: browser polls the lightweight /status endpoint as fallback

Supporting services

Clerk: auth, with middleware gating the workspace

Cloudflare R2: PDFs in, deck exports out (S3 SDK)

Upstash Redis: per-user rate limiting

Zod: every agent output validated before persistence

Puppeteer + pptxgenjs + docx: deterministic deck rendering, with no LLM in the export path

Agent Workflow

The analysis pipeline, step by step

Each agent's output is validated against a Zod schema before persistence, and the same schema is converted to a Gemini responseSchema, so the model is constrained at generation time and checked at parse time. The steps labeled CRITIQUE (amber markers) are adversarial critique passes.

UPLOAD · PDF → R2 → Inngest event · in: PDF · out: paper/uploaded

The analyze endpoint only flips status to PROCESSING and fires the event — no LLM work on the request path, so the API responds in milliseconds.
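
As a rough illustration of that request path, the sketch below models the handler as three cheap calls against injected dependencies. The `AnalyzeDeps` interface, the `handleAnalyze` name, and the 202/429 status codes are assumptions made for the example, not taken from the Lemma codebase; only the `paper/uploaded` event name and the PROCESSING status come from the text above.

```typescript
// Hypothetical dependency interfaces; none of these names come from the Lemma codebase.
interface AnalyzeDeps {
  allowRequest(userId: string): Promise<boolean>; // e.g. a 5-per-hour limiter
  setStatus(projectId: string, status: "PROCESSING"): Promise<void>;
  sendEvent(name: string, data: { projectId: string }): Promise<void>;
}

interface AnalyzeResult {
  status: number; // HTTP status code (202 and 429 are assumed choices)
  body: { status?: string; error?: string };
}

// The handler does three cheap things and returns; no model call happens here.
async function handleAnalyze(
  deps: AnalyzeDeps,
  userId: string,
  projectId: string,
): Promise<AnalyzeResult> {
  if (!(await deps.allowRequest(userId))) {
    return { status: 429, body: { error: "rate limit exceeded" } };
  }
  await deps.setStatus(projectId, "PROCESSING");
  await deps.sendEvent("paper/uploaded", { projectId });
  return { status: 202, body: { status: "PROCESSING" } };
}
```

Injecting the limiter, the database write, and the event sender keeps the handler testable without a running queue or database.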

AGENT 1 · Paper Analyst · in: PDF · out: PaperData

Extracts abstract, novelty, domain, and key claims with confidence levels. Rejects non-research documents (financial reports, textbooks) outright.

CRITIQUE · Skeptical Review · in: PaperData + PDF · out: findings

An adversarial agent audits the analysis against the source PDF.

⚠ CRITICAL findings trigger exactly one regeneration of Agent 1 — bounded by design so critique loops can't run away. An unusable critique falls back to the original analysis.
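
The bounded critique pass can be sketched as a function with no loop at all, which is what makes a runaway impossible. The `Finding` shape, the function names, and the use of `null` to signal an unusable critique are assumptions for illustration. The sketch also assumes the regenerated draft is not critiqued a second time.

```typescript
type Severity = "CRITICAL" | "MINOR";
interface Finding { severity: Severity; note: string }

// Hypothetical signatures for the analyst and critique agents.
type Analyze<T> = (feedback?: Finding[]) => Promise<T>;
type Critique<T> = (analysis: T) => Promise<Finding[] | null>; // null = unusable critique

async function analyzeWithOneCritique<T>(
  analyze: Analyze<T>,
  critique: Critique<T>,
): Promise<{ result: T; regenerated: boolean }> {
  const first = await analyze();
  const findings = await critique(first);
  // An unusable critique falls back to the original analysis.
  if (findings === null) return { result: first, regenerated: false };
  const critical = findings.filter((f) => f.severity === "CRITICAL");
  if (critical.length === 0) return { result: first, regenerated: false };
  // Exactly one regeneration: there is no loop, so the pass cannot run away.
  const second = await analyze(critical);
  return { result: second, regenerated: true };
}
```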

AGENT 2 · TRL/IRL Scorer · in: PaperData only · out: TrlIrlData

Scores Technology and Investment Readiness Levels, suggests a commercialization pathway (spin-off / licensing / partnership), and flags risks.

✓ Deliberately never re-reads the PDF — it consumes Agent 1's structured output, keeping the context small and the reasoning auditable.

AGENT 3 · Market Scout (retrieval, then synthesis) · in: domain + claims · out: MarketData + sources

Stage 1 is retrieval-only: Tavily searches for competitors, funding signals, patents, and market sizing; results are persisted verbatim. Stage 2 lets Gemini see only the retrieved sources — a validator rejects any figure whose sourceUrl isn't in the retrieved set, feeding errors back into a bounded retry loop (max 3 attempts).

✓ Skip-don't-fail: if retrieval or synthesis fails, the pipeline continues without market data instead of dying. Ungroundable figures come back null — never invented.
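
A minimal sketch of the closed-world check and its bounded retry loop might look like the following. The `Source` and `Figure` shapes and both function names are hypothetical; the retrieved-set rule, the null convention for ungroundable figures, and the limit of three attempts come from the description above.

```typescript
interface Source { url: string; content: string }
interface Figure { label: string; value: number | null; sourceUrl: string | null }

// Returns one error string per figure that cites a URL outside the retrieved set.
function validateGrounding(figures: Figure[], sources: Source[]): string[] {
  const allowed = new Set(sources.map((s) => s.url));
  const errors: string[] = [];
  for (const f of figures) {
    if (f.value === null) continue; // ungroundable figures are allowed to stay null
    if (f.sourceUrl === null || !allowed.has(f.sourceUrl)) {
      errors.push(`${f.label}: sourceUrl ${f.sourceUrl ?? "(missing)"} is not in the retrieved set`);
    }
  }
  return errors;
}

// Hypothetical synthesis call: receives the sources plus the errors from the previous attempt.
type Synthesize = (sources: Source[], previousErrors: string[]) => Promise<Figure[]>;

async function synthesizeGrounded(
  synthesize: Synthesize,
  sources: Source[],
  maxAttempts = 3,
): Promise<Figure[] | null> {
  let errors: string[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const figures = await synthesize(sources, errors);
    errors = validateGrounding(figures, sources);
    if (errors.length === 0) return figures;
  }
  return null; // skip-don't-fail: the caller continues without market data
}
```

Returning null instead of throwing lets the caller treat missing market data as an ordinary branch, not as an error path.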

AGENT 4 · Feasibility Scout · in: Agents 1+2 (+market) · out: FeasibilityData

Pure reasoning, no retrieval. Timeline and capital estimates are explicit ranges with required confidence and reasoning fields — the schema rejects max ≤ min as a false-precision guard.

CRITIQUE · Feasibility Audit · traceability + over-confidence check

Audits reasoning traceability and over-confident estimates; one regeneration allowed on CRITICAL findings.

AGENT 5 · Pitch Builder · in: all upstream outputs · out: DeckData.slides

Composes investor-deck slides. A ref menu enumerates every citable upstream fact (e.g. market.tam, paper.keyClaims[2]) as the model's only citation vocabulary, so a fact that is not on the menu has no valid ref to cite.

⚠ Two guardrails: a structural validator rejects invented refs and source-URL mismatches; a semantic critique flags claims whose values don't match upstream findings.
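
The structural half of those guardrails can be sketched as a menu builder plus a membership check. The `UpstreamFacts` and `Slide` shapes and both function names are hypothetical. Only the ref string format (`market.tam`, `paper.keyClaims[2]`) follows the examples in the text.

```typescript
// Hypothetical, heavily reduced view of the upstream agent outputs.
interface UpstreamFacts {
  paper: { keyClaims: string[] };
  market: { tam: string | null };
}

// Enumerate every citable fact as a map from ref string to value.
function buildRefMenu(facts: UpstreamFacts): Map<string, string> {
  const menu = new Map<string, string>();
  facts.paper.keyClaims.forEach((claim, i) => menu.set(`paper.keyClaims[${i}]`, claim));
  // Absent market data simply contributes no refs.
  if (facts.market.tam !== null) menu.set("market.tam", facts.market.tam);
  return menu;
}

interface Slide { title: string; refs: string[] }

// Structural check: list every ref a slide cites that is not on the menu.
function findInventedRefs(slides: Slide[], menu: Map<string, string>): string[] {
  const invented: string[] = [];
  for (const slide of slides) {
    for (const ref of slide.refs) {
      if (!menu.has(ref)) invented.push(`${slide.title}: ${ref}`);
    }
  }
  return invented;
}
```

A check like this only proves that a ref exists. Whether the slide's wording matches the referenced value is the job of the semantic critique.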

REVIEW · Human Review → Export · out: PDF · PPTX · DOCX

TTO staff evaluate results before export. All three exporters consume the same normalized RenderDeck model, so the formats cannot disagree on content or grounding — and every export renders visible per-slide Sources citations.

Technical Challenges → Solutions

The hard parts, and how they were solved

Every problem below is a general production-LLM problem — context limits, hallucination, orchestration, long-running tasks — solved with specific, verifiable mechanisms in the Lemma codebase.

Challenge · Context Windows

A full paper doesn't fit every prompt

Re-feeding the entire PDF to all five agents would blow up token costs, latency, and attention quality — later agents would drown in raw text.

Solution · Structured Hand-offs

Only Agent 1 reads the PDF. Every downstream agent consumes compact, schema-validated structured outputs (PaperData, TrlIrlData…) — prompt chaining with typed contracts instead of raw-text relay. The TRL scorer is deliberately forbidden from re-reading the PDF.

Challenge · Hallucination

LLMs invent market figures

Market sizing is exactly where investors check numbers — an invented TAM kills credibility, and a single-prompt approach invents them constantly.

Solution · Closed-World Retrieval

Retrieval and synthesis are split into separate stages. The synthesis model sees only persisted Tavily results; a validator rejects any figure citing a URL outside the retrieved set and feeds the Zod errors back into a bounded retry prompt. Ungroundable figures return null.

Challenge · Long-Running Tasks

Serverless functions time out

A five-agent pipeline with retries runs for minutes — far beyond Vercel's request limits — and a crash at agent 4 must not re-bill agents 1–3.

Solution · Durable Execution (Inngest)

The whole pipeline is one Inngest function with each agent isolated in its own step.run(). Steps are checkpointed: a failed step retries up to 3× without re-running earlier, expensive agents. The API route just fires an event and returns — async by construction.
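
To make the checkpointing idea concrete, here is a toy in-memory model of it. This is not Inngest's implementation or API: `ToyStepRunner` and `runWithRetries` are hypothetical names, checkpoints live in a Map instead of durable storage, and the retry budget is counted per run instead of per step to keep the sketch short.

```typescript
// Toy stand-in for a durable step runner; checkpoints are kept in memory only.
class ToyStepRunner {
  private checkpoints = new Map<string, unknown>();

  async run<T>(id: string, fn: () => Promise<T>): Promise<T> {
    // A step that already completed returns its saved result without re-running.
    if (this.checkpoints.has(id)) return this.checkpoints.get(id) as T;
    const result = await fn();
    this.checkpoints.set(id, result);
    return result;
  }
}

// Re-invokes the whole pipeline after a failure; completed steps are replayed
// from their checkpoints, so only the failed step does real work again.
async function runWithRetries<T>(
  pipeline: (step: ToyStepRunner) => Promise<T>,
  retries = 3,
): Promise<T> {
  const step = new ToyStepRunner();
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await pipeline(step);
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```

In this model a failure in a later step never re-bills an earlier one, because the earlier step's result is read back, not recomputed.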

Challenge · Orchestration

Five agents must stay mutually consistent

Multi-agent pipelines drift: a deck slide can quietly contradict the feasibility estimate it was supposedly built from.

Solution · Ref-Menu Citations + Critique Agents

The pitch builder may only cite facts from an enumerated menu of upstream findings, with source URLs carried through unchanged. Adversarial critique agents audit at three points (paper, feasibility, pitch), each allowed exactly one regeneration to prevent loops.

Challenge · Partial Failure

External dependencies flake

Web search goes down, and models return 503s mid-pipeline. Failing the whole run for a missing market section wastes everything already computed.

Solution · Explicit Failure Semantics

Each stage declares its failure mode: core agents retry, then fail the project with a notification; the market and feasibility stages are skipped rather than failing the run; the deck gracefully omits the market slide when the data is absent. The Gemini client falls back to a same-provider backup model on retryable errors.
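
The model fallback might be sketched as below, with the HTTP layer injected as a transport function. The `Transport` type, the placeholder model names used in the usage note, and the exact set of retryable status codes are assumptions. The source only says that retryable errors trigger a same-provider backup model.

```typescript
type TransportResult =
  | { kind: "ok"; text: string }
  | { kind: "error"; status: number };

// Hypothetical transport: one call to the provider for a given model.
type Transport = (model: string, prompt: string) => Promise<TransportResult>;

// Assumed set of retryable HTTP statuses; the real list may differ.
const RETRYABLE = new Set([429, 500, 503]);

async function generateWithFallback(
  transport: Transport,
  primary: string,
  fallback: string,
  prompt: string,
): Promise<{ model: string; text: string }> {
  const first = await transport(primary, prompt);
  if (first.kind === "ok") return { model: primary, text: first.text };
  // Non-retryable errors (e.g. a malformed request) would fail on the backup too.
  if (!RETRYABLE.has(first.status)) {
    throw new Error(`${primary} failed with status ${first.status}`);
  }
  const second = await transport(fallback, prompt);
  if (second.kind === "ok") return { model: fallback, text: second.text };
  throw new Error(`${fallback} failed with status ${second.status}`);
}
```

Calling `generateWithFallback(transport, "primary-model", "backup-model", prompt)` returns the model that actually answered, so the run can record which one produced each output. Because the transport is swappable, a test can inject a 503 without touching the network.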

Challenge · Schema Drift

LLM output shapes are unreliable

Free-form JSON from a model breaks parsers, and provider-side schema support doesn't cover constraints like ranges, unions, or regex patterns.

Solution · Dual Enforcement (Zod + responseSchema)

One Zod schema per agent is both converted into a Gemini responseSchema (constraining generation) and run as safeParse before persistence (source of truth). Constraints Gemini can't express are still enforced by Zod — e.g. rejecting capital ranges where max ≤ min.
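
The dual-enforcement idea can be illustrated without any library: one definition yields both a generation-time schema and a parse-time validator, and only the validator carries the cross-field rule. This is a library-free stand-in, not Zod, and `SchemaDef`, `toResponseSchema`, and `safeParse` here are hypothetical helpers. The uppercase `OBJECT`/`NUMBER`/`STRING` type names are an assumption about the provider's schema format.

```typescript
type FieldType = "NUMBER" | "STRING";
interface FieldDef { name: string; type: FieldType }

interface SchemaDef {
  fields: FieldDef[];
  // Cross-field rules that a provider-side schema cannot express.
  refinements: Array<(value: Record<string, unknown>) => string | null>;
}

const capitalEstimate: SchemaDef = {
  fields: [
    { name: "min", type: "NUMBER" },
    { name: "max", type: "NUMBER" },
    { name: "confidence", type: "STRING" },
    { name: "reasoning", type: "STRING" },
  ],
  refinements: [
    (v) => ((v["max"] as number) > (v["min"] as number) ? null : "max must be greater than min"),
  ],
};

// Generation-time artifact: field names and types only (format assumed).
function toResponseSchema(def: SchemaDef) {
  const properties: Record<string, { type: FieldType }> = {};
  for (const f of def.fields) properties[f.name] = { type: f.type };
  return { type: "OBJECT", properties, required: def.fields.map((f) => f.name) };
}

// Parse-time artifact: the source of truth, including the refinements.
function safeParse(def: SchemaDef, raw: string): { success: boolean; errors: string[] } {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return { success: false, errors: ["invalid JSON"] };
  }
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return { success: false, errors: ["expected an object"] };
  }
  const obj = value as Record<string, unknown>;
  const errors: string[] = [];
  for (const f of def.fields) {
    const expected = f.type === "NUMBER" ? "number" : "string";
    if (typeof obj[f.name] !== expected) errors.push(`${f.name}: expected ${expected}`);
  }
  if (errors.length === 0) {
    for (const rule of def.refinements) {
      const message = rule(obj);
      if (message !== null) errors.push(message);
    }
  }
  return { success: errors.length === 0, errors };
}
```

An output with max equal to min satisfies the generated schema, since both fields are numbers, yet fails `safeParse`. That gap is the reason for keeping the parse-time check as the source of truth.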

Tech Stack

Every layer, and why it's there

Verified against the public repository — no résumé padding.

Layer	Technology
Framework	Next.js 14 (App Router) · TypeScript · React 18
Database	Neon Postgres via Prisma 7 — one model per agent output, multi-tenant by institution
Background jobs	Inngest — durable execution, step-level checkpointing and retries
LLM	Google Gemini — direct REST, per-agent model + thinking-level config, automatic fallback model
Web search	Tavily, behind a swappable search-client interface
Validation	Zod — every agent output schema-validated before persistence
Auth	Clerk — middleware-gated workspace and onboarding
Storage	Cloudflare R2 (S3 SDK) — PDFs in, deck exports out
Real-time	Pusher (optional at runtime) + status polling fallback
Rate limiting	Upstash Redis — 5 analyses per user per hour
Deck export	puppeteer-core + @sparticuz/chromium (PDF) · pptxgenjs (PPTX) · docx (DOCX)
Testing	Vitest — including fault-injection via a swappable LLM transport

Lemma — How It Works

First-pass research evaluation is slow — and naive LLMs make it worse