Structured Inference for LLMs
Reliable outputs via blazingly fast Dynamic Grammar Constrained Decoding
Guaranteed output format
Our input-dependent schema language enables developers to construct fully reliable interfaces to LLMs, even with extremely lengthy, multi-faceted responses.
Reduce model costs
Complex queries often push teams toward larger models. Structured inference lets you serve them with smaller models, with no negative eval impact.
Ultra low latency (50us)
Basic structured output processors add latency to each generated token. Our library is highly optimized to bring this cost as close to 0 as possible.
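To see where that per-token cost comes from, here is a deliberately naive sketch (our illustration, not the library's internals): a constrained decoder must decide at every step which vocabulary tokens the grammar permits. Caching that mask per grammar state turns a per-step vocabulary scan into a single lookup.

```typescript
// Toy grammar with character-level states (a simplification: real
// grammars also track state transitions within multi-character tokens).
type GrammarState = number;

const legalChars: Record<GrammarState, Set<string>> = {
  0: new Set(["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]),
  1: new Set([",", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]),
};

const vocab = ["1", "12", "a", ",", "99", "x,"];

// Slow path: re-scan the whole vocabulary on every decoding step.
function allowedTokensNaive(state: GrammarState): boolean[] {
  return vocab.map((tok) =>
    [...tok].every((ch) => legalChars[state].has(ch))
  );
}

// Fast path: compute the mask once per grammar state and cache it.
const maskCache = new Map<GrammarState, boolean[]>();
function allowedTokensCached(state: GrammarState): boolean[] {
  let mask = maskCache.get(state);
  if (!mask) {
    mask = allowedTokensNaive(state);
    maskCache.set(state, mask);
  }
  return mask;
}
```

The cached mask is what gets applied to the model's logits each step; the cost of building it is paid once per state, not once per token.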
Skip deterministic inference
Auto-fill deterministic token sequences via schema look-ahead, further cutting down on costs. Save up to 50% on inference time with common output schemas.
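As a sketch of the idea (hypothetical template and names, not the shipped algorithm): once a schema fixes the literal skeleton of the output, every character outside the value slots is forced, so those spans can be emitted with zero model calls.

```typescript
// Schema-derived template: literal pieces with model-filled holes between.
const templateParts = ['{"name": "', '", "age": ', "}"];

// Stand-in for the LLM: fills one value slot (hypothetical values).
function modelFill(slot: number): string {
  return slot === 0 ? "Ada" : "36";
}

// Decode loop: literal pieces are auto-filled directly from the schema;
// only the holes between them invoke the model.
function decode(): { text: string; modelCalls: number } {
  let text = "";
  let modelCalls = 0;
  templateParts.forEach((part, i) => {
    text += part; // forced: deterministic given the schema
    if (i < templateParts.length - 1) {
      text += modelFill(i); // free: the model chooses the value
      modelCalls++;
    }
  });
  return { text, modelCalls };
}
```

Here 24 of the 27 output characters are deterministic, which is where the inference-time savings on common schemas come from.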
Receive LLM outputs that adhere to your input JSON Schema.
Constrain values to ranged integers, literal strings, and more.
{
  "toneDescription": { "type": "string" },
  "toneHighlights": {
    "type": "array",
    "items": { "type": "quote", "source_docs": ["ab153f"] }
  },
  "negativityStrength": { "type": "integer", "minimum": 1, "maximum": 10 }
}
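For intuition, the constraints this schema expresses can be written out as a validator. The check below is our post-hoc illustration only; the actual system enforces these constraints during decoding, so a violating output can never be produced in the first place.

```typescript
interface ToneAnalysis {
  toneDescription: string;
  toneHighlights: string[]; // each entry must be a verbatim quote
  negativityStrength: number; // integer in [1, 10]
}

// Illustrative check mirroring the schema above: the integer range from
// "minimum"/"maximum", and the grounding requirement of the "quote" type.
function satisfiesSchema(out: ToneAnalysis, sourceDoc: string): boolean {
  const intInRange =
    Number.isInteger(out.negativityStrength) &&
    out.negativityStrength >= 1 &&
    out.negativityStrength <= 10;
  const quotesGrounded = out.toneHighlights.every((q) =>
    sourceDoc.includes(q)
  );
  return intInRange && quotesGrounded;
}
```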
Our flavor of JSON Schema allows for active types – input-dependent types which adhere to input documents.
These types include:
Deterministic quoting
Inline text replacement
Deterministic quoting.
Hallucination-free references from input documents ground the LLM in its inputs, making model outputs verifiably accurate.
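One common way to enforce this at decode time (shown here as our own sketch, not necessarily the library's mechanism) is to permit only characters that keep the partial quote a verbatim substring of the source document:

```typescript
// Given the quote generated so far, return the set of characters the
// decoder may emit next: exactly those that extend some occurrence of
// the partial quote in the source. An empty set means the quote must end.
function allowedNextChars(source: string, quoteSoFar: string): Set<string> {
  const next = new Set<string>();
  let idx = source.indexOf(quoteSoFar);
  while (idx !== -1) {
    const ch = source[idx + quoteSoFar.length];
    if (ch !== undefined) next.add(ch);
    idx = source.indexOf(quoteSoFar, idx + 1);
  }
  return next;
}
```

Because every emitted character is checked against the source, the finished quote is a substring of the document by construction; no post-hoc verification is needed.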
Inline text replacement.
Easily give write access to a user-selected part of a document. Producing syntactically correct replacements is an unsolved problem in NLP; our Active Replacements type applies heuristics at runtime to correct common LLM mistakes.
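A hedged example of the kind of runtime heuristic this refers to (the specific rule below is our assumption, not a documented one): models often duplicate whitespace at the edit boundaries, which can be repaired when splicing the replacement back into the document.

```typescript
// Splice a replacement into doc over [start, end), fixing a common LLM
// mistake: re-emitting a space that already exists just outside the span.
function fixWhitespace(
  doc: string,
  start: number,
  end: number,
  repl: string
): string {
  let r = repl;
  // A space already precedes the span: drop a duplicated leading space.
  if (doc[start - 1] === " ") r = r.replace(/^ +/, "");
  // A space already follows the span: drop a duplicated trailing space.
  if (doc[end] === " ") r = r.replace(/ +$/, "");
  return doc.slice(0, start) + r + doc.slice(end);
}
```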
Fetch structured copilot responses via COGA API...
const resp = await cogaClient.call({
  prompt: "Edit the user-provided email.",
  schema: {
    replacement: {
      type: "replacement",
      documentId: "email",
      editStart: 50,
      editEnd: 121,
    },
  },
  documents: [
    {
      id: "email",
      text: "Hi David,\n Thank you for your email. Yes, ...",
    },
    {
      id: "cv",
      text: "David is a software engineer with 5 years...",
    },
  ],
});
...and reduce Time to Launch for your AI feature.
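Applying such a response on the client is then trivial. The response shape below (`ReplacementResult` and its fields) is an assumption made for illustration rather than the documented COGA response type; the point is that a typed replacement with known offsets splices back into the document in one line.

```typescript
// Assumed response shape for illustration only.
interface ReplacementResult {
  documentId: string;
  editStart: number;
  editEnd: number; // exclusive end offset of the replaced span
  text: string; // model-generated replacement text
}

// Replace doc[editStart, editEnd) with the structured replacement text.
function applyReplacement(doc: string, r: ReplacementResult): string {
  return doc.slice(0, r.editStart) + r.text + doc.slice(r.editEnd);
}
```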
We can run on all models.
Enforcing structure improves model quality to match or exceed closed models such as GPT-4o.
Build once and forget.
Minimize time to feature completion and leave model management to us.