Structured Inference for LLMs

Reliable outputs via blazingly fast Dynamic Grammar Constrained Decoding

Guaranteed output format

Our input-dependent schema language enables developers to construct fully reliable interfaces to LLMs, even with extremely lengthy, multi-faceted responses.

Reduce model costs

Complex queries often require larger models. Structured inference lets you serve the same queries with smaller models, without a negative impact on evals.

Ultra-low latency (50 µs)

Basic structured output processors add latency to each generated token. Our library is highly optimized to bring this cost as close to 0 as possible.

Skip deterministic inference

Auto-fill deterministic token sequences via schema look-ahead, further cutting down on costs. Save up to 50% on inference time with common output schemas.
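As a sketch of the idea (hypothetical helper, not the library's internals): for an object schema with known keys, all the structural text between value slots is forced by the schema, so a decoder can auto-fill it without any model forward passes.

```typescript
// Hypothetical sketch, not the library's internals: for a JSON object schema
// with fixed keys, the braces, key names, and separators are fully determined
// by the schema and can be emitted with zero model forward passes.
function deterministicSegments(keys: string[]): string[] {
  // Each segment leading up to a value slot is forced text the decoder
  // can auto-fill instantly.
  return keys.map((k, i) => `${i === 0 ? "{" : ","}"${k}":`);
}

const forced = deterministicSegments(["toneDescription", "negativityStrength"]);
console.log(forced);
// Only the values themselves require inference; every segment in `forced`
// is skipped, which is where the inference-time savings come from.
```

The more rigid the output schema, the larger the share of tokens that can be skipped this way.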

What is Structured Inference?
Structured Outputs

Receive LLM outputs that adhere to an input JSON Schema.

Fix values to ranged integers, literal strings, and more.

{ 
    "toneDescription": {
        "type": "string"
    },
    "toneHighlights": {
        "type": "array",
        "items": {
            "type": "quote",
            "source_docs": ["ab153f"]
        }
    },
    "negativityStrength": {
        "type": "integer",
        "minimum": 1,
        "maximum": 10
    }
}
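For illustration, a conforming response might look like the following (values hypothetical; the quote would be drawn verbatim from document `ab153f`):

```json
{
    "toneDescription": "Measured but firmly negative throughout.",
    "toneHighlights": ["I am disappointed with the repeated delays"],
    "negativityStrength": 7
}
```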
Active Types

Our flavor of JSON Schema allows for active types – input-dependent types which adhere to input documents.

These types include:

Deterministic quoting

Inline text replacement

Deterministic quoting.

Hallucination-free references drawn verbatim from input documents, grounding the LLM in its inputs. An excellent solution for verifiably accurate model outputs.

Inline text replacement.

Easily give write access to a user-selected part of a document. Syntactically correct replacements are an unsolved problem in NLP. Our Active Replacements type applies heuristics at runtime to correct common LLM mistakes.

Empowering next-gen copilot solutions

Fetch structured copilot responses via the COGA API...

 
const resp = await cogaClient.call({
  prompt: "Edit the user-provided email.",
  schema: {
    type: "replacement",
    documentId: "email",
    editStart: 50,
    editEnd: 121,
  },
  documents: [
    {
      id: "email",
      text: "Hi David,\n Thank you for your email. Yes, ...",
    },
    {
      id: "cv",
      text: "David is a software engineer with 5 years...",
    },
  ],
});
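To show what happens with the result (a hypothetical sketch; the splice semantics of `editStart`/`editEnd` are our assumption, not confirmed API behavior): the model returns only the replacement text, which the client splices into the original document at the given character offsets.

```typescript
// Hypothetical sketch: splice a replacement returned by the model into the
// source document. Assumes editStart/editEnd are character offsets and the
// range [editStart, editEnd) is replaced by the new text.
function applyReplacement(
  doc: string,
  editStart: number,
  editEnd: number,
  replacement: string
): string {
  return doc.slice(0, editStart) + replacement + doc.slice(editEnd);
}

const email = "Hi David, Thank you for your email.";
const edited = applyReplacement(email, 10, 35, "Thanks for reaching out.");
console.log(edited); // "Hi David, Thanks for reaching out."
```

Because the replacement is constrained by the schema, the surrounding document never needs to be regenerated.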

...and reduce Time to Launch for your AI feature.

Example App
SOTA performance with less dev and maintenance time

We can run on all models.

Enforcing structure improves model quality, on par with or beyond closed models such as GPT-4o.

Build once and forget.

Minimize time to feature completion and leave model management to us.

How to get started

We are opening our public API beta in November 2024 and releasing TypeScript and Python libraries in January 2025.


If you are looking to see our API in action, see our example use cases. If you would like a demo or alpha access, please contact sales.