Example use case: grounded CV scoring
Using deterministic quoting to significantly improve model performance and trustworthiness.
When using AI, we want to maximize trustworthiness while maintaining strong performance. In this example we will demonstrate how to use deterministic quoting to ground a copiloting process.
For tasks such as document evaluation or summarization, quoting input documents:
- Reduces model hallucinations by preventing the model from misquoting input documents.
- Improves performance by clearly grounding the model in the task at hand.
- Increases output trustworthiness via references to input documents.
However, quoting is not trivial for LLMs.
What is deterministic quoting?
A model can sometimes fail to produce a valid quote from a set of documents. There are several ways to improve performance here, whether through prompt engineering or model selection, but these approaches still won’t bring the success rate to 100%.
With the input-dependent type restrictions our API introduces, we can guarantee quoting that never fails. We call this deterministic quoting: implemented algorithmically on the LLM’s outputs during inference, a deterministic quote is guaranteed to be a direct, valid quote from the given input documents.
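This guarantee can also be checked mechanically on the caller's side: because each returned quote carries character offsets into its source document, a validator can confirm it is a verbatim extract. A minimal sketch (the `Quote` field names mirror the response format shown later in this article, and are assumptions until you check the API reference):

```typescript
// Shape of a quote as returned by the API (field names assumed
// from the response format shown later in this article).
interface Quote {
  text: string;       // the quoted text
  id: string;         // id of the source document
  quoteStart: number; // character offset where the quote begins
  quoteEnd: number;   // character offset where the quote ends (exclusive)
}

// A deterministic quote must be a verbatim slice of its source document.
function isValidQuote(
  quote: Quote,
  documents: { id: string; text: string }[]
): boolean {
  const doc = documents.find((d) => d.id === quote.id);
  if (!doc) return false;
  return doc.text.slice(quote.quoteStart, quote.quoteEnd) === quote.text;
}
```

With deterministic quoting, this check always passes; with free-form generation, it is exactly the check that hallucinated quotes fail.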
Use case: grounded CV scoring
In this use case, we want an AI to review an input CV document and score it against a recruiter's requirements. To understand what the score is based on, we also want some quotations from the CV that ground it. We'll restrict the score to the range 1 to 10.
In this example we will demonstrate how to create a TypeScript function that, given a CV and a requirement description, outputs a TypeScript object consisting of a score and deterministic quotes. In October this year we will release the COGA TS library, which will make this even easier.
Creating the output schema.
Starting off, our schema consists of some quotes from the CV; to constrain the LLM, we'll limit this list to between 1 and 4 items. We also need to specify which document each quote should be taken from. In our case there is just one, which we'll call 'CV'.
For the score, we can use the classic JSON Schema approach and define an integer type with a restricted inclusive minimum and maximum.
This gives our final schema:
const schema = {
  quotes: {
    type: "array",
    items: {
      type: "quote",
      description: "A quote from the CV that supports the score that you gave",
      source_docs: ["CV"]
    },
    minItems: 1,
    maxItems: 4
  },
  score: {
    type: "integer",
    description: "The score from 1 to 10",
    minimum: 1,
    maximum: 10
  }
};
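To see what this schema buys us on the client side, we can mirror it as a TypeScript type together with a small runtime check of its constraints. This is a sketch based on the response format documented later in this article; the field names are assumptions to verify against the API reference:

```typescript
// TypeScript type mirroring the schema above (response field names assumed).
interface CVScoreResponse {
  quotes: {
    text: string;       // verbatim extract from the CV
    id: string;         // source document id, here always "CV"
    quoteStart: number; // character offset of the quote start
    quoteEnd: number;   // character offset of the quote end (exclusive)
  }[];
  score: number; // integer between 1 and 10 inclusive
}

// Runtime check that a parsed response satisfies the schema constraints:
// 1 to 4 quotes, and an integer score in [1, 10].
function satisfiesSchema(r: CVScoreResponse): boolean {
  return (
    r.quotes.length >= 1 &&
    r.quotes.length <= 4 &&
    Number.isInteger(r.score) &&
    r.score >= 1 &&
    r.score <= 10
  );
}
```

Because the schema constrains generation, responses should satisfy this check by construction; the type mainly serves as documentation for downstream code.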
Creating the LLM prompt.
The prompt is important for guiding the LLM in completing its task. In this case we give some context that we're working in recruitment and outline the structure we would like. We don't need to be overly specific, as the schema we have defined will guide the LLM correctly. Finally, we leave the requirement as a variable `requirementDesc` that will be provided by the recruiter.
const prompt = `You are a CV reviewer helping a recruiter find candidates that match some requirements.
Respond with a list of quotes and a score, where the quotes are exact extracts from the CV that support or refute the claim, and the score is a rating from 1 to 10 of how well the CV supports the claim.
Extracts should help recruiters assess whether the candidate's CV supports the experience in the claim.

Claim: ${requirementDesc}`;
Bringing it together.
The COGA API `v1/generate` endpoint expects a prompt, a schema, and documents as its inputs. All that's left for us to do is to populate the CV the recruiter is working on and the requirements they would like to search for. Here is what the final code for a `getCVScore` function might look like:
// API key, assumed to be loaded from the environment
const COGA_API_KEY = process.env.COGA_API_KEY;

export async function getCVScore(
  CVText: string,
  requirementDesc: string
) {
  // Schema for the response: 1 to 4 deterministic quotes plus an integer score
  const schema = {
    quotes: {
      type: "array",
      items: {
        type: "quote",
        description: "A quote from the CV that supports the score that you gave",
        source_docs: ["CV"]
      },
      minItems: 1,
      maxItems: 4
    },
    score: {
      type: "integer",
      description: "The score from 1 to 10",
      minimum: 1,
      maximum: 10
    }
  };

  // The single input document the quotes will be drawn from
  const documents: { id: string; text: string }[] = [
    { id: "CV", text: CVText },
  ];

  const prompt = `You are a CV reviewer helping a recruiter find candidates that match some requirements.
Respond with a list of quotes and a score, where the quotes are exact extracts from the CV that support or refute the claim, and the score is a rating from 1 to 10 of how well the CV supports the claim.
Extracts should help recruiters assess whether the candidate's CV supports the experience in the claim.

Claim: ${requirementDesc}`;

  const response = await fetch("https://coga.ai/api/v1/generate", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${COGA_API_KEY}`,
    },
    body: JSON.stringify({
      schema: schema,
      documents: documents,
      prompt: prompt,
    }),
  });

  const data = await response.json();
  // The data will be in the format of:
  // { quotes: [{ text: string, id: string, quoteStart: number, quoteEnd: number }], score: number }
  return data;
}
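Because each returned quote ships with character offsets, the caller can render the grounding directly, for example by marking the quoted spans in the CV text so a recruiter can see exactly what the score is based on. One way this might look (a sketch; it assumes the quote spans do not overlap, which holds when each quote is a distinct verbatim slice):

```typescript
// Quote shape as returned by getCVScore (field names assumed from above).
interface ReturnedQuote {
  text: string;
  id: string;
  quoteStart: number;
  quoteEnd: number;
}

// Wrap each quoted span in the CV text with marker characters so a UI
// can highlight the extracts that ground the score.
function highlightQuotes(cvText: string, quotes: ReturnedQuote[]): string {
  // Process spans from the end of the text backwards, so inserting
  // markers does not shift the offsets of spans not yet processed.
  const sorted = [...quotes].sort((a, b) => b.quoteStart - a.quoteStart);
  let out = cvText;
  for (const q of sorted) {
    out =
      out.slice(0, q.quoteStart) +
      "«" + out.slice(q.quoteStart, q.quoteEnd) + "»" +
      out.slice(q.quoteEnd);
  }
  return out;
}
```

In a real app the markers would be replaced with HTML or component-level highlighting, but the offset arithmetic stays the same.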
Final results
We built an example mini-app to demonstrate our API in action, scoring a CV against different attributes. Log in or create an account to play around with the scored attributes and see the score and grounding results produced live!
Disclaimer: this demo runs stably on an 8-billion-parameter model; for comparison, GPT-3 has 175 billion parameters.