LLM as a judge

The LLM as a judge tab shows automated quality scores generated by the evaluation LLM (configured in the domain's Evaluation LLM settings). It mirrors the structure of the Human feedback tab:

Summary scores — total evaluations and average scores across the same five dimensions (Overall, Relevance, Completeness, Faithfulness, Coherence).
Lowest quality topics — documents with the worst AI-judged answer quality.
Highest quality topics — documents with the best AI-judged answer quality.
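
To illustrate how summary scores and topic rankings of this kind could be derived from individual evaluations, here is a minimal sketch. It does not show the product's actual implementation: the record layout, the 1 to 5 score scale, the sample topic names, and the choice to rank topics by their average Overall score are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

# The five dimensions named in the Summary scores section.
DIMENSIONS = ("overall", "relevance", "completeness", "faithfulness", "coherence")

# Hypothetical evaluation records, one per judged answer.
# The field names and the 1-5 scale are assumptions for illustration only.
evaluations = [
    {"topic": "Billing guide", "overall": 4, "relevance": 5,
     "completeness": 4, "faithfulness": 4, "coherence": 5},
    {"topic": "Billing guide", "overall": 2, "relevance": 3,
     "completeness": 2, "faithfulness": 3, "coherence": 4},
    {"topic": "API quickstart", "overall": 5, "relevance": 5,
     "completeness": 5, "faithfulness": 5, "coherence": 5},
]


def summarize(evals):
    """Return the total number of evaluations and the average per dimension."""
    return {
        "total_evaluations": len(evals),
        "averages": {d: mean(e[d] for e in evals) for d in DIMENSIONS},
    }


def rank_topics(evals):
    """Return (topic, average Overall score) pairs, worst first.

    Ranking by the Overall dimension is an assumption of this sketch.
    """
    by_topic = defaultdict(list)
    for e in evals:
        by_topic[e["topic"]].append(e["overall"])
    return sorted(((t, mean(s)) for t, s in by_topic.items()), key=lambda p: p[1])


summary = summarize(evaluations)
ranked = rank_topics(evaluations)
lowest, highest = ranked[0], ranked[-1]
```

In this sketch, `lowest` and `highest` correspond to the first entries you would expect in the Lowest quality topics and Highest quality topics lists.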

Quality scores appear automatically once answers have been evaluated. When you start a chat, feedback data is collected in the background. Click any topic to open the corresponding document for review.

FAQ

What does the LLM as a judge tab show?

The LLM as a judge tab shows automated quality scores generated by the evaluation LLM configured in the domain’s Evaluation LLM settings. It mirrors the Human feedback tab and includes summary scores plus lists of the lowest and highest quality topics.

What are the Summary scores in the LLM as a judge tab?

Summary scores show the total number of evaluations and the average scores across five dimensions. The dimensions are Overall, Relevance, Completeness, Faithfulness, and Coherence.

What are the Lowest quality topics and Highest quality topics lists?

Lowest quality topics lists documents with the worst AI-judged answer quality. Highest quality topics lists documents with the best AI-judged answer quality.

When do quality scores appear in the LLM as a judge tab?

Quality scores appear automatically once answers have been evaluated. When you start a chat, feedback data is collected in the background, which allows the scores to appear.

How do I review a document from the LLM as a judge tab?

Click any topic in the tab to navigate to the corresponding document. You can then review the document based on the AI-judged quality results.
