Fix llm as a judge cookbook images (#1517)

Ankur Goyal 2024-10-24 23:14:32 -07:00 committed by GitHub
parent fe2bbf14cb
commit e19a3725a9
GPG Key ID: B5690EEEBB952194
5 changed files with 3 additions and 3 deletions

@@ -499,7 +499,7 @@
 "It looks like the numeric rater scored almost 94% in total. That's not bad, but if 6% of your evals are incorrectly judged, that could make it very hard to trust them. Let's dig into the Braintrust\n",
 "UI to get some insight into what's going on.\n",
 "\n",
-"![Partial credit](../images/Custom-LLM-as-a-Judge/Partial-Credit.gif)\n",
+"![Partial credit](../images/Custom-LLM-as-a-Judge-Partial-Credit.gif)\n",
 "\n",
 "It looks like a number of the incorrect answers were scored with numbers between 1 and 10. However, we do not currently have any insight into why the model gave these scores. Let's see if we can\n",
 "fix that next.\n"
@@ -670,11 +670,11 @@
 "It doesn't look like adding reasoning helped the score (in fact, it's half a percent worse). However, if we look at one of the failures, we'll get some insight into\n",
 "what the model was thinking. Here is an example of a hallucinated answer:\n",
 "\n",
-"![Output](../images/Custom-LLM-as-a-Judge/Output.png)\n",
+"![Output](../images/Custom-LLM-as-a-Judge-Output.png)\n",
 "\n",
 "And the score along with its reasoning:\n",
 "\n",
-"![Reasoning](../images/Custom-LLM-as-a-Judge/Reasoning.png)\n"
+"![Reasoning](../images/Custom-LLM-as-a-Judge-Reasoning.png)\n"
 ]
 },
 {

Image file (name not shown): 207 KiB before, 207 KiB after.

Image file (name not shown): 1.1 MiB before, 1.1 MiB after.

Image file (name not shown): 300 KiB before, 300 KiB after.

Binary file not shown (191 KiB before; no after size listed).
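
Broken image references like the ones fixed in this commit can be caught before merging. The following is a minimal sketch, not part of this commit: it assumes the notebook is standard `.ipynb` JSON and that image links are relative to the notebook's directory. The function name `find_missing_images` is hypothetical.

```python
import json
import re
from pathlib import Path

# Matches Markdown image syntax ![alt](target) and captures the target.
IMAGE_LINK = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")


def find_missing_images(notebook_path):
    """Return image targets in markdown cells that do not exist on disk.

    Hypothetical helper: relative targets are resolved against the
    directory that contains the notebook.
    """
    notebook_path = Path(notebook_path)
    notebook = json.loads(notebook_path.read_text(encoding="utf-8"))
    missing = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue
        source = cell.get("source", "")
        # A cell's source may be one string or a list of line strings.
        text = "".join(source) if isinstance(source, list) else source
        for target in IMAGE_LINK.findall(text):
            # Remote and inline images cannot be checked on disk.
            if target.startswith(("http://", "https://", "data:")):
                continue
            if not (notebook_path.parent / target).exists():
                missing.append(target)
    return missing
```

Run against the notebook touched here, a check like this would report any `../images/...` path that does not match a file on disk.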