Fix llm as a judge cookbook images (#1517)

Ankur Goyal 2024-10-24 23:14:32 -07:00 committed by GitHub
parent fe2bbf14cb
commit e19a3725a9
5 changed files with 3 additions and 3 deletions

@@ -499,7 +499,7 @@
 "It looks like the numeric rater scored almost 94% in total. That's not bad, but if 6% of your evals are incorrectly judged, that could make it very hard to trust them. Let's dig into the Braintrust\n",
 "UI to get some insight into what's going on.\n",
 "\n",
-"![Partial credit](../images/Custom-LLM-as-a-Judge/Partial-Credit.gif)\n",
+"![Partial credit](../images/Custom-LLM-as-a-Judge-Partial-Credit.gif)\n",
 "\n",
 "It looks like a number of the incorrect answers were scored with numbers between 1 and 10. However, we do not currently have any insight into why the model gave these scores. Let's see if we can\n",
 "fix that next.\n"
@@ -670,11 +670,11 @@
 "It doesn't look like adding reasoning helped the score (in fact, it's half a percent worse). However, if we look at one of the failures, we'll get some insight into\n",
 "what the model was thinking. Here is an example of a hallucinated answer:\n",
 "\n",
-"![Output](../images/Custom-LLM-as-a-Judge/Output.png)\n",
+"![Output](../images/Custom-LLM-as-a-Judge-Output.png)\n",
 "\n",
 "And the score along with its reasoning:\n",
 "\n",
-"![Reasoning](../images/Custom-LLM-as-a-Judge/Reasoning.png)\n"
+"![Reasoning](../images/Custom-LLM-as-a-Judge-Reasoning.png)\n"
 ]
 },
 {
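Every text change in the notebook follows one pattern: the `Custom-LLM-as-a-Judge/` directory segment in an image link becomes a `Custom-LLM-as-a-Judge-` filename prefix. A minimal sketch of applying that same rewrite programmatically, assuming the notebook has already been loaded as standard `.ipynb` JSON (for example with `json.load`) and that the image links live only in markdown cells. The function name `flatten_image_paths` and the hard-coded path pattern are hypothetical illustrations, not part of this commit.

```python
import re

# Hypothetical pattern: matches "../images/Custom-LLM-as-a-Judge/<name>" and
# captures <name> (stopping at ")" or whitespace) so the "/" can become "-".
_PATTERN = re.compile(r"(\.\./images/Custom-LLM-as-a-Judge)/([^)\s]+)")


def flatten_image_paths(notebook: dict) -> int:
    """Rewrite image paths in markdown cells in place.

    Assumes markdown cells are the only place these links appear.
    Returns the number of source lines that were changed.
    """
    changed = 0
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue
        source = cell.get("source", [])
        # .ipynb stores cell source as a list of strings or as one string.
        is_list = isinstance(source, list)
        lines = source if is_list else [source]
        new_lines = []
        for line in lines:
            new_line = _PATTERN.sub(r"\1-\2", line)
            if new_line != line:
                changed += 1
            new_lines.append(new_line)
        cell["source"] = new_lines if is_list else new_lines[0]
    return changed
```

Because the pattern requires a `/` right after `Custom-LLM-as-a-Judge`, running it on an already-fixed notebook changes nothing, so the rewrite is idempotent.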

Image file, size unchanged before and after: 207 KiB

Image file, size unchanged before and after: 1.1 MiB

Image file, size unchanged before and after: 300 KiB

Binary file not shown (before: 191 KiB)