MM-MT-Bench
1 mention from 1 person
“MM-MT-Bench is a multi-turn LLM-as-a-judge evaluation task released by Mistral AI that uses GPT-4o for judging model answers given reference answers.”
Mistral Evals: A Toolkit for LLM Benchmark Evaluation ↗
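The judging loop that the quote describes can be sketched roughly as follows. This is a minimal illustration, not the Mistral Evals implementation: the prompt wording, the 1 to 10 scale, the `[[n]]` rating format, and the names `build_judge_prompt`, `parse_rating`, and `score_conversation` are all assumptions made for the example. The `judge` argument stands in for a call to a judge model such as GPT-4o, and the sketch rates each turn in isolation, whereas a real multi-turn judge would likely also see the earlier turns.

```python
import re
from statistics import mean


def build_judge_prompt(question, reference, answer):
    """Assemble a judge prompt (hypothetical wording, not the official template)."""
    return (
        "Rate the assistant's answer from 1 to 10 against the reference.\n"
        f"[Question]\n{question}\n"
        f"[Reference answer]\n{reference}\n"
        f"[Assistant answer]\n{answer}\n"
        'Reply in the form "Rating: [[n]]".'
    )


def parse_rating(judgement):
    """Extract a numeric rating written as [[n]]; return None if absent."""
    match = re.search(r"\[\[(\d+(?:\.\d+)?)\]\]", judgement)
    return float(match.group(1)) if match else None


def score_conversation(turns, judge):
    """Average the judge's ratings over all turns of one conversation.

    `turns` is a list of dicts with "question", "reference", and "answer" keys.
    `judge` is any callable that maps a prompt string to the judge's reply.
    """
    ratings = []
    for turn in turns:
        prompt = build_judge_prompt(
            turn["question"], turn["reference"], turn["answer"]
        )
        rating = parse_rating(judge(prompt))
        if rating is not None:  # skip turns where no rating could be parsed
            ratings.append(rating)
    return mean(ratings) if ratings else None
```

Passing the judge in as a plain callable keeps the sketch independent of any particular API client, so it can be exercised with a stub before a real judge model is wired in.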