LLM Judges Systematically Suppress Minority Human Readings in Legal Essay Evaluation
A controlled study of Thai bar exam essay grading finds that LLM judges do not neutrally reproduce human inter-rater disagreement; instead, they converge overwhelmingly on the majority human interpretation. When a rubric ambiguity caused a genuine 2-to-1 split among expert human examiners, 22 of 26 LLMs clustered with…