Forgotten Polygons Multimodal Large Language Models Are Shapeblind
2 mentions across 1 person
Visit ↗All mentions
“Despite strong performance on vision-language tasks, Multimodal Large Language Models (MLLMs) struggle with mathematical problem-solving, with both open-source and state-of-the-art models falling short of human performance on visual-math benchmarks.”
MLLMs Exhibit Fundamental Deficiencies in Visual-Mathematical Reasoning, Reliant ↗