MMLU-Pro
2 mentions across 1 person
All mentions
“We evaluate model performance on GPQA (Rein et al. 2024) and MMLU-Pro (Wang et al. 2024)”
Prompting LLMs with Threats or Tips Shows Limited Efficacy

“We study both domain-specific expert personas and low-knowledge personas, evaluating six models on GPQA Diamond (Rein et al. 2024) and MMLU-Pro (Wang et al. 2024), graduate-level questions spanning science, engineering, and law.”
Persona Prompting Fails to Improve LLM Factual Accuracy