Racer Retrievalaugmented Contextual Rapid Speculative Decoding
1 mentions across 0 people
All mentions
Unknown speaker
Recommendedpaper · 2026-04-17
“We propose RACER (Retrieval-Augmented Contextual Rapid Speculative Decoding), a lightweight and training-free method that integrates retrieved exact patterns with logit-driven future cues. This unification supplies both reliable anchors and flexible extrapolation, yielding richer speculative drafts. Experiments on Spec-Benchmark, HumanEval, and MGSM-ZH demonstrate that RACER consistently accelerates inference, achieving more than 2x speedup over autoregressive decoding, and outperforms prior tra”
RACER: Accelerating LLM Inference with Hybrid Speculative Decoding ↗