absorb.md

RACER: Retrieval-Augmented Contextual Rapid Speculative Decoding

paper · 2026-04-17

We propose RACER (Retrieval-Augmented Contextual Rapid Speculative Decoding), a lightweight and training-free method that integrates retrieved exact patterns with logit-driven future cues. This unification supplies both reliable anchors and flexible extrapolation, yielding richer speculative drafts. Experiments on Spec-Benchmark, HumanEval, and MGSM-ZH demonstrate that RACER consistently accelerates inference, achieving more than 2x speedup over autoregressive decoding, and outperforms prior tra
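To make the draft-then-verify idea concrete, here is a minimal sketch of retrieval-based speculative decoding under greedy verification. It is not the RACER implementation: it covers only the "retrieved exact patterns" half by copying what followed an earlier occurrence of the current suffix, and it leaves out the logit-driven cues entirely. The names `retrieve_draft`, `speculative_decode`, `greedy_decode`, and the toy `target` model are all hypothetical, chosen for illustration.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Token = int
# Assumption: the target model is reduced to a deterministic greedy next-token function.
NextTokenFn = Callable[[Sequence[Token]], Token]


def retrieve_draft(context: Sequence[Token], max_ngram: int = 3,
                   draft_len: int = 4) -> List[Token]:
    """Copy the tokens that followed the most recent earlier occurrence
    of the current suffix, trying the longest suffix first."""
    n_ctx = len(context)
    for n in range(min(max_ngram, n_ctx - 1), 0, -1):
        suffix = list(context[n_ctx - n:])
        # Scan right to left, skipping the suffix's own position.
        for start in range(n_ctx - n - 1, -1, -1):
            if list(context[start:start + n]) == suffix:
                return list(context[start + n:start + n + draft_len])
    return []  # no earlier match: fall back to one token per pass


def speculative_decode(next_token: NextTokenFn, prompt: Sequence[Token],
                       num_new: int) -> Tuple[List[Token], int]:
    """Return the decoded sequence and the number of verification passes."""
    out = list(prompt)
    passes = 0
    while len(out) - len(prompt) < num_new:
        draft: List[Optional[Token]] = list(retrieve_draft(out))
        passes += 1  # a real system would score every draft position in one forward pass
        step: List[Token] = []
        # The trailing None never matches, so a fully accepted draft
        # still gains one extra token from the target.
        for tok in draft + [None]:
            expected = next_token(out + step)
            step.append(expected)  # always keep the target's own token
            if tok != expected:
                break  # first mismatch ends this pass
        out.extend(step)
    return out[:len(prompt) + num_new], passes


def greedy_decode(next_token: NextTokenFn, prompt: Sequence[Token],
                  num_new: int) -> List[Token]:
    """Plain autoregressive baseline: one target call per token."""
    out = list(prompt)
    for _ in range(num_new):
        out.append(next_token(out))
    return out


# Toy target model (an assumption for the demo): counts upward modulo 5.
target: NextTokenFn = lambda ctx: (ctx[-1] + 1) % 5
prompt = [0, 1, 2, 3, 4, 0, 1]
spec_out, passes = speculative_decode(target, prompt, 10)
print(spec_out == greedy_decode(target, prompt, 10), passes)
```

Because every kept token is the target's own greedy choice, the output matches plain greedy decoding exactly; the draft only changes how many tokens each verification pass can commit. On repetitive text the retrieved draft is mostly accepted, which is where the speedup in this sketch comes from.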

RACER: Accelerating LLM Inference with Hybrid Speculative Decoding