📄 paper

Fast-dVLM

1 mention across 0 people

All mentions

Unknown speaker

paper · 2026-04-09

Recommended

“We present Fast-dVLM, a block-diffusion-based VLM that enables KV-cache-compatible parallel decoding and speculative block decoding for inference acceleration. ... Extensive experiments across 11 multimodal benchmarks show Fast-dVLM matches its autoregressive counterpart in generation quality. With SGLang integration and FP8 quantization, Fast-dVLM achieves over 6x end-to-end inference speedup over the AR baseline.”

Fast-dVLM: Block-Diffusion for Efficient VLM Inference