InstVL
Recommended paper · 2026-04-10
“To support this, we present InstVL, a large-scale dataset (2 million images, 50,000 videos) with dual-granularity annotations: holistic scene captions and dense, grounded instance descriptions.”
Enhancing VLP Spatial-Temporal Reasoning via Instance-Aware Pre-training
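The quote describes two annotation granularities per sample: one holistic caption for the whole scene, plus dense descriptions, each grounded to a specific instance. A minimal Python sketch of how one such record might be laid out follows. The class names, field names, the (x_min, y_min, x_max, y_max) box convention, and the optional frame index for video clips are all illustrative assumptions, not the published InstVL annotation format.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical schema: all names and the box convention are assumptions,
# not the actual InstVL file format.
BBox = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed pixel coords


@dataclass
class InstanceDescription:
    """Fine-grained level: one grounded description per object instance."""
    instance_id: int
    bbox: BBox
    description: str
    # None for still images; frame position within a clip for videos (assumption).
    frame_index: Optional[int] = None


@dataclass
class DualGranularityAnnotation:
    """One image or video clip carrying both annotation granularities."""
    media_id: str
    scene_caption: str  # coarse level: holistic caption of the whole scene
    instances: List[InstanceDescription] = field(default_factory=list)

    def grounded_phrases(self) -> List[Tuple[str, BBox]]:
        """Pair each instance description with its box, e.g. as grounding targets."""
        return [(inst.description, inst.bbox) for inst in self.instances]


example = DualGranularityAnnotation(
    media_id="img_000001",
    scene_caption="A person walks a dog along a park path.",
    instances=[
        InstanceDescription(0, (34.0, 20.0, 120.0, 300.0),
                            "a person in a red jacket holding a leash"),
        InstanceDescription(1, (130.0, 210.0, 220.0, 300.0),
                            "a small brown dog walking beside the person"),
    ],
)

print(len(example.grounded_phrases()))  # prints 2
```

Keeping the scene caption and the instance list in one record means a single sample can feed both a caption-level objective and an instance-level grounding objective; whether the paper pairs them this way is not stated in the quote.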