💻 repo

InstVL

1 mention across 0 people

All mentions

Unknown speaker

paper · 2026-04-10

Recommended

“To support this, we present InstVL, a large-scale dataset (2 million images, 50,000 videos) with dual-granularity annotations: holistic scene captions and dense, grounded instance descriptions.”

Enhancing VLP Spatial-Temporal Reasoning via Instance-Aware Pre-training