InstAP
1 mention across 0 people
All mentions
Unknown speaker
Recommended paper · 2026-04-10
“We introduce InstAP, an Instance-Aware Pre-training framework that jointly optimizes global vision-text alignment and fine-grained, instance-level contrastive alignment by grounding textual mentions to specific spatial-temporal regions.”
Enhancing VLP Spatial-Temporal Reasoning via Instance-Aware Pre-training