Slates-500 Dataset
A multimodal benchmark for information extraction from archival video slates
Overview
Slates-500 is a multimodal dataset and evaluation pipeline for benchmarking information extraction from archival video slates. Video slates are the title cards at the beginning of archival footage; they carry structured metadata such as program titles, dates, producers, and other production information.
Extracting this information requires models to combine visual layout analysis, text recognition, and semantic understanding, which makes slates a challenging testbed for vision-language models and multimodal information extraction systems.
Dataset
The dataset contains 500 annotated video slates from archival collections, each with ground-truth labels for its structured fields. An accompanying evaluation pipeline measures field-level extraction accuracy, providing a standardized benchmark for comparing approaches to this task.
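To make "field-level extraction accuracy" concrete, the sketch below shows one way such a score could be computed. It is an illustration under stated assumptions, not the Slates-500 pipeline itself: the field names (`program_title`, `date`, `producer`), the sample records, the `normalize` and `field_accuracy` helpers, and the choice of normalized exact match as the comparison rule are all hypothetical.

```python
from collections import defaultdict


def normalize(value):
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    if value is None:
        return ""
    return " ".join(str(value).lower().split())


def field_accuracy(predictions, references):
    """Return per-field exact-match accuracy over paired prediction/reference records."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for pred, ref in zip(predictions, references):
        for field, truth in ref.items():
            total[field] += 1
            # A field missing from the prediction normalizes to "" and counts as wrong.
            if normalize(pred.get(field)) == normalize(truth):
                correct[field] += 1
    return {field: correct[field] / total[field] for field in total}


# Hypothetical records; the field names are illustrative, not the dataset's schema.
references = [
    {"program_title": "Evening News", "date": "1974-03-02", "producer": "J. Smith"},
    {"program_title": "City Council", "date": "1981-11-15", "producer": "A. Lee"},
]
predictions = [
    {"program_title": "evening  news", "date": "1974-03-02", "producer": "J. Smyth"},
    {"program_title": "City Council", "date": "1981-11-16"},
]

scores = field_accuracy(predictions, references)
for field, acc in scores.items():
    print(f"{field}: {acc:.2f}")
```

Scoring each field separately, rather than requiring a whole slate to match, shows which kinds of metadata a model handles well and which it does not. A stricter or more lenient comparison (for example, date parsing or fuzzy string matching) could replace `normalize` without changing the rest of the sketch.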
Publication
Slates-500: A Multimodal Dataset for Information Extraction from Archival Video
Under review