Slates-500 Dataset

A multimodal benchmark for information extraction from archival video slates

Role: Lead Researcher
Status: Paper under review
Affiliation: Brandeis University

Overview

Slates-500 is a multimodal dataset and evaluation pipeline for benchmarking information extraction from archival video slates. Video slates are title cards that appear at the beginning of archival footage and carry structured metadata such as program titles, dates, producers, and other production information.

Extracting this information requires models to reason jointly over visual layout, text recognition, and semantic understanding, which makes it a challenging testbed for vision-language models and multimodal information extraction systems.

Dataset

The dataset contains 500 annotated video slates from archival collections, with ground-truth labels for structured fields. It includes an evaluation pipeline for measuring model performance on field-level extraction accuracy, providing a standardized benchmark for comparing different approaches to this task.
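
As an illustration of what field-level extraction accuracy can mean in practice, the following is a minimal sketch and not the project's actual evaluation pipeline. It assumes each slate is represented as a dictionary mapping field names to strings and scores a field as correct on a normalized exact match. The field names (program_title, date, producer), the sample values, and the helper functions normalize and field_accuracy are all hypothetical.

```python
from collections import defaultdict


def normalize(value):
    """Lowercase and collapse whitespace so trivial formatting
    differences are not counted as extraction errors."""
    return " ".join(str(value).lower().split())


def field_accuracy(predictions, references):
    """Return per-field exact-match accuracy.

    predictions and references are parallel lists of dicts, one per slate.
    A field missing from a prediction is scored as incorrect.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for pred, ref in zip(predictions, references):
        for field, gold in ref.items():
            total[field] += 1
            if normalize(pred.get(field, "")) == normalize(gold):
                correct[field] += 1
    return {field: correct[field] / total[field] for field in total}


# Hypothetical ground truth and model output for two slates.
references = [
    {"program_title": "Evening News", "date": "1975-03-02", "producer": "J. Smith"},
    {"program_title": "Science Hour", "date": "1981-11-17", "producer": "A. Lee"},
]
predictions = [
    {"program_title": "evening  news", "date": "1975-03-02", "producer": "J. Smyth"},
    {"program_title": "Science Hour", "date": "1981-11-07"},
]

scores = field_accuracy(predictions, references)
for field, accuracy in scores.items():
    print(f"{field}: {accuracy:.2f}")
```

Exact match after light normalization is the strictest reasonable choice; a real benchmark might instead use fuzzy string matching or field-specific comparison (for example, parsing dates before comparing), which would change the scores.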

Technical Stack

Vision-Language Models, Information Extraction, OCR, Python, Evaluation Pipelines, Dataset Design

Publication

Slates-500: A Multimodal Dataset for Information Extraction from Archival Video

Under review