Video Content Summarization

Summarizing archival video content with large language-vision models

Role: Lead Author
Published: IEEE Big Data 2024
Affiliation: Brandeis University

Overview

This project explores the use of large language-vision models (LVMs) for automatically generating natural language summaries of archival video content. As video archives grow, there is an increasing need for tools that can produce meaningful descriptions of video content to support search, discovery, and cataloging workflows.

Approach

We investigate how LVMs can be applied to video content summarization tasks, evaluating their ability to produce accurate, informative descriptions of video segments from archival collections. The work examines different prompting strategies and model capabilities for handling the unique characteristics of archival video, including varying quality, historical content, and domain-specific terminology.
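The paper's actual pipeline is not reproduced here, but the general shape of such a system can be sketched as uniform frame sampling plus metadata-aware prompt construction. All function and field names below are illustrative assumptions, not the paper's implementation:

```python
# Illustrative sketch of LVM-based video summarization preprocessing.
# Names (sample_frame_indices, build_summary_prompt) are hypothetical,
# not taken from the published system.

def sample_frame_indices(total_frames: int, num_samples: int) -> list[int]:
    """Pick num_samples frame indices spaced evenly across the video."""
    if num_samples <= 0 or total_frames <= 0:
        return []
    num_samples = min(num_samples, total_frames)
    # Centered uniform sampling: one frame from the middle of each segment.
    return [int((i + 0.5) * total_frames / num_samples) for i in range(num_samples)]

def build_summary_prompt(title: str, year: str, num_frames: int) -> str:
    """Assemble a text prompt pairing archival metadata with sampled frames."""
    return (
        f"The following {num_frames} frames are sampled uniformly from an "
        f"archival video titled '{title}' ({year}). "
        "Describe the visual content in 2-3 sentences suitable for a catalog "
        "record, noting any on-screen text, speakers, or historical context."
    )

if __name__ == "__main__":
    indices = sample_frame_indices(total_frames=3000, num_samples=8)
    prompt = build_summary_prompt("City Council Meeting", "1978", len(indices))
    print(indices)
    print(prompt)
```

The sampled indices would be passed to a frame extractor and the resulting images sent, along with the prompt, to a vision-language model; metadata in the prompt helps ground the model against the varying quality and historical content noted above.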

Technical Stack

Large Language-Vision Models · Video Understanding · Summarization · Python · PyTorch · Prompt Engineering

Publications & Presentations

2024

Video Content Summarization with Large Language-Vision Models

Lynch, Jiang, Lambright, Rim, Pustejovsky

IEEE Big Data (CAS Workshop), 2024

2025

Video Content Summarization with Large Language-Vision Models

Boston Digital Humanities Symposium (Presentation)