Video Content Summarization
Summarizing archival video content with large language-vision models
Overview
This project explores the use of large language-vision models (LVMs) to automatically generate natural-language summaries of archival video. As video archives grow, there is an increasing need for tools that produce meaningful descriptions of their content to support search, discovery, and cataloging workflows.
Approach
We investigate how LVMs can be applied to video summarization, evaluating their ability to produce accurate, informative descriptions of segments drawn from archival collections. The work examines different prompting strategies and model capabilities for handling the distinctive characteristics of archival video, including varying image quality, historical content, and domain-specific terminology.
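As a rough illustration of the kind of pipeline involved, the sketch below shows two preprocessing steps that typically precede the LVM call: sampling a sparse set of frame timestamps from a segment, and assembling a prompt that carries archival context. All function names and the prompt wording here are illustrative assumptions, not the project's actual implementation, and the frame extraction and model call themselves are omitted.

```python
# Hypothetical sketch of frame sampling and prompt construction for
# LVM-based summarization of an archival video segment. Names and prompt
# text are assumptions for illustration only.

def sample_timestamps(duration_s: float, interval_s: float) -> list[float]:
    """Return evenly spaced timestamps (seconds) at which to grab frames.

    Sampling sparse, evenly spaced frames keeps the LVM input small
    while still covering the whole segment.
    """
    timestamps = []
    t = 0.0
    while t <= duration_s:
        timestamps.append(round(t, 3))
        t += interval_s
    return timestamps


def build_prompt(catalog_note: str, n_frames: int) -> str:
    """Assemble a summarization prompt that includes archival context.

    Passing an existing catalog note alongside the frames is one way to
    expose domain-specific terminology and historical context to the model.
    """
    return (
        f"You are describing {n_frames} frames sampled from an archival "
        f"video segment. Catalog note: {catalog_note}\n"
        "Write a concise natural-language summary suitable for a catalog "
        "record, and mention visible quality issues (damage, fading) if any."
    )


# Example: a 90-second segment sampled every 15 seconds yields 7 frames;
# these frames plus the prompt would then be passed to an LVM.
times = sample_timestamps(90.0, 15.0)
prompt = build_prompt("1954 newsreel, city harbor footage", len(times))
```

In practice the sampled timestamps would be fed to a frame extractor (e.g. seeking with a video library) and the resulting images sent to the model alongside the prompt; varying the note's detail and the sampling density is one axis along which prompting strategies can be compared.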