CLAMS Platform
Computational Linguistics Applications for Multimedia Services
Overview
CLAMS is an open-source platform for applying computational linguistics and multimedia analysis tools to audiovisual materials in cultural heritage archives. It provides an end-to-end framework for orchestrating NLP, computer vision, and ASR pipelines over video and audio content, producing rich, structured metadata in the MultiMedia Interchange Format (MMIF).
The platform is developed in collaboration with the American Archive of Public Broadcasting at GBH, enabling archivists and librarians to process large collections of public media that would be infeasible to catalog manually.
CLAMS Agent (Thesis Work)
My thesis research focuses on designing an LLM/VLM-powered agentic system that orchestrates the CLAMS platform's multimedia processing capabilities. The agent interprets user requests, selects and sequences the appropriate analysis tools, and coordinates metadata extraction from archival video, combining language models with the platform's existing NLP, computer vision, and speech recognition applications.
SDK & Application Development
I integrate LLM interfaces into the CLAMS SDK and build applications for multimodal analysis. The SDK enables developers to create interoperable NLP and multimedia processing applications that communicate through MMIF, supporting tasks such as named entity recognition, scene detection, OCR, and automatic speech recognition.
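MMIF, the format through which these applications communicate, is JSON-based: a list of source documents plus one "view" per app run, each holding that app's annotations. A minimal sketch of that shape is below; the field names follow the published MMIF layout, but the version URIs, app name, and file path are illustrative assumptions, not output of any real CLAMS app.

```python
import json

# Sketch of an MMIF-shaped document. The app URI and media location
# are hypothetical; vocabulary/version URIs are illustrative.
mmif_doc = {
    "metadata": {"mmif": "http://mmif.clams.ai/1.0.0"},
    "documents": [
        {
            "@type": "http://mmif.clams.ai/vocabulary/VideoDocument/v1",
            "properties": {
                "id": "d1",
                "mime": "video/mp4",
                "location": "file:///archive/episode.mp4",  # hypothetical path
            },
        }
    ],
    "views": [
        {
            "id": "v1",
            "metadata": {
                # Each view records which app produced it ("app") and
                # which annotation types it contains ("contains").
                "app": "http://apps.clams.ai/example-scene-detector/v1",
                "contains": {"http://mmif.clams.ai/vocabulary/TimeFrame/v1": {}},
            },
            "annotations": [
                {
                    "@type": "http://mmif.clams.ai/vocabulary/TimeFrame/v1",
                    "properties": {"id": "tf1", "start": 0, "end": 1500, "label": "slate"},
                }
            ],
        }
    ],
}

serialized = json.dumps(mmif_doc, indent=2)
print(len(mmif_doc["views"][0]["annotations"]))  # → 1
```

Because each app appends a new view rather than overwriting earlier ones, a pipeline's full provenance (which tool produced which annotations, in what order) stays inspectable in the final document.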
Publications
Multimodal Interoperability with the CLAMS Platform
MultiMedia Modeling (MMM) 2025 — Demo