

PBCore: The Challenge of Adopting a
Descriptive Metadata Standard
for Public Media
by Nan Rubin

When all full-power television stations in the U.S. turned off their analog transmitters in 2009, the all-digital broadcast chain was complete - from recording at the start of production, through editing, production and distribution, to viewers with smart phones, VOD, and other platforms.

With this digital environment looming, in 2001 the Corporation for Public Broadcasting (CPB) convened the Public Broadcasting Metadata Dictionary Project (PBMD). Its members, engineers and operations personnel from radio and television stations, were tasked with adopting a single set of metadata protocols, to be known as the "Public Broadcasting Core (PBCore) Metadata Dictionary."

Given the constant introduction of new broadcast equipment, they understood that a single metadata standard was necessary to facilitate interoperability across a wide array of devices and among users as varied as stations, distributors, producers, and vendors of traffic, DAM and related systems. And while it was not stated explicitly, the standard was also expected to become important for archival materials.


Figure 1
Internal Broadcast Station Workflows
(Credit: Dave MacCarn, WGBH, 2003)


Because metadata is fundamental to the exchange of digital files, the intent was to identify a core set of descriptors for the digital content used at most public radio and television stations. The Project spent more than two years studying digital operating systems and examining various metadata dictionaries, in particular those used to describe subjects as well as administrative and educational aspects of rich media. The goal was to arrive at the smallest set of elements that could adequately describe and catalog program files.

The Dublin Core Element Set emerged as the most appropriate to meet these various concerns, and PBCore was devised as an application profile that is built on Dublin Core but also retains elements from other schema and from station-based asset management and traffic systems. The end result was a set of metadata standards with a solid foundation that is extensible, scalable, and easy to understand.

PBCore was intended to be "simple" but not "simplistic," because the schema had to be easily understood, implementable and acceptable to the public broadcasting community at large. PBCore contains 58 elements organized in three categories:

  • Content: 20 elements describe intellectual content of a media asset;
  • Intellectual Property: 9 elements relating to creators and usage;
  • Instantiation: 29 elements identify the media asset as it exists in the physical world or digitally.

These, in turn, are organized by Content Class, each of which consists of Containers holding individual Elements.
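The nesting described above (Content Classes holding Containers, which hold Elements) can be pictured with a minimal Python sketch. The element counts come from the list above; the container and element names are a small illustrative sample written in PBCore 1.x style and should be treated as assumptions, not as the full dictionary published at pbcore.org.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Container:
    name: str
    elements: List[str] = field(default_factory=list)


@dataclass
class ContentClass:
    name: str
    element_count: int  # total stated in the article for this class
    containers: List[Container] = field(default_factory=list)


# Sample containers and elements are illustrative assumptions only.
CLASSES = [
    ContentClass("Content", 20,
                 [Container("pbcoreTitle", ["title", "titleType"])]),
    ContentClass("Intellectual Property", 9,
                 [Container("pbcoreCreator", ["creator", "creatorRole"])]),
    ContentClass("Instantiation", 29,
                 [Container("pbcoreInstantiation",
                            ["formatLocation", "formatDuration"])]),
]


def total_elements(classes: List[ContentClass]) -> int:
    """Sum the per-class element counts."""
    return sum(c.element_count for c in classes)


def find_class(element_name: str,
               classes: List[ContentClass]) -> Optional[str]:
    """Return the name of the class whose sample containers hold the element."""
    for content_class in classes:
        for container in content_class.containers:
            if element_name in container.elements:
                return content_class.name
    return None
```

Summing the three counts with `total_elements(CLASSES)` reproduces the 58 elements cited above, and `find_class("titleType", CLASSES)` shows how a single element is located through its container and class.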


Figure 2
Schema: PBCore v1.1

PBCore 1.0 was published in spring 2005. It was presented as a single, streamlined standard to which other database structures, including those of PBS, NPR, major producing stations and other asset/content management systems, could be mapped.
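Mapping an existing database to PBCore amounts to building a crosswalk from local field names to PBCore elements. The sketch below is only an illustration of that idea: the local column names (`prog_title`, `synopsis`, and so on) are hypothetical, and the target names are PBCore-style element names assumed for the example rather than taken from any particular station's mapping.

```python
# Hypothetical crosswalk: local database column -> PBCore-style element name.
CROSSWALK = {
    "prog_title": "title",
    "synopsis": "description",
    "producer": "creator",
    "file_path": "formatLocation",
}


def to_pbcore(local_row, crosswalk=CROSSWALK):
    """Split a local record into mapped PBCore fields and unmapped leftovers."""
    mapped, unmapped = {}, {}
    for key, value in local_row.items():
        if key in crosswalk:
            mapped[crosswalk[key]] = value
        else:
            unmapped[key] = value
    return mapped, unmapped


row = {
    "prog_title": "Sample Program",
    "producer": "Example Station",
    "traffic_code": "A-17",  # local-only field with no PBCore counterpart here
}
mapped, unmapped = to_pbcore(row)
```

Returning the unmapped fields separately makes it easy to see which local data has no home in the core set and might call for an extension.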

PBCore was especially useful for describing digital media, including file URLs for streaming or downloading, and as a syndication format like RSS or ATOM. It was also considered useful for launching an archival or asset management system, because a complete PBCore record contains metadata on the provenance of an object plus the location of media "instantiations" - for example, recording that a media asset is published on the internet in addition to recording the file location and ownership of its source.
In this form, PBCore worked as a basic "starter kit." However, it was recognized that v1.0 would require improvements once it began to be used. Adding extensions to the existing set of metadata elements was planned to accommodate such practical needs.
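A record of the kind described above, one asset with descriptive metadata plus a separate entry for each instantiation, can be sketched with Python's standard `xml.etree.ElementTree` module. The tag names follow PBCore 1.x naming as an assumption for illustration, and the title, creator, file path and URL are made up; the schema at pbcore.org is the authority on the actual element names and order.

```python
import xml.etree.ElementTree as ET


def build_record(title, creator, instantiations):
    """Build a simplified PBCore-style record (tag names are assumptions)."""
    root = ET.Element("PBCoreDescriptionDocument")

    title_container = ET.SubElement(root, "pbcoreTitle")
    ET.SubElement(title_container, "title").text = title

    creator_container = ET.SubElement(root, "pbcoreCreator")
    ET.SubElement(creator_container, "creator").text = creator

    # One container per physical or digital copy of the asset.
    for location, media_type in instantiations:
        inst = ET.SubElement(root, "pbcoreInstantiation")
        ET.SubElement(inst, "formatLocation").text = location
        ET.SubElement(inst, "formatMediaType").text = media_type

    return root


record = build_record(
    "Sample Program",
    "Example Station",
    [
        ("/archive/masters/sample.mov", "Moving Image"),            # source file
        ("http://example.org/stream/sample.mp4", "Moving Image"),   # web copy
    ],
)
xml_text = ET.tostring(record, encoding="unicode")
```

Here the same asset carries two instantiations, the source file and its published web copy, which is the distinction the paragraph above draws between an object's provenance and the locations of its media.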

PBCore.org [http://pbcore.org] was established as the primary reference site. After the release of v1.0, acceptance across public broadcasting was slow and uneven, especially as stations increased the content being offered online, such as audio and video clips.

At the same time, PBCore was discovered by moving image and media archivists as a useful cataloging format for film and video. From 2005 through 2008, PBCore sessions were presented widely at conferences such as AMIA, the National Educational Television Association, the annual PBS Technology Conference, and similar gatherings, which spurred interest in its use far outside public broadcasting.

PBCore began to be treated as an "access and archival media" metadata standard. PBCore Resources [http://pbcoreresources.org] was created by an active user group as an "unofficial" website to encourage user collaborations. The informal site has become a primary sounding board for users to share experiences testing and implementing PBCore in a variety of environments.

Another group built an online "PBCore database." Powered by Ruby on Rails, Apache Solr, MySQL, nginx, and other third-party modules, the 'PBCore Repository Tool' (as it has become known) is an open source application available for free download at [http://pbcore.vermicel.li/]. This public repository holds more than 1,200 example entries, and the cataloging tool can generate records in XML or in a standard viewable format.

In 2009, CPB began planning The American Archive, a new initiative to organize and preserve 50 years of public radio and television programs. It was especially notable that the very first project of American Archive was to re-establish support for PBCore and continue its development after it had been dormant for several years.

Programs accepted into American Archive will be required to have PBCore-compliant records. Based on this need, Version 2.0 was released in January 2011, with technical assistance from AudioVisual Preservation Solutions and Digital Dawn. Changes incorporated in v2.0 were based on recommendations collected from a wide range of users after broad outreach and feedback. This new and improved version can be downloaded at: [http://pbcore.org]

Figure 3
Schema: PBCore v1.3


Many stations and producers are starting to look for guidance on metadata, especially to manage their burgeoning sets of digital program files. The evolution of PBCore over the past five years has relied on active participation of an expanding user community, and as PBCore improves, each year new institutions are discovering it, such as Dance Heritage Coalition, Northeast Historic Film, and stations like WILL TV & Radio, WNET-TV and WNYC Radio. These groups reflect a broad array of media producers and collections, and in response to a growing market, new open source and vendor-provided PBCore tools are also becoming available.

However, among public broadcasters, it is still not universally understood that digital media files require standardized metadata to remain useful over time. The question, "What is PBCore for?" continues to be raised, especially if a station already has a functioning media database.

With the release of v2.0, the momentum for adopting PBCore will continue to build. We hope it continues within public radio and television, where the investment was made on behalf of a system that needs a descriptive metadata standard now more than ever.

________________________

FOOTNOTES

Dublin Core Metadata Element Set, Version 1.1: Reference Description. Retrieved October 31, 2010, from http://www.dublincore.org/documents/dces/

AudioVisual Preservation Solutions. Retrieved November 11, 2010, from http://www.avpreserve.com/

 



About the Author

Nan Rubin has more than thirty years' experience managing technology and facilities projects in public radio and television, including 15 years at New York public television station WNET-TV/Thirteen working in technology planning. From 2003 to 2010, Ms. Rubin was Project Director of Preserving Digital Public Television, funded by the Library of Congress's National Digital Information Infrastructure and Preservation Program (NDIIPP), where she oversaw a team of 20 based at WNET, WGBH in Boston, the Public Broadcasting Service, and New York University designing a model preservation repository for born-digital public television productions. The final project reports, published in 2010, have been widely circulated in the field and are available at www.thirteen.org/ptvdigitalarchive. She speaks frequently on digital preservation and media archives, and recent articles have been published in International Preservation News and Library Trends Journal. She is currently doing outreach for the PBCore 2.0 Project.

Special thanks to Marcia Brooks, Project Director of the PBCore 2.0 Project, Dave MacCarn, Chief Technologist at WGBH, and Jack Brighton, Director of New Media & Innovation, Illinois Public Media, University of Illinois Urbana-Champaign, for their assistance and advice in preparing this article.


 

 


__________________________________________________________________________________________________________

The Tech Review . April, 2011. ©2011. Association of Moving Image Archivists.