To better support the ever-growing collections of digitized content from Digital Commonwealth member institutions, developers in the Boston Public Library’s Digital Services team have been building the next generation of the library’s digital asset management system. This new system, built entirely on open-source software, uses cloud storage for file management, allowing the repository to grow as needed without the constraints of locally managed servers and storage devices.

This new system is a suite of applications, APIs, and services that are collectively known as “DC3,” since this is the third version of the asset management system used to support preservation and dissemination of digitized primary source materials.

The heart of the new system is an application called Curator, which is responsible for managing all of the descriptive, administrative, and technical metadata for objects and files in the repository. Curator provides an application programming interface (API) to support ingesting new items into the repository or making changes to existing items. Backed by a relational database, the Curator data model supports a wide variety of content types, as well as rich descriptive metadata for ingested items conforming to the Digital Commonwealth metadata application profile, which is based on the Metadata Object Description Schema (MODS) created by the Library of Congress. This system provides improved data validation and authority control, making better use of controlled vocabularies and thesauri offered by the Library of Congress and the Getty Research Institute.

Curator interacts with a number of other applications in the DC3 ecosystem, including:

  • ARK Manager – manages unique Archival Resource Key identifiers and permalinks for repository items.
  • AVI Processor – analyzes ingested files to extract technical metadata and creates derivative files used for viewing and downloading.
  • BPLDC Authority API – supports querying a variety of controlled data sources (such as LC, Getty, and GeoNames) for descriptive metadata fields including subjects, locations, genres, languages, resource types, names, etc.
  • Cantaloupe – provides high-resolution images and deep zooming functionality for the DC user interface via the IIIF Image API.
  • Solr – supports indexing and retrieval of metadata and full-text content; powers the search features for the DC user interface.
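Cantaloupe answers requests shaped by the IIIF Image API, which encodes the requested region, size, rotation, quality, and format directly in the URL path. The sketch below builds such URLs following the Image API 3.0 pattern; the base URL and image identifiers are hypothetical placeholders, and the real path prefix depends on how a given Cantaloupe instance is configured.

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an IIIF Image API 3.0 request URL.

    Pattern: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    The identifier is percent-encoded, including any "/" it contains.
    """
    encoded_id = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{encoded_id}/{region}/{size}/{rotation}/{quality}.{fmt}"


def info_json_url(base, identifier):
    """URL of the image information document a deep-zoom viewer reads first."""
    return f"{base.rstrip('/')}/{quote(identifier, safe='')}/info.json"
```

A deep-zoom viewer typically fetches `info.json` first and then requests many small region/size combinations (tiles), for example `region="0,0,512,512"` with `size="256,"`, instead of downloading one full-resolution image.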

In addition to the increased capacity (and decreased maintenance) provided by moving storage infrastructure to the cloud, this system offers several other advantages. The relational data model used by Curator makes updates to existing metadata much more efficient. Because functionality is spread across a variety of applications, the system as a whole is more fault-tolerant, and individual components can be re-engineered without a complete overhaul of the entire system. And because the system uses more widely adopted technologies and components, it will be easier both to maintain and to bring new developers on board in the future.
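One way a relational model makes metadata updates cheaper is normalization: a shared value such as a subject heading lives in one row that many items reference, so a correction is a single update rather than an edit to every record. The sketch below illustrates that idea with Python's built-in `sqlite3`; the three-table schema and sample data are hypothetical and far simpler than Curator's actual data model.

```python
import sqlite3

# Hypothetical, simplified schema (not Curator's): items link to a shared
# subject term instead of each storing its own copy of the heading.
SCHEMA = """
CREATE TABLE subject_term (id INTEGER PRIMARY KEY, label TEXT NOT NULL);
CREATE TABLE item (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE item_subject (
    item_id INTEGER NOT NULL REFERENCES item(id),
    term_id INTEGER NOT NULL REFERENCES subject_term(id),
    PRIMARY KEY (item_id, term_id)
);
"""


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO subject_term (id, label) VALUES (1, 'Whaling')")
    conn.executemany("INSERT INTO item (id, title) VALUES (?, ?)",
                     [(1, "Logbook"), (2, "Harpoon photograph"), (3, "Harbor view")])
    # All three items point at the same term row.
    conn.executemany("INSERT INTO item_subject VALUES (?, 1)", [(1,), (2,), (3,)])
    return conn


def rename_term(conn, term_id, new_label):
    """Correct a heading once; every linked item sees the change."""
    with conn:  # commits the transaction on success
        cur = conn.execute("UPDATE subject_term SET label = ? WHERE id = ?",
                           (new_label, term_id))
    return cur.rowcount


def subjects_for(conn, item_id):
    rows = conn.execute(
        "SELECT t.label FROM subject_term AS t "
        "JOIN item_subject AS s ON s.term_id = t.id "
        "WHERE s.item_id = ? ORDER BY t.label", (item_id,))
    return [label for (label,) in rows]


conn = build_demo_db()
changed = rename_term(conn, 1, "Whaling--Massachusetts")  # one row updated
```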

All components of the DC3 system are built on freely available open-source software. ARK Manager, AVI Processor, and BPLDC Authority API are custom-built applications created and maintained by BPL Digital Services; as with Curator, the code for many of these projects is available on GitHub.

Please contact us with any questions, comments, or concerns.

Developing a Born-Digital Preservation Workflow

Presenters: Bill Donovan and Jack Kearney, Boston College

Postcard image of the Boston College Bell Tower, ca. 1930-1945. From the Tichnor Brothers Postcard Collection at the Boston Public Library.

Our presenters described the workflow they followed to access records on an external hard drive included in the personal papers of Irish soprano and harpist Mary O’Hara: their first dive into the sea of digital preservation. They described how workflows start as baseline best practices, but what happens when the unanticipated occurs? Hearing about the steps taken at Boston College to appraise, ingest, and clear unanticipated hurdles along the way reinforced that processing plans and workflows are only a starting point. What you find when you open the files can and will drive changes to workflows. Sound familiar? Tags: Writeblocker, UNIX, 8.3 Constraint, Fixity (software), Identity Finder (software), XENA tool, Policy writing, FITS tool, JHOVE tool, LOCKSS, DP in a box, Digital Forensics.
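Fixity, one of the practices tagged above, means recording a checksum for each file at ingest and re-computing it later to detect corruption or accidental change. The sketch below is a generic illustration using SHA-256 from Python's standard library; it is not Boston College's workflow or the Fixity application, and the manifest format (relative path mapped to hex digest) is an assumption.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Record a checksum for every file under root, keyed by relative POSIX path."""
    root = Path(root)
    return {p.relative_to(root).as_posix(): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def verify_manifest(root, manifest):
    """Return the relative paths that are missing or whose checksum changed."""
    root = Path(root)
    return [rel for rel, expected in manifest.items()
            if not (root / rel).is_file() or sha256_of(root / rel) != expected]
```

An empty list from `verify_manifest` means every file still matches the checksum recorded when the manifest was built.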

Digital Commonwealth 2.0: It’s Alive!

Presenters: Steven Anderson and Eben English

Despite the migration to our new platform in Fedora and Hydra literally happening while we met, our intrepid presenters gave before & after comparisons of the repository website with its streamlined visual presentation and enhanced search capabilities. If you haven’t already, check it out!

Rapid Fire Inspiring Projects

Benjamin Sewall Blake jumping, ca. 1888. From the Francis Blake photographs at the Massachusetts Historical Society.

Presenters: Christine Clayton, Worcester Art Museum (WAM); Abigail Cramer, Historic New England (HNE); Sean M. Fisher, Department of Conservation and Recreation (DCR), and Rebecca Kenney, Massachusetts Water Resources Authority (MWRA); Larissa Glasser, Arnold Arboretum Horticultural Library (AAHL); Nancy Heywood, Massachusetts Historical Society (MHS); Michael Lapides, New Bedford Whaling Museum; Sara Slymon, Turner Free Library

WOW! Our presenters offered up a smorgasbord of formats, collections, and projects they undertook to make records available to their users. For some, like WAM, which digitized its exhibition catalogs, the users were internal. HNE digitized its collection of photographs by Nathaniel Stebbins; DCR and MWRA digitized 8,800 images, the largest collection undertaken by Digital Commonwealth so far; and AAHL digitized a collection of glass plate negatives. The results? Unanticipated revenue streams from interior decorators, increased hits on websites, object provenance authentications, access to the identities of early American movers and shakers as reported in contemporary newspapers, and accessible Town Reports and High School yearbooks. Several of these projects are still in the pipeline, so they are not yet searchable on the Digital Commonwealth website.

Submitted by guest reporter Elizabeth Cousins, First Parish in Brookline.

Over the past few months, development work on the new Digital Commonwealth repository at the Boston Public Library has focused on functionality for ingesting metadata records via the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). This functionality enables Digital Commonwealth to include metadata created by institutions around the state in the central search interface, with links that point back to the original item hosted by the provider. (Digital Commonwealth currently harvests records from ten institutions and consortia, including the State Library of Massachusetts, NOBLE, SAILS, and C/W MARS, to name a few.)
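At its core, harvesting over OAI-PMH means issuing `ListRecords` requests and following `resumptionToken` values until the provider stops returning one. The sketch below, using only Python's standard library, shows that loop plus parsing of a few unqualified Dublin Core (`oai_dc`) fields; the endpoint URL is a placeholder, the record fields kept are an illustrative choice, and no crosswalk to MODS is attempted.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request; resumptionToken is an exclusive argument."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_data):
    """Return (records, next_token) from one ListRecords response."""
    root = ET.fromstring(xml_data)
    records = []
    for rec in root.iterfind("oai:ListRecords/oai:record", NS):
        header = rec.find("oai:header", NS)
        if header.get("status") == "deleted":  # deleted records carry no metadata
            continue
        records.append({
            "oai_id": header.findtext("oai:identifier", namespaces=NS),
            "titles": [e.text for e in rec.iterfind(".//dc:title", NS)],
            # dc:identifier may hold the link back to the provider's item page
            "identifiers": [e.text for e in rec.iterfind(".//dc:identifier", NS)],
        })
    token = root.findtext("oai:ListRecords/oai:resumptionToken", default="", namespaces=NS)
    return records, token.strip() or None  # an empty token marks the last page


def harvest(base_url, metadata_prefix="oai_dc"):
    """Yield records page by page until the provider sends no further token."""
    token = None
    while True:
        with urlopen(list_records_url(base_url, metadata_prefix, token)) as resp:
            records, token = parse_list_records(resp.read())
        yield from records
        if token is None:
            break
```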

BPL development staff have been working closely with each OAI provider to tailor the ingest process to their preferred metadata format (Dublin Core, PBCore, MODS, etc.) and to the system each institution uses to provide the records (CONTENTdm, Omeka, etc.). The crosswalking process, which converts incoming metadata records into MODS, also involves a number of data standardization routines, including transforming date data into a facetable, sortable format based on the W3C Date and Time Formats note (W3CDTF), and converting geographic subject/coverage data into hierarchical geographic subjects (state, county, city, etc.) and numeric latitude/longitude coordinates using data from the Getty Thesaurus of Geographic Names. Whenever possible, the ingest process also generates a thumbnail image for each object, which is stored in the Digital Commonwealth repository along with an archival copy of the original metadata record prior to crosswalking.
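The date standardization step can be illustrated with a small normalizer that maps a few common free-text patterns onto W3CDTF strings (`YYYY`, `YYYY-MM`, `YYYY-MM-DD`), which sort chronologically as plain strings. This is a hypothetical sketch, not the BPL's routine: the patterns handled, the `approximate` flag, and the assumption that `3/5/1901` means March 5 are all illustrative choices.

```python
import re
from datetime import date

# First three letters of each month name -> month number (1-12).
MONTHS = {name[:3]: i for i, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

APPROX = re.compile(r"^(?:ca\.?|circa|c\.)\s*")


def _w3cdtf(year, month=None, day=None):
    """Format validated parts as YYYY, YYYY-MM or YYYY-MM-DD."""
    if day is not None:
        date(year, month, day)  # raises ValueError for impossible dates
        return f"{year:04d}-{month:02d}-{day:02d}"
    if month is not None:
        if not 1 <= month <= 12:
            raise ValueError(f"bad month: {month}")
        return f"{year:04d}-{month:02d}"
    return f"{year:04d}"


def normalize_date(raw):
    """Return {"start", "end", "approximate"}; start is None if unparseable."""
    text = raw.strip().lower()
    approximate = bool(APPROX.match(text))
    text = APPROX.sub("", text, count=1)
    start = end = None
    try:
        if m := re.fullmatch(r"(\d{4})\s*[-/]\s*(\d{4})", text):  # 1930-1945
            start, end = _w3cdtf(int(m[1])), _w3cdtf(int(m[2]))
        elif m := re.fullmatch(r"(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?", text):  # 1901-03-05
            start = _w3cdtf(*(int(g) for g in m.groups() if g))
        elif m := re.fullmatch(r"(\d{1,2})/(\d{1,2})/(\d{4})", text):  # assumed M/D/YYYY
            start = _w3cdtf(int(m[3]), int(m[1]), int(m[2]))
        elif m := re.fullmatch(r"([a-z]+)\.?\s+(?:(\d{1,2}),?\s+)?(\d{4})", text):
            month = MONTHS.get(m[1][:3])  # "March 5, 1901" or "Sept. 1901"
            if month:
                start = _w3cdtf(int(m[3]), month, int(m[2]) if m[2] else None)
    except ValueError:
        start = end = None
    return {"start": start, "end": end, "approximate": approximate}
```

Returning separate start and end values lets a range such as `ca. 1930-1945` populate both ends of a date facet; in MODS, such values are typically recorded with `encoding="w3cdtf"` and `point="start"` / `point="end"`.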

While all of this involves significant time and effort, the result will be more accurate and more complete metadata records from these providers, and a better search and discovery experience for users as well as better representation of the data within larger shared contexts such as DPLA.

So far, OAI harvesting has been restricted to a test platform. By late February, the BPL expects to finish work on the OAI feeds, at which point the feeds will be added to the public repository site (https://search.digitalcommonwealth.org). The focus will then turn to migrating the last few remaining collections from the DSpace repository into the new repository and integrating the informational content from the current Omeka site into the new design. While no official date has been set for when the new repository will replace the existing systems and launch as the “official” Digital Commonwealth site, this milestone is expected to be reached sometime in March.

Over the last month or so, development of the new Digital Commonwealth repository at the BPL has focused on refining the batch upload process. The repository developers have been working closely with the BPL Digital Services metadata team to create a standardized spreadsheet format for ingest that lets institutions provide rich metadata about their digital objects while remaining flexible, intuitive, and simple to use. This work has brought the goal of self-mediated batch uploads by institutions much closer, though several issues remain to be tackled before this functionality is ready to roll out.
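A standardized spreadsheet makes it possible to check a batch before anything is ingested. The sketch below validates a CSV export against a set of required columns and splits multi-valued cells; the column names, the `|` delimiter, and the choice of which columns repeat are hypothetical and do not reflect the actual Digital Commonwealth template.

```python
import csv
import io

# Hypothetical template; the real Digital Commonwealth column names may differ.
REQUIRED = ["title", "date", "file_name", "rights"]
MULTI_VALUED = {"subject"}  # assumed repeatable columns
SEPARATOR = "|"             # assumed delimiter inside multi-valued cells


def validate_rows(csv_text):
    """Return (records, errors); errors are 'row N: ...' messages."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        return [], [f"header: missing column(s) {', '.join(missing)}"]
    records, errors = [], []
    for n, row in enumerate(reader, start=2):  # spreadsheet row 1 is the header
        if None in row:  # DictReader files extra cells under the key None
            errors.append(f"row {n}: more cells than header columns")
            continue
        blanks = [c for c in REQUIRED if not (row[c] or "").strip()]
        if blanks:
            errors.append(f"row {n}: empty required field(s) {', '.join(blanks)}")
            continue
        record = {}
        for key, value in row.items():
            value = (value or "").strip()
            if key in MULTI_VALUED:
                record[key] = [v.strip() for v in value.split(SEPARATOR) if v.strip()]
            else:
                record[key] = value
        records.append(record)
    return records, errors
```

Reporting errors by spreadsheet row number lets a contributing institution fix its own file before resubmitting, which fits the goal of self-mediated uploads.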

Meanwhile, beta testing of both the “Search” and “Admin” applications is ongoing and has produced quite a bit of helpful feedback from the institutions and individuals who have taken the system for a test drive.

The URLs are:
Search (public discovery): http://search.digitalcommonwealth.org/
Admin (ingest & management): http://admin.digitalcommonwealth.org/

In late September, development will begin on the workflow for ingesting material into the repository via OAI-PMH, in order to aggregate records from the many institutions around the state that provide access to digital objects through their own repository systems. The BPL will be reaching out to institutions that currently contribute material to Digital Commonwealth via OAI-PMH feeds to learn more about their existing data structures, preferred metadata formats for harvesting, back-end systems, and other details that will help this phase of the project move forward smoothly.

Lastly, the BPL has set up a public Google Group email list for institutions and users to provide feedback or report issues with the new repository system. Anyone may read content posted to the group; membership is required to send messages to the list. See https://groups.google.com/forum/#!forum/digitalcommonwealth for details.