Over the past few months, development work on the new Digital Commonwealth repository at the Boston Public Library has focused on functionality for ingesting metadata records via the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). This functionality enables Digital Commonwealth to include metadata created by institutions around the state in the central search interface, with links that point back to the original item hosted by the provider. (Digital Commonwealth currently harvests records from ten institutions and consortia, including the State Library of Massachusetts, NOBLE, SAILS, and C/W MARS.)
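For readers unfamiliar with OAI-PMH, a harvest boils down to a series of HTTP requests using the ListRecords verb, each returning a page of XML records plus an optional resumption token for the next page. The following is a minimal sketch of that exchange, not the code used for Digital Commonwealth: the base URL, the function names, and the sample response are all hypothetical, and the sketch assumes the provider exposes simple Dublin Core (oai_dc).

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build a ListRecords request URL (hypothetical helper)."""
    if resumption_token:
        # A resumption token must be sent on its own, without other arguments.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, resumption_token) from one ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        # Assumption: the link back to the provider's item is the first
        # dc:identifier that looks like a URL.
        link = None
        for el in rec.iterfind(".//dc:identifier", NS):
            if el.text and el.text.strip().startswith("http"):
                link = el.text.strip()
                break
        records.append({
            "oai_id": rec.findtext("oai:header/oai:identifier",
                                   default="", namespaces=NS),
            "title": rec.findtext(".//dc:title", default="", namespaces=NS),
            "link": link,
        })
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    return records, (token.strip() or None)


# Invented sample response, for illustration only.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:provider.example.org:item-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample photograph</dc:title>
          <dc:identifier>https://provider.example.org/items/1</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>
"""

if __name__ == "__main__":
    print(list_records_url("https://provider.example.org/oai"))
    print(parse_list_records(SAMPLE_RESPONSE))
```

A real harvester would fetch each URL over HTTP and keep requesting pages until no resumption token is returned; the sketch leaves out the network step so that it runs offline.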
BPL development staff have been working closely with each OAI provider to tailor the ingest process to the provider's preferred metadata format (Dublin Core, PBCore, MODS, etc.) and to the system each institution uses to provide the records (CONTENTdm, Omeka, etc.). The crosswalking process, which converts the incoming metadata records into MODS, also involves a number of data standardization routines. These include transforming date data into a facetable, sortable format based on the W3C Date and Time Formats (W3CDTF), and converting geographic subject/coverage data into hierarchical geographic subjects (state, county, city, etc.) and numeric latitude/longitude coordinates using data from the Getty Thesaurus of Geographic Names. Whenever possible, the ingest process also generates a thumbnail image for each object, which is stored in the Digital Commonwealth repository along with an archival copy of the original metadata record as it was prior to crosswalking.
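To give a sense of what date standardization involves, here is a minimal sketch of normalizing a few common free-text date patterns into W3CDTF-style strings. It is an illustration under stated assumptions, not the routine used in the Digital Commonwealth ingest process: the function name `to_w3cdtf`, the set of patterns handled, and the shape of the returned dictionary are all hypothetical.

```python
import re
from datetime import datetime

# Free-text patterns this sketch understands, paired with the W3CDTF
# granularity they carry. Month names are parsed in the default (English)
# locale. The list is an assumption, not an exhaustive one.
_FORMATS = [
    ("%B %d, %Y", "day"),    # May 17, 1923
    ("%d %B %Y", "day"),     # 17 May 1923
    ("%m/%d/%Y", "day"),     # 05/17/1923
    ("%B %Y", "month"),      # May 1923
]


def _single_date(text):
    """Convert one date expression to YYYY, YYYY-MM, or YYYY-MM-DD."""
    # Already in W3CDTF shape: pass through (month/day ranges are not checked).
    if re.fullmatch(r"\d{4}(?:-\d{2}(?:-\d{2})?)?", text):
        return text
    for fmt, granularity in _FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue
        if granularity == "day":
            return f"{parsed.year:04d}-{parsed.month:02d}-{parsed.day:02d}"
        return f"{parsed.year:04d}-{parsed.month:02d}"
    return None


def to_w3cdtf(raw):
    """Return {'start', 'end', 'approximate'} or None if unparseable."""
    text = raw.strip()
    approximate = False

    # A leading "ca.", "c.", or "circa" marks the date as approximate.
    match = re.match(r"^(?:circa|ca\.?|c\.)\s+(.+)$", text, flags=re.IGNORECASE)
    if match:
        approximate = True
        text = match.group(1).strip()

    # A year range such as "1920-1925" becomes separate start and end values.
    match = re.fullmatch(r"(\d{4})\s*[-/]\s*(\d{4})", text)
    if match:
        return {"start": match.group(1), "end": match.group(2),
                "approximate": approximate}

    value = _single_date(text)
    if value is None:
        return None
    return {"start": value, "end": value, "approximate": approximate}


if __name__ == "__main__":
    for sample in ["1923", "May 17, 1923", "ca. 1920", "1920-1925", "undated"]:
        print(repr(sample), "->", to_w3cdtf(sample))
```

Keeping separate start and end values is one way to make a date both sortable (by start) and usable in a date-range facet; values that cannot be parsed return `None` so they can be set aside for review rather than guessed at.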
While all of this involves significant time and effort, the result will be more accurate and more complete metadata records from these providers, a better search and discovery experience for users, and better representation of the data within larger shared contexts such as DPLA.
So far, the OAI harvesting has been restricted to a test platform. By late February, the BPL expects to finish work on the OAI feeds, at which point the feeds will be added to the public repository site (https://search.digitalcommonwealth.org). The focus will then turn to migrating the few remaining collections from the DSpace repository into the new repository, and integrating the informational content on the current Omeka site into the new design. While no official date has been set for when the new repository will replace the existing systems and launch as the “official” Digital Commonwealth site, this milestone is expected to be reached sometime in March.