In order to better support the ever-growing collections of digitized content from Digital Commonwealth member institutions, developers in the Boston Public Library’s Digital Services team have been building the next generation of the library’s digital asset management system. This new system, built entirely on open-source software, uses cloud storage for file management, allowing the repository to potentially grow exponentially, without the constraints of locally-managed servers and storage devices.

This new system is a suite of applications, APIs, and services that are collectively known as “DC3,” since this is the third version of the asset management system used to support preservation and dissemination of digitized primary source materials. (Click here for an overview of the previous version.)

The heart of the new system is an application called Curator, which is responsible for managing all of the descriptive, administrative, and technical metadata for objects and files in the repository. Curator provides an application programming interface (API) to support ingesting new items into the repository or making changes to existing items. Backed by a relational database, the Curator data model supports a wide variety of content types, as well as rich descriptive metadata for ingested items conforming to the Digital Commonwealth metadata application profile, which is based on the Metadata Object Description Schema (MODS) created by the Library of Congress. This system provides improved data validation and authority control, making better use of controlled vocabularies and thesauri offered by the Library of Congress and the Getty Research Institute.

Curator interacts with a number of other applications in the DC3 ecosystem, including:

  • ARK Manager – manages unique Archival Resource Key identifiers and permalinks for repository items.
  • AVI Processor – analyzes ingested files to extract technical metadata and creates derivative files used for viewing and downloading.
  • BPLDC Authority API – supports querying a variety of controlled data sources (such as LC, Getty, and GeoNames) for descriptive metadata fields including subjects, locations, genres, languages, resource types, names, etc.
  • Cantaloupe – provides high-resolution images and deep zooming functionality for the DC user interface via the IIIF Image API.
  • Solr – supports indexing and retrieval of metadata and full-text content; powers the search features for the DC user interface.
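
To make the deep zooming item above more concrete, here is a minimal sketch of how a client might build IIIF Image API request URLs. The path pattern (identifier/region/size/rotation/quality.format) comes from the IIIF Image API specification; the server base URL and the item identifier are hypothetical placeholders, not actual Digital Commonwealth values. The default size of "max" assumes version 3 of the API (version 2 uses "full").

```python
from urllib.parse import quote


def iiif_image_url(base_url, identifier, region="full", size="max",
                   rotation=0, quality="default", fmt="jpg"):
    """Build an IIIF Image API URL: {base}/{id}/{region}/{size}/{rotation}/{quality}.{format}."""
    # Identifiers must be percent-encoded so characters like ":" or "/" stay in one path segment.
    encoded_id = quote(identifier, safe="")
    return f"{base_url.rstrip('/')}/{encoded_id}/{region}/{size}/{rotation}/{quality}.{fmt}"


def tile_region(x, y, width, height):
    """Format a pixel region (x,y,w,h) of the full-resolution image."""
    return f"{x},{y},{width},{height}"


# Hypothetical server and identifier, for illustration only.
BASE = "https://iiif.example.org/iiif/3"
ITEM = "commonwealth:abc123"

# Whole image at the largest size the server will deliver.
full_image = iiif_image_url(BASE, ITEM)

# One deep-zoom tile: the top-left 1024x1024 pixels, scaled to 512 pixels wide.
tile = iiif_image_url(BASE, ITEM, region=tile_region(0, 0, 1024, 1024), size="512,")
```

A zooming viewer requests many such tiles at different regions and sizes, so only the visible part of a large scan is ever downloaded.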

In addition to the increased capacity (and decreased maintenance) provided by moving storage infrastructure to the cloud, this system provides a number of advantages. The relational data model used by Curator makes updating existing metadata much more efficient. By spreading functionality over a variety of applications, the system is more fault-tolerant overall, and components can be re-engineered without the need for a complete overhaul of the entire system. And because this system uses more widely adopted technologies and components, it will be easier to maintain and to onboard new developers in the future.

All components of the DC3 system are built on freely available open-source software. ARK Manager, AVI Processor, and BPLDC Authority API are custom-built applications created and maintained by BPL Digital Services; as with Curator, the code for many of these projects is available on GitHub.

Please contact us with any questions, comments, or concerns.

Smith College Class of 1902 Basketball Team (c. 1902), Wikimedia Commons.

More than a century ago, the first women’s collegiate basketball championship was played in Massachusetts between Smith College sophomores and freshmen. “Smith March Madness 1892” is an 8:20-minute video about the game. Senda Berenson, known as the “Mother of Women’s Basketball” and Director of Physical Training at Smith, introduced the game of basketball, developed by James Naismith the year before, to her Smith students. “Major newspapers and magazines in the Northeast covered the championship game, and reporters equated the popularity of the event to the Harvard Yale men’s football game.”

Senda Berenson wrote an article entitled “Basket Ball for Women” in the September 1894 issue of Physical Education, available courtesy of Springfield College, Babson Library, Archives and Special Collections. She writes, “The value of athletic sports for men is not questioned. It is a different matter, however, when we speak of athletics for women. Until very recent years, the so-called ideal woman was a small waisted, small footed, small brained damsel, who prided herself on her delicate health, who thought fainting interesting, and hysterics fascinating. Wider and more thorough knowledge has given us more wholesome and saner ideas.”

Digital Commonwealth and other archives and libraries have helped to preserve and provide access to documents, images, and audio and video files related to women in sports. One example is the audio file for a lecture given at UMass in 1978 by Wilma Rudolph, bronze medalist in the 1956 Olympics and three-time gold medalist in 1960. At the time of the lecture, she had just published her autobiography, Wilma, and hearing her story in her own voice is inspirational. In the audio file, she speaks of her upbringing as the 20th of 22 children in small-town Tennessee. As a child, the fastest woman in the world had survived pneumonia, scarlet fever, and polio, and wore a leg brace for much of her early life.

The challenges that Wilma Rudolph had to overcome were many. She graciously gave credit to the family members, friends, fellow athletes, and coaches who helped her along the way. As she tells her story, she says that there came a point when she had to have faith in herself in order to reach her full potential.

Wilma Rudolph at the finish line during 50 yard dash at track meet in Madison Square Garden (1961), Wikimedia Commons.

Wilma Rudolph was a world-class athlete before Title IX was signed into law. She had to make her way on her own and with the support system that she was able to construct without the benefit of the law enshrining women’s rights.

Title IX of the Education Amendments of 1972 (“Title IX”), signed into law on June 23, 1972, was designed to prohibit discrimination on the basis of sex in education programs and activities in all public and private elementary and secondary schools, school districts, colleges, and universities receiving any Federal funds. Title IX has broader implications than just creating a level playing field for women athletes. But in the years since the law was passed, untold opportunities have opened up for women in sports.

Women’s Sports Foundation, “Chasing Equity: The Triumphs, Challenges, and Opportunities in Sports for Girls and Women” (2020), p. 13.

The implementation of Title IX has had a rocky road. It was not clear in the original law exactly how educational institutions would balance spending for men’s and women’s athletic programs. Universities with men’s football and men’s basketball programs that were spending and generating vast sums of money felt threatened by the law. Digital Commonwealth provides a link to a 1979 MacNeil/Lehrer Report on Title IX Women’s Sports. In his introduction to the half-hour video, Robert MacNeil says, “many people wonder whether glamorous, big-time, big-money college sports are threatened by the drive to give women an equal share in college athletics. Tonight, sex discrimination in sports, and the debate over a law called Title IX.”

Progress has not been easy. Digital Commonwealth and its member institutions will continue to provide access to documentation of the uphill battle for equity in sports for girls and women.

Barbara Schneider, Member Outreach and Education Committee

Women’s Cross Country Race (1995)
Courtesy of Springfield College, Babson Library, Archives and Special Collections.
Image of the front page of The Woman's Era newspaper, which includes a story on and photograph of abolitionist and suffragist Lucy Stone.
The Woman’s Era, Vol. 1 No. 1, March 24, 1894

It will come as no surprise that there is widespread, urgent demand from institutions across the state to digitize historical newspapers, especially local titles that provide invaluable local coverage of daily history and titles with underrepresented perspectives and histories. There is an incredible amount of important material in need of access and preservation, and making these resources available will require a robust, sustained effort.

The Digital Services team at BPL has been working on increasing capacity for newspaper digitization and dissemination; here is an overview of efforts from the past year:

Digitization at BPL

The BPL obtained a Mekel Mach 5 high-capacity microfilm scanner in March 2021, but the pandemic significantly delayed scheduling the setup and training needed for operation. Mekel’s imaging technicians were finally able to help get this machine up and running in the fall of 2021, and it has since been used to digitize several short runs of historically significant newspapers, including The Woman’s Era and The Tocsin of Liberty. The main current project, which is still ongoing, is scanning a major run of the Lawrence Evening Tribune (1890-1929).

While the scanning work is proceeding well (over 96,000 pages to date), imaging is only the start of any newspaper digitization project – there is significant manual work needed to collate and group the scanned pages into issue-level folders, and to identify missing pages, duplicate pages, and other anomalies.
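
As a rough illustration of the collation step described above, the following sketch groups scanned page files by issue date and flags gaps and repeats in the page sequence. It is not the library's actual workflow: the file naming convention (YYYY-MM-DD_pNNN.tif) and the function names are assumptions made for this example.

```python
import re
from collections import defaultdict

# Hypothetical naming convention: issue date, then a page number, e.g. 1890-01-02_p003.tif
PAGE_NAME = re.compile(r"^(?P<date>\d{4}-\d{2}-\d{2})_p(?P<page>\d+)\.tif$")


def group_by_issue(filenames):
    """Return ({issue_date: [page numbers]}, [names that did not match the convention])."""
    issues = defaultdict(list)
    unparsed = []
    for name in filenames:
        match = PAGE_NAME.match(name)
        if match is None:
            unparsed.append(name)  # needs manual review
            continue
        issues[match["date"]].append(int(match["page"]))
    return dict(issues), unparsed


def find_anomalies(pages):
    """Report page numbers that are missing below the highest page seen, and any duplicates."""
    seen = set()
    duplicates = set()
    for page in pages:
        if page in seen:
            duplicates.add(page)
        seen.add(page)
    expected = set(range(1, max(seen) + 1)) if seen else set()
    return {"missing": sorted(expected - seen), "duplicates": sorted(duplicates)}


scans = [
    "1890-01-02_p001.tif", "1890-01-02_p002.tif", "1890-01-02_p002.tif",
    "1890-01-02_p004.tif", "1890-01-03_p001.tif", "notes.txt",
]
issues, unparsed = group_by_issue(scans)
report = {date: find_anomalies(pages) for date, pages in issues.items()}
```

A check like this can only find gaps below the highest page number present; a missing final page, or an issue that was never scanned at all, still has to be caught by comparing against a publication calendar or by manual review.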

There are also technical steps involved in processing the scanned images to create derivative files (such as using optical character recognition to extract text and word-coordinate information to support full-text searching and highlighting keyword matches on the page image), as well as developing the pipelines, workflows, and scripts to ingest the content into the digital repository. The library hopes to make significant progress on these latter steps during the second half of 2022.
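
The role of word-coordinate data in highlighting can be sketched in a few lines. The example below assumes OCR output has already been parsed into a list of words with pixel bounding boxes; the OcrWord structure and the highlight_boxes function are hypothetical names for illustration, and real OCR formats and search indexes are considerably more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OcrWord:
    """One recognized word and its bounding box on the page image, in pixels."""
    text: str
    x: int
    y: int
    width: int
    height: int


def normalize(token):
    """Lowercase and drop punctuation so 'Emancipated,' matches 'emancipated'."""
    return "".join(ch for ch in token.lower() if ch.isalnum())


def highlight_boxes(words, query):
    """Return (x, y, width, height) boxes for every word matching a single-word query."""
    target = normalize(query)
    if not target:
        return []
    return [(w.x, w.y, w.width, w.height) for w in words if normalize(w.text) == target]


# Made-up coordinates for a fragment of one line of text.
page_words = [
    OcrWord("The", 10, 20, 40, 18),
    OcrWord("emancipated,", 60, 20, 130, 18),
    OcrWord("people", 200, 20, 70, 18),
]
boxes = highlight_boxes(page_words, "Emancipated")
```

The returned boxes are what a viewer would draw over the page image to mark each keyword match.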

Screenshot of a search box field in the Digital Commonwealth repository, with the heading "Search inside: The Tocsin of liberty"
“Search inside” view
Screenshot of the word "emancipated" highlighted in the text of a newspaper article.
Keyword searching

National Digital Newspaper Program (NDNP) Grant

Through the assistance of the Boston Public Library Fund (https://bplfund.org/), BPL was awarded a grant in September 2021 from the National Endowment for the Humanities to join the National Digital Newspaper Program (https://www.loc.gov/ndnp/), a long-running effort coordinated by the Library of Congress to build and maintain a free online digital library of historical newspapers from all U.S. states and territories. During the last few months, Digital Services staff has been working with an advisory committee of scholars and experts to identify significant newspapers from the library’s microfilm archives for inclusion in this national collection. The selected titles will then be digitized and made available in Digital Commonwealth and via Chronicling America (http://chroniclingamerica.loc.gov/), which provides access to over 18 million pages from over 6,000 newspaper titles published from 1777 to 1963. The project, which will run until October 2023 and produce 100,000 pages of scanned newspaper content, is currently nearing the end of the title selection process, with imaging scheduled to be contracted out to a digitization vendor in the fall of this year.

MyHeritage & Boston Neighborhood Newspapers

In 2016, BPL established a partnership with MyHeritage to provide access to BPL-held microfilm for digitization and display on their online genealogy platform, with the condition that BPL will receive a copy of all digitized page images produced. To date, this partnership has resulted in the digitization of approximately 7.5 million pages from a wide variety of Massachusetts newspapers spanning the late 1700s to the mid-1900s. However, the deliverables include the image scans only, and not any of the derivative files required to support discovery and display in Digital Commonwealth (see the “Digitization at BPL” section above). Producing the necessary derivative files at this scale will require additional capacity and funding support.

To evaluate the logistics, costs, time, and effort needed to ingest the MyHeritage-digitized materials into Digital Commonwealth, BPL is currently undertaking a pilot project using a vendor specializing in newspaper digitization to process a subset of these titles, highlighting Boston’s neighborhood newspapers. The titles selected for this project span the mid-1800s to the mid-1900s and represent many newspapers that currently have limited online availability, including the Roxbury Gazette, Hyde Park Times, East Boston Free Press, South Boston Gazette, Charlestown News, and the Dorchester Beacon, to name just a few. This project will produce approximately 170,000 pages of content; processing is scheduled to be completed by the end of June, and the goal is to integrate this content into the repository ingest workflow in the latter half of the year.

Looking Forward

The projects described above will no doubt provide increased access to historical newspaper content, but to make a significant impact, these activities need to become part of a curated, sustainable program with dedicated funding, equipment, and staff. The BPL is committed to continuing participation in the Library of Congress’s NDNP, which can be renewed every two years. The Digital Services team is also actively investigating other ways to increase capacity, including pursuing grant programs, advocating for more funding from the state legislature, adding staff to help manage digitization projects, and providing guidance to institutions that want to take on their own digitization projects. As with all things Digital Commonwealth, collaboration will be key to success!