Graton & Knight  Flickr | Internet Archive

Many of these books in the Internet Archive include photographs, pictures, maps and other images, but because the books are treated primarily as text, there has never been an easy way to find those images.

Captain George M. Whipple Flickr | Internet Archive
Kalev H. Leetaru researches the ways big data can be used in the study of human society. While serving as Yahoo! Fellow in Residence at Georgetown University, Leetaru came up with a way to liberate these images from the pages of the books, and post them as a searchable image collection.

When books are scanned for the Internet Archive, they are run through an Optical Character Recognition (OCR) program to make the text searchable. The software identifies and skips over the parts of the page that are not text. Leetaru was able to go back to the original scans, capture those skipped areas, convert them into image files, and upload them to Flickr. Each upload carries the bibliographic information about the book, the text surrounding the image, and a link to the page on the Internet Archive site so the image can be seen in context.
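For readers curious about the mechanics, the idea of capturing the skipped areas can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Leetaru's actual code: the `Region` record, its `kind` labels, and the toy pixel grid are all hypothetical stand-ins for whatever the OCR software really reports about each page.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical stand-in for one block reported by an OCR pass.
# The field names and the "text"/"picture" labels are assumptions.
@dataclass
class Region:
    kind: str
    left: int
    top: int
    right: int   # exclusive
    bottom: int  # exclusive

# A scanned page modeled as rows of grayscale pixel values.
Page = List[List[int]]

def crop(page: Page, region: Region) -> Page:
    """Return the pixels inside the region's bounding box."""
    rows = page[region.top:region.bottom]
    return [row[region.left:region.right] for row in rows]

def extract_images(page: Page, regions: List[Region]) -> List[Page]:
    """Keep only the areas the OCR pass did not treat as text."""
    return [crop(page, r) for r in regions if r.kind != "text"]

# Toy 4-row by 6-column page: the top half is text, and a picture
# sits in the lower half.
page = [[10 * y + x for x in range(6)] for y in range(4)]
regions = [
    Region("text", 0, 0, 6, 2),
    Region("picture", 1, 2, 4, 4),
]
images = extract_images(page, regions)
```

In a real workflow each cropped area would then be saved as an image file and paired with the book's bibliographic information and the nearby text before upload.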

Over 14 million images were created, over five million of which have already been processed and uploaded to the Internet Archive Book Images photostream on Flickr. Because of the limitations of OCR and of automated processing, searching can be a challenge at times: these images don't have the kind of cataloging and description that participants in the Digital Commonwealth strive to create. Crowdsourcing helps somewhat here, as other Flickr members can add tags to any of these images.

But even with these limitations, this enormous collection of images is a great resource for anyone doing research in family, local or cultural history. Images from the Internet Archive are in the public domain and can be freely used, and they can also help librarians, archivists and others identify images of people and places in their collections.

