Tuesday, October 14, 2008

That Big Database in the Sky

Are we getting ever closer to that large digital repository in the sky that enables any library patron to access the collections of their own and all other libraries from their computer? Perhaps. I believe this concept will come to represent a major element in the changed definition of 'library.' We are many years away from that eventuality; however, this new initiative may provide a glimpse of the opportunity.

An organization named the HathiTrust has announced a large-scale digital repository initiative in collaboration with 13 charter member research libraries. From the press release:

A group of the nation’s largest research libraries are collaborating to create a repository of their vast digital collections, including millions of books, organizers announced today. These holdings will be archived and preserved in a single repository called the HathiTrust. Materials in the public domain will be available for reading online.

Launched jointly by the 12-university consortium known as the Committee on Institutional Cooperation (CIC) and the 11 university libraries of the University of California system, the HathiTrust leverages the time-honored commitment to preservation and access to information that university libraries have valued for centuries. UC’s participation will be coordinated by the California Digital Library (CDL), which brings its deep and innovative experience in digital curation and online scholarship to the HathiTrust.

“This effort combines the expertise and resources of some of the nation’s foremost research libraries and holds even greater promise as it seeks to grow beyond the initial partners,” says John Wilkin, associate university librarian of the University of Michigan and the newly named executive director of HathiTrust. Hathi (pronounced hah-TEE), the Hindi word for elephant incorporated into the repository’s name, underscores the immensity of this undertaking, Wilkin says. Elephants also evoke memory, wisdom, and strength.

As of today, HathiTrust contains more than 2 million volumes and approximately ¾ of a billion pages, about 16 percent of which are in the public domain. Public domain materials will be available for reading online. Materials protected by copyright, although not available for reading online, are given the full range of digital archiving services, thereby offering member libraries a reliable means to preserve their collections. Organizers also expect to use those materials in the research and development of the Trust.

Volumes are added to the repository daily, and content will grow rapidly as the University of California, CIC member libraries, and other prospective partners contribute their digitized content. Also today, the founding partners announce that the University of Virginia is joining the initiative.

Each of the founding partners brings extensive and highly regarded expertise in the areas of information technology, digital libraries, and project management to this endeavor. Creation of the HathiTrust supports the digitization efforts of the CIC and the University of California, each of which has entered into collective agreements with Google to digitize portions of the collections of their libraries, more than 10 million volumes in total, as part of the Google Book Search project. Materials digitized through other means will also be made available through HathiTrust.

HathiTrust provides libraries a means to archive and provide access to their digital content, whether scanned volumes, special collections, or born-digital materials. Preserving materials for the long term has long been a mission and driving force of leading research libraries. Their collections, accumulated over centuries, represent a treasury of cultural heritage and investment in the broad public good of promoting scholarship and advancing knowledge. The representation of these resources in digital form provides expanded opportunities for innovative use in research, teaching, and learning, but must be done with careful attention to effective solutions for the curation and long-term preservation of digital assets.

This is an initiative worth following closely. The collaboration between libraries looks like the single most important differentiator compared with other digital initiatives, particularly the Google digitization program. Hathi's application to the in-copyright world should be closely watched. As this program matures, I expect the Trust will seek licensing terms that enable online viewing of in-copyright materials. Currently, in-copyright materials are held for preservation only.

Anonymous said...

since this is a reworking of umich stuff,
i expect end-users will have to deal with
their clumsy one-page-at-a-time viewer,
instead of getting public-domain books
in their entirety in a single-file, meaning
they've seriously compromised the utility.