Thursday, August 16, 2007

Metadata and Google Book Search

Peter Brantley (on O'Reilly) blogs about a journal article by Paul Duguid on the quality (representation) of Google Book Search. The quality is not good. Peter's post is worth reading in its entirety, but I was most interested in what he said about metadata:
However, Duguid's analysis of Google Book Search is far deeper than a consideration of the cosmetic defects of the books' electronic skin. Rather, he recognizes that faults lurk so visibly because Google is throwing away information that are fundamentally characteristic of books -- metadata that describe and even determine what books are, as simple and trivial as volume numbers, or artifacts of type design, editing, and artistic production. Books are not, in other words, mere bags of words, but vehicles in which ride a wide sundry of other passengers -- metadata, artistic expression, whimsy, and error.

I have long believed that the sheer explosion of information makes consistently constructed bibliographic databases like WorldCat more valuable, not less. What I don't understand in the Google Book Search production process is where the connection between the call number and the book broke down. Surely detailed metadata exists for these titles in the library catalogs from which the books emanate. Admittedly, the catalog record does not capture all the physical characteristics that Peter notes, but perhaps it is a starting point. How hard will fixing the broken synapse be? It is ironic to think that the full text exists in an electronic database but can't be recognized.
