Thursday, April 22, 2010

A Database of Riches: The Business Model Behind The Google Book Settlement

Late last year, I wrote an analysis of the possible size of the orphan works issue that was central to the initial criticism of the Google Book Settlement. That analysis was a subset of a wider analysis of the business opportunity that the Google Book Settlement would represent once the database was cleared for sale.

What follows is that further analysis. In this report, I estimate the market opportunity that the Google Book database could represent. I have organized my review around which customers are likely to purchase the product and how much they are likely to purchase, and I also explore how Google might go about selling and marketing the product. The management summary is excerpted below, and the full report (approximately 20 pages) is available in PDF here.

Anyone interested in discussing this report further is welcome to set up a conference call or meeting with me; I can review my methodology in more detail and explore the options available to Google as it rolls out this product.


Almost five years ago, Google embarked on the most ambitious library development project ever conceived: to create a “Noah’s Ark” of every book ever published, starting by digitizing books held by a rarefied group of five major academic libraries. The immediate response from US publishers was muted until the implications of the project became clear: Google proposed no boundaries to the digitization effort and began scanning books both in and out of copyright and in and out of print. Adding to publishers’ concerns, Google planned to display “snippets” (small selections) of each book’s content in search results. Despite some hurried conversations among publishers, author groups and Google, Google remained convinced that what it was doing represented a social ‘good’ and that the partial display of the scanned books was legally within the boundaries of fair use.

From the publisher perspective, this was a make-or-break moment, and the implications were felt most acutely by trade publishers, who saw the potential for their business models to be obliterated by easy and ready access to high-quality content via a Google search over which they would exert little or no control. Even worse was the fear (a debated and contentious point) that rampant piracy would develop, given easy access to a digitized version of a work that could be e-mailed or printed at will. The publishers determined that if Google were to ‘get away with it’ unchallenged, then anyone would be able to digitize publisher content, possibly replicating what has been going on in the music and motion picture industries for almost ten years. In the fall of 2005, prompted by a lawsuit filed by The Authors Guild, the Association of American Publishers (AAP), led by four primary publishers, filed suit against Google in an effort to halt the scanning of in-copyright materials. (The Authors Guild and AAP ultimately combined their filings.)

The initial Google Book Settlement (GBS) agreement, given preliminary approval by a court in October 2008, generated a vast amount of argument both in support of the agreement and in challenges to it. A revised agreement was drafted after Judge Chin of the US District Court for the Southern District of New York agreed to delay adjudication; final arguments were heard in February 2010. To date, Judge Chin has given neither a timetable nor any indication of when or how he will decide the case.

From the perspective of the early leading library participants, Google’s arrival and promise to digitize their purposefully conserved print collections looked like a miracle. Faced with forced cuts in monograph spending and the ever-rising expense of maintaining over 100 years of print archives, libraries saw in the Google digitization program a possible solution to many problems. All libraries believe they hold a social covenant to collect, maintain and preserve the materials most relevant to their communities, but honoring that covenant becomes difficult as expenses rise and libraries simultaneously contend with the migration to an online world.(1)

The library world is typically divided into public and academic institutions, and while these varied ‘communities’ may differ in their philosophies toward, for example, collection development or preservation, they share some common practices. Most importantly, all libraries are committed to resource sharing: while use of materials has historically been primarily ‘local’ to the library, every institution wants to make its collections available to virtually any patron or institution that requests them. In short, these library collections have always been ‘accessible’ to all, regardless of geography or copyright. First US Mail, then FedEx, e-mail and the Internet progressively made this sharing easier, but until Google arrived with its digitization program, any sharing beyond the local institution relied on physical distribution.(2) In effect, it could be argued that the Google scanning program simply makes an existing practice vastly more efficient.

Even though approval of the Google Book Settlement (GBS) hangs in the balance under review by Judge Chin of the US District Court for the Southern District of New York, an Executive Director has been named to head the Book Rights Registry (BRR)(3) and is laying the groundwork to establish the organization in advance of approval. This report attempts to analyze the size of the market opportunity for Google as it seeks to exploit the Google Book Settlement.

Following are our summary findings, which are discussed in more detail in the ensuing pages of this report.

Summary Findings of the Report:
  • Libraries will see tremendous advantages from the GBS, both immediately and over time, although concerns have been voiced (notably by Robert Darnton of Harvard)(4)
  • Google’s annual subscription revenue for licensing to libraries could approach $260mm by the third year after launch
  • Over time, publishers (and content owners) will recognize the GBS service as an effective way to reach the library community and are likely to add titles to the service(5)
  • Google will add services and may open the platform for other application providers to enhance and broaden the user experience
  • The manner in which the GBS deals with orphan works will provide a roadmap for other communities of ‘orphans’ in photography, arts, and similar content and intellectual property
[1] It is important to acknowledge that, initially, the GBS may have been seen as a solution to libraries’ conservation and preservation needs; subsequently, however, libraries have determined that they need to develop their own preservation options, among which HathiTrust is a clear leader.
[2] Resource sharing and improvements in the ‘logistics’ provided by OCLC (WorldCat) or via consortia such as OhioLINK have made physical distribution effective and comparatively efficient.
[3] The BRR is the management body tasked with administering the GBS and representing the interests of authors and publishers once approval has been granted by the court.
[4] Robert Darnton, NY Review of Books
[5] The settlement doesn’t provide for adding content published after 1/5/09; however, we suggest that, by mutual consent, additional published content could be added as an expedient method of reaching the library market.


Inkling said...

A few comments are appropriate:

First, there's the claim that Google was creating "high-quality content." That's not a term any researcher who has looked at Google Books would use. What's displayed are scanned images, badly OCRed, with no wider context provided. It illustrates a geeky obsession with quantity over quality. And the greater the size of the collection, the more poorly a "pick the magic words" search will work.

Also, because the settlement was about US copyrights only, in-copyright books aren't displayed to anyone with a non-US IP address, and what is displayed in the US has many limitations, including the number of pages that can be viewed. In addition, as you hint in passing, material like graphics, photos, charts and text written by someone other than the book's author is supposed to be blanked out. The combination makes the collection almost worthless for anything other than grabbing quick quotes.

Next, "The initial Google Book Settlement (GBS) agreement, given preliminary approval by a court in October 2008, generated a vast amount of argument both in support of the agreement and in challenges to it."

That's not true. Did you bother to read the submissions? I did, and I even contributed a letter to the court. The arguments ran about six-to-one against the agreement and were soundly argued and detailed. The ones in favor were few, brief and generally along the lines of "Gee, free books. Yeah!" That's not my opinion. It's simply a fact, obvious to anyone who read the submissions.

You need to look into this dispute more carefully before you begin to make pronouncements like those above.

Eric Hellman said...

To put the $260 million number in context, it is estimated that US publishers sold about $1.6 billion of books to libraries in 2009. Public libraries spent $934 million on print books and serials in 2007. I very much doubt that libraries would spend 16% of their acquisition budgets on the institutional subscription; the library director at one major research university told me that his institution is unlikely to be interested in the institutional subscription. $50-100 million seems more plausible to me.

I wrote an article on the size of the library book market in January.

To a large extent, library spending is a zero-sum game for publishers. If Google DOES manage to gain a quick 16% market share, other players in the market should count on a 16% drop.