Publishers have elected to sue Google to protect their content rights and the content rights of their authors. At the same time, publishers have engaged with Google as participants in the Google Scholar program. Here publishers are equal partners, and (I assume) the acquisition of content by Google was negotiated in good faith, with results that have been good to great for both parties (Springer and Cambridge University, for example). It is also no bad thing that Google’s content (digitization) programs have spurred other, similar content initiatives, particularly those of some of the larger trade and academic publishers.
The continued area of friction is the digitization project Google initiated to scan all the books in as many libraries as were willing to participate. This is where publishers got upset: they were not consulted or asked permission, they cannot approve the quality of the scanning, they will not participate in any revenue generated, and they cannot take for granted that the availability of the scanned books will not undercut any potential revenues they might generate on their own. The books in question are the majority of those published after 1923 (I originally said 1925 or so; thanks to Shatzkin for noticing my error) and which are still likely to be under copyright protection of some sort.
Having said that, let’s get one thing straight: having all books that sit in library stacks (or deep storage) available in electronic form, so that they can be indexed, searched, reassembled, found at all, and generally drawn on as resources in an easy way, is a good thing, an important step forward, and an opportunity for libraries and library patrons. Ideally, it would lead to one platform (network) providing equal access to high-quality, indexed e-book content that any library patron could reach via their local library. Sadly, while the vision is still viable, the execution represented by the Google library program is not going to get us there.
Setting aside the copyright issue, the Google library program has now been running for approximately 24 months, and results and feedback are starting to show that the reality of the program is not living up to its promise. According to this post from Tim O’Reilly, the scans are not of high quality and, importantly, are not sufficient to support academic research. Assuming this is universally true (is it?), the program represents a fantastic opportunity lost for patrons, libraries, and Google. BowerBird, via O’Reilly, states:
umichigan is putting up the o.c.r. from its google scans, for the public-domain books anyway, so the other search engines will be able to scrape that text with ease. what you will find, though, if you look at it (for even as little as a minute or two) is that the quality is so inferior it's almost worthless
Ironically, the lawsuit by the AAP could actually have a beneficial impact on the process of digitization. As some have noted, we may have underestimated the difficulty of finding relevant materials and resources once there is much more content to search (assuming full text is available for search). Initiatives are underway, particularly by the Library of Congress, to address the bibliographic (metadata) requirements of a world with far more content, and perhaps some of these bibliographic activities will lead to a better approach to digitizing the more recent content (post-1923). Regrettably, some believe that because there may be only one chance to scan the materials held in libraries, we may already have lost the opportunity to make these (older) materials easily accessible to users.
Tomorrow: just what is the universe of titles in the post-1923 ‘bucket’? Supporters of the Google project speak of a universe of 30 million books, but deeper analysis suggests that number is wildly exaggerated.