Tuesday, July 28, 2009

Google Book Settlement Video and Discussion

Harvard's Berkman Center for Internet and Society hosted a presentation and discussion about the proposed Google Book Settlement, which included Alex Macgillivray and Dan Clancy from Google (both are introduced at the start of this video):



The video is over an hour long, but while listening to it I took the following notes. If something is not clear, it is best to watch the video. (Also, don't take my notes as gospel; watch the video.)

Google's Alexander Macgillivray on the Google Book Search Settlement

AM: Google Book Search: Why did we do it? “To make books easier to find”
First lesson learned about book search: full text search is really powerful and harnessing this is really powerful.

Three places to go to build a full text database:
1. Born digital books
2. Books less new but owned by publishers: can find them
3. Books not currently held by publishers, books whose rights are unclear, and public domain books. Where rights were unknown, Google recognized these are still useful and wanted to include them as full-text searchable, and also to enable someone looking for them to know where to go to get them

AM referred to the following as “Books 1.0”:

Deals with libraries to scan books and index them.

10+ million books scanned
1.5mm in public domain
1.5mm in the Partner Program from 25K partners, and 40 libraries across a number of countries

Continued to scan ‘at pace’ and didn’t stop in the face of the lawsuits that came in 2005:
2 US Lawsuits:
Broad Class action: Authors Guild
Narrower: Publishers
1 French
1 German – subsequently withdrawn when it looked like they would lose

Conversation in settlement: Only time happened to me at Google where the other side was thinking bigger than we were. Started thinking about doing things with the class that would provide enormous benefit. “actually increasing access to the information” saw an opportunity that “once you found the book you could actually read it”. Wasn’t a lot of disagreement around the room. Also, how do we preserve the place of the library in this environment?

Opens up access in various ways:

1. Consumer Access:
Ability to get free full text search results and find the book for sale or in a library. Also, if it is out of print (essentially all out of print books), you can get 20% of the content to sample and determine if this is the book you are looking for. Shows which books are useful to you and expands options to access: Amazon, Alibris, etc. Buy online access to the book: it lasts forever, there is no ‘1984 Amazon’ problem, and it sits on your bookshelf forever. Priced by the publisher or rightsholder. If none exists, the price is set by an algorithm. (Simulates a market, pricing the book at the price it would have if there were a market.)

DC On pricing for Books: An algorithm has been built to determine the best/appropriate price for books where price not set by a rightsholder. Initial distribution is as follows but real experience will change these prices.

80% of prices are $15 or less
50% of prices are $5.99 or less

“Really think the prices will go down”

2. Institutional Access: Subscription based and pricing governed by the agreement which states pricing should offer a “fair return and broad access”

Comment: to users of an institutional license this may be akin to ‘free like water’ for all the users (or those who have access) to the institutional license.

Another comment on the institutional license:
For the entirety of the subscription no book can be removed from the collection. No 1984 problem. Once you have subscribed to a set of books, these can’t be removed for the entirety of the subscription. Next year there could be a different set of books, which changes the composition of the license.

3. Public access model: Can go to a public library and will have access to the entire ‘subscription for free’. All out of print books available at any library that wants it. Google would like it so that you “never have to worry that the amount of money you have will determine access – either in Academic or Public setting.” So you may not have the money to go to Harvard, but you would be able to gain access to this material and the content of all the other libraries.

One terminal in every library (hope over time to be able to provide more access points for public libraries)

Obviously in addition all public domain titles will be available via the internet

AM Also notes the ability of the agreement to expand access to those with disability – especially those with print disabilities (the blind).

Professors can now select from a much wider universe/set of books: moving from a relatively small set of titles to a much more inclusive set


Orphan Works: - Notes blog posts.

Google has been fighting for Orphan works legislation for years that would allow for mass digitization projects (including but not exclusive to books)

Still think this effort is important for a number of reasons:
Settlement includes Orphans and non-Orphans
No clear cut definition as to what an Orphan is
Constant problem in Washington and disagreement: competing definitions between groups and even within otherwise cohesive groups

“Works where the rightsholder is very very hard to find.”

May be copyright holder out there but the connection between (me) and the holder is hard or can’t be made

Clancy: Books have some advantage over other intellectual works because the author's name and publisher's name (and other info) are printed in the book. Many of these books have publication information.

Not just books: images, physical objects, other things but even harder to find copyright holder.

More scholarly books from libraries: Professors at the university at the time of publication.

“Can find them – little hard but could if you tried. These are not really Orphans”
For many casual uses, finding them for class use (or for permissions) is not too difficult. Noted the Author’s Guild research asking their authors about finding copyright holders for permissions: ‘Success 90% of the time’. (PND Note: I think % is higher than actual but not by much). These books aren’t really Orphans; it is just a little hard to find the rightsholder.

Challenges: Books less of an issue but still an issue for some percentage of the titles:
1. Lots of books that aren’t Orphans but still a bit of a pain to go ahead and find who the rightsholder is.
2. Because of statutory risks in copyright titles may be ‘practically dead in the marketplace’ because the economic value is small versus the costs of getting hold of the rightsholder and getting the title authorized. Has to do with rightsholder indemnifying the seeker of the rights against a future claim. Money rightsholder receives in this transaction is much smaller than his/her economic risk of error if they don’t in fact retain the rights to the work.

AM: Addressing the twin problems of Orphan works
1. making it easier to find rightsholder
2. makes these things (cultural items) themselves accessible

AM: Make really clear (w/r/t Orphan works legislation) inserted clause that Orphan works legislation will trump the settlement.

DC: Important point that all information is freely and publicly available as to the disposition of the copyright:
Who claims what book is public information
Can also ask “Tell me which books have not been claimed”

AM “Fact that this information is public is really an important part of the agreement” – compares this openness with other rights distribution agencies, which are closed: they keep private which content is part of their collection.

“BRR is unable to be obscure about rightsholder information”

Question of ‘fair use doctrine’: isn’t this the end of fair use?

AM Currently have more fair use cases than anyone else.
Continue to be subject to lawsuits with respect to photos and foreign works. Google is never on the plaintiff side in fair use cases. Always on the defendant side. “Understand it may be convenient to say we are abandoning fair use but it’s bullshit”

DC: Going in to the agreement we felt we would win the lawsuit: “felt pretty good”. In the agreement it was important that we did not erode fair use. We don’t believe the agreement erodes fair use and continue to conduct ourselves (scanning images, unregistered works, opted out works). All still believe in fair use. “if we felt the agreement was undermining our belief in fair use we would be adjusting our actions with respect to some of the things we are doing” (images etc.)

AM: Just to be clear: “Google built its whole business on fair use and we are not backing down from this at all”

Question about where the money goes (specifically what happens to uncollected funds): “Not clear why anyone would have a claim on the collected but unclaimed money”.

AM: Two streams:
For consumer purchases the money is held for 5 yrs. If unclaimed, the BRR can use the (5th year) money to operate the BRR; if money is left over, it can use the remaining money to ‘top up’ the payments to rightsholders from 63% to 70%; if any money remains after that, it is disbursed to charities.

For institutional: after 5 yrs the money covers registry operating costs, and anything left over is divided across the rightsholders in the institutional license.
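
The two streams above amount to a simple payout waterfall. Here is a minimal Python sketch of that ordering; it is not the settlement's actual accounting. The function names, the use of integer cents, and especially the way the top-up cap is computed (7% of the gross revenue attributed to claimed works, i.e. the gap between 63% and 70%) are my assumptions for illustration, since the talk did not spell these details out.

```python
from typing import NamedTuple


class ConsumerSplit(NamedTuple):
    operating: int  # cents used to run the registry (BRR)
    top_up: int     # cents used to raise rightsholder payments toward 70%
    charity: int    # cents disbursed to charities


def consumer_waterfall(unclaimed: int, brr_costs: int, claimed_gross: int) -> ConsumerSplit:
    """Split unclaimed consumer-purchase money (in cents) after the 5-year hold.

    Order follows the notes: registry costs first, then the top-up, then charity.
    """
    operating = min(unclaimed, brr_costs)
    remaining = unclaimed - operating
    # Assumption: the top-up is capped at the 63% -> 70% gap, taken here as
    # 7% of gross revenue on claimed works. Integer math avoids float drift.
    top_up_cap = claimed_gross * 7 // 100
    top_up = min(remaining, top_up_cap)
    charity = remaining - top_up
    return ConsumerSplit(operating, top_up, charity)


def institutional_waterfall(unclaimed: int, brr_costs: int) -> tuple[int, int]:
    """Split unclaimed institutional money: registry costs, then rightsholders."""
    operating = min(unclaimed, brr_costs)
    to_rightsholders = unclaimed - operating
    return operating, to_rightsholders


if __name__ == "__main__":
    # Hypothetical figures: $1,000 unclaimed, $300 costs, $5,000 claimed gross.
    print(consumer_waterfall(100_000, 30_000, 500_000))
    print(institutional_waterfall(100_000, 30_000))
```

With these made-up figures, the consumer stream puts $300 toward operating the registry, $350 toward the top-up, and $350 toward charities; the institutional stream puts $300 toward operations and $700 toward rightsholders in the license.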

Heard people say the money shouldn’t be divided this way because BRR etc. have no right to it; however, there is no consistency on where the money should go. Different groups have different ideas as to where/how the dollars are divided. The way the settlement distributes it is similar to other rights organizations; however, the settlement also says that if there is Orphan works legislation this will trump the settlement. “You can easily get a resolution to the extent you can get all the other constituents to agree” on where the money could go.

Question about the research corpus: Largest collection of ‘parallel corpora’ with respect to translation. Who’s got access to it?

DC: Right now in the current world Google has access to the entire database. Because of the current copyright we can’t open it up to everyone to come in and do what they want. Secondly, each library only has access to their collection. Each partner has a subset. Google has the whole thing.

Creation of a research corpus for non-consumptive research allowing for computational research on the entire corpus. Word usage, Machine translation, OCR, New search technologies over large texts like books

Participating and fully cooperating libraries get to create up to 2 of these research corpora. Google is putting up $5mm to set up these research corpora

Up to the libraries to use these research projects: has to be non-consumptive research. Libraries have the responsibility but can sponsor anyone they want. They have responsibility to secure the corpus. Can sponsor any university or person they want.

31 partners and most are expected to come on: could be another 50 or 100. Any of the libraries can sponsor others.

Michigan is the only one doing anything: Something with Hathi Trust
Looking to build one corpus on public domain stuff and working with them on this. Google wants them to get going because once they get it going ‘they will discover things’ which will make the research opportunities more tangible.

AM Absent the settlement this doesn’t happen. Once settlement approved we get to provide all the content

Question about Competition: Specifically most favored nation clause. A suggestion this removes any incentive for a competitor to enter this market because they can never ‘beat’ Google:

AM Stated the clause without the limitations:
Only for the first 10 yrs. The deal is long and the first mover is taking on a lot of up-front risk. A first mover deal for the length of copyright of the last book in the database is by definition a long time. Scanning and the $125mm in the settlement are additional points.

Second limitation: Only to the extent that a deal with a third party impacts a significant number of unclaimed (other than registered rights holders) works (slightly bigger than Orphans) will the clause be relevant.

AM: This clause is regarded by anti-trust as a ‘good thing’: Very easy for a second entrant with the blueprint (via BRR) of a deal already done. Wanted to ensure that for the first 10 yrs Google could compete with any entrant, be they Amazon, MS, or other. Anti-trust views this as a good thing because it encourages the type of innovation we have with this settlement.

1 comment:

Michael W. said...

Thanks for posting the link to the video and for the description and quotes. This remark struck me as particularly hilarious:

"Google would like it so that you “never have to worry that the amount of money you have will determine access – either in Academic or Public setting. So don’t have the money to go to Harvard but would be able to gain access to this material and the content of all the other libraries."

Over and over again I find myself asking "Are Google executives lying or are they just stupid?" What they're talking about is true now and without them doing a thing. As a student and now as a citizen I have used academic and public libraries to get, almost always for free, books via interlibrary loan. The difference between that and Google's scheme is, that through interlibrary loan I'm getting a legitimate copy for which the author received royalties rather than a digital copy for which the author isn't getting a penny.

I don't care that Google is greedy. I just wish they'd quit posturing as virtuous. For proof, notice how careful they are to obey China's censorship laws down to the slightest nuance and compare that to how cavalier they are with the copyright laws and treaty obligations of some 160 countries. Google's 'concern' for impoverished researchers is no greater than that it has for free speech in China. When it comes to China, they want to make money in the world's most populous country. And when it comes to Google Book Search, they're only interested in those who have the money to spend on their ad links. Money, money, money. That's what this is all about.

It is true that most of their services are legitimate. They develop software such as Google Mail that others are allowed to use and they link to web pages that people have posted online. But in neither of those cases is Google profiting illegally from someone's copyrighted material. Those who don't want to be read for free online don't post online. Google doesn't illegally scan their hard drives like they're illegally scanning books. Unfortunately, Google seems unable to see this clear distinction.

An analogy helps. What Google is doing is a bit like congratulating yourself for running a theater that shows films for free, so 'access' doesn't depend on the ability to pay. But this Google Theatre doesn't pay those who made the film a penny while it profits (or hopes to profit) enormously from selling the popcorn and drinks that accompany the film.

That's Google.