Friday, December 05, 2008

Amazon Rents Massive Data Sets

Several months ago, I wrote a piece titled Massive Data Sets about the potential for increased access to very large data sets that historically would have remained ancillary to the reporting of research projects. While very important to the conclusions reported in research, the data and primary research supporting those conclusions were typically inaccessible. From my post,
Here could develop the next land grab for publishers and perhaps other parties interested in gaining access to the raw data supporting all types of research. As publishers develop platforms supporting their publishing and (now) service offers, will they see maintaining these data sets as integral to that policy? I believe so, and I suspect that in agreements with the authors, institutions and associations that own these journals, publishers like Elsevier will also require the 'deposit' of the raw data supporting each article. In return, the offerings on the publisher's 'platform' would enable analysis, synthesis and data storage, all of benefit to their authors. But the story may be more comprehensive than simply rounding out their existing titles with more data.
The original piece was triggered by a Google blog post as well as a NYTimes article.

Yesterday, the NYTimes blog Bits reported that Amazon has begun hosting large data sets as an adjunct to its service offerings. From the Times,
Amazon Web Services, a subsidiary of Amazon.com, has started offering access to large collections of data. Business customers and scientists can take the information, which ranges from census databases to three-dimensional chemical structures and the genome, and use it as the basis for computing jobs. By gathering and storing the information, Amazon says that it can save businesses the step of assembling and managing data on their own.
As the blog post goes on to say, the Amazon service has the potential to eliminate significant expenses, on top of those already removed by the vast array of services Amazon offers. Access to the Amazon service begins to push toward zero the infrastructure cost and overhead that must be covered in any research project. This could have a material impact on the types and extent of all research dependent on the collection, storage and analysis of vast data sets. The economics have fundamentally changed for researchers, enabling them to contemplate all kinds of new projects that otherwise may have been cost-prohibitive. The benefit could also be more mundane: researchers may no longer need to compete with other projects for data processing time or other technical resources.

Smart people are going to see an opportunity to buy or otherwise gather very large sets of data from groups or organizations that may not see the potential value. For example, someone could buy the transaction data from all the E-ZPass-like systems (RFID tags that let you pass through tolls) across the US, 'deposit' it with Amazon and then rent access to any urban planner who wants to analyze the info. The customer pays a fee, and out of that fee the 'owner' of the data pays Amazon a service fee. A potentially painless way to an early retirement in Costa Rica. As I noted in my original post, this is a growth opportunity for publishers or others.

