Thursday, October 18, 2007

Identifying My Package

As publishers we remain committed to defining the ‘package’ for our readers and users. At the Frankfurt supply chain meeting last week, as I listened to another “history of the ISBN” and other bedtime stories, I was struck by our insistence as publishers on defining for our customers just how they should consume our content. This was manifested in our approach to identifiers for segments of content. I include myself in this criticism as a proponent of ISBN, DOI, ISTC and other alphabet-defying groupings over the past 10 years. Three or more years ago, I think we were on the right track, but in today’s user-defined world the consumer is telling us what parts they want to consume, and we will need to come up with easy-to-use, flexible solutions that can identify the content and its use.

On the Exact Editions site a user can select, by highlighting, a piece of text they want to use from any number of the journals and magazines hosted by EE. (The tool is named The Clipper.) It is a fun and useful tool, and in its implementation it doesn’t restrict the user in any way (other than a limitation on the amount of content). If a similar solution were implemented in a research context (within RefWorks, for example) I would like to see a persistent identifier created on the spot whose syntax could be partially defined by the user. This is a perfect implementation for a DOI (one of the few, perhaps): it enables the user to select the segment of content they want, makes it persistent, creates a record for the publisher and enables any necessary reporting to take place.
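To make the idea concrete, here is a rough sketch in Python of what minting an identifier "on the spot" might look like. Everything in it is an assumption for illustration only: the placeholder prefix, the `Clip` fields, the suffix layout and the function names are hypothetical, and none of it reflects how The Clipper, RefWorks or any DOI registration agency actually works.

```python
import hashlib
import re
from dataclasses import dataclass
from datetime import datetime, timezone

# Placeholder prefix, assumed purely for illustration.
PREFIX = "10.5555"


@dataclass(frozen=True)
class Clip:
    source_id: str   # hypothetical identifier of the hosting work
    start: int       # character offset where the selection begins
    end: int         # character offset where the selection ends
    text: str        # the selected text itself
    label: str = ""  # optional, user-chosen part of the identifier


def slugify(label: str) -> str:
    """Reduce a free-text label to lowercase words joined by hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")


def mint_identifier(clip: Clip) -> str:
    """Combine the source, the user's label and a short hash of the span."""
    fingerprint = f"{clip.source_id}:{clip.start}:{clip.end}:{clip.text}"
    digest = hashlib.sha256(fingerprint.encode("utf-8")).hexdigest()[:8]
    suffix = slugify(clip.label) or "clip"
    return f"{PREFIX}/{clip.source_id}.{suffix}.{digest}"


def usage_record(clip: Clip, user: str) -> dict:
    """Build the record a publisher could keep for reporting."""
    return {
        "identifier": mint_identifier(clip),
        "source_id": clip.source_id,
        "span": (clip.start, clip.end),
        "length": len(clip.text),
        "user": user,
        "created": datetime.now(timezone.utc).isoformat(),
    }
```

The same selection always yields the same identifier, the user supplies the readable part of it, and the publisher gets a record of what was taken and by whom. Actual persistence would still depend on registering the identifier with a resolver, which this sketch does not attempt.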

It would seem to me that formatting a programmatic standard syntax to represent paragraphs, chapters, images etc. is a backwards approach simply because we will never fully anticipate how our users will use the content. We also continue to use the printed page as a construct which is fast diminishing in the online context and further undercuts the current standards approach. Attempts to build out a standard by unilaterally assigning executable identifiers to works (books) will be a waste of time and I simply don’t see the benefit of this approach; moreover, I don’t see anyone paying for it. It is not even clear publishers would welcome this approach.

Several implementations of technology that place an easy-to-use script at the point of need have proven that users want, and are willing, to purchase or gain approval for the use of content. CCC and O’Reilly are two differing examples of this concept. In the same manner, enabling an easy-to-use [citation] solution that provides a user with a simple pop-up window tied to the content they are interested in is a far more flexible and appropriate solution to identifying content. Avoid prescriptions: let the user decide.


Adam Hodgkin said...

There probably is a profound philosophical/bibliophilosophical disagreement between us, since Exact Editions is grounded on the view that printed pages are a very important construct. It's the way the web is going to absorb the wisdom of print (Google Book Search takes a similar approach, so we are not alone in our paginated granularity). But leaving that aside, and whilst denying that we seek to *restrict* the reader, we do, in the way that the Clipper works, constrain the reader/author in a fundamental way. With the Exact Editions tool, the clipping which carries a Quotation also carries a Citation (note the Page information and the title attribution which appear in the frame of the clipping). So you could find the quotation in the printed work even if the original web edition was not available or you could only lay your hands on a PDF version of the issue. "No quotation without citation" is the principle at work. Google have introduced a similar clipping tool, but their citation data is much vaguer (merely to the volume/title, with no information on page location). The citation metadata in the Exact Editions tool is rather precise, and yet its persistence is certainly dependent on the survival of the web page or the blog in which it occurs. But that is the way the web works.

MC said...

"bibliophilosophical"? Should we have our business cards changed to Bibliophilosopher?
"restrict" was the wrong word to use - apologies.

My main point is more about attempts to rigidly apply a prescribed syntax (as defined by standards bodies) to parts of content. For example, for a publisher to apply a DOI to a chapter may be a wasted exercise. First and foremost, it may never be used, so the effort and the resources to create it are wasted. Secondly, the reference to that content will have more relevance to the user, so let them define the reference. The publisher will want to know about the content's use and context, but what else? To the user the content isn't the publisher's chapter 2; it is pages 6-9 of their finished document, so let them call it that, or 'the diagram with the suggestive curves'. Requiring them to link to the entire chapter DOI places a needless burden on both the publisher and the content user.

BTW: Reference to EE was not intended to be at all critical; rather the opposite.

Adam Hodgkin said...

Yes. I agree that new standard-governed DOIs have a rather minimal role to play, and it's not just that users may have unforeseen classificatory needs. Just as important, content aggregators and web services will evolve which can develop, in principle unforeseeable, structural systems/applications. See the way Flickr has informally organised and tagged photographed images (YouTube -- videos).

BTW I did not take the original posting as critical; my response was more a matter of clarification and of emphasising the usefulness in the web context of existing literary structure (pagination and ISBNs included).