
Tuesday, September 29, 2020

OCLC's Vision for the Next Generation of Metadata

From the OCLC report summary: 

Transitioning to the Next Generation of Metadata synthesizes six years (2015-2020) of OCLC Research Library Partners Metadata Managers Focus Group discussions and what they may foretell for the “next generation of metadata.”
The firm belief that metadata underlies all discovery regardless of format, now and in the future, permeates all Focus Group discussions. Yet metadata is changing. Innovations in librarianship are exerting pressure on metadata management practices to evolve as librarians are required to provide metadata for far more resources of various types and to collaborate on institutional or multi-institutional projects with fewer staff.
This report considers: Why is metadata changing? How is the creation process changing? How is the metadata itself changing? What impact will these changes have on future staffing requirements, and how can libraries prepare? This report proposes that transitioning to the next generation of metadata is an evolving process, intertwined with changing standards, infrastructures, and tools. Together, Focus Group members came to a common understanding of the challenges, shared possible approaches to address them, and inoculated these ideas into other communities that they interact with. 
Download pdf

Sunday, May 13, 2018

Publishing Technology Survey Request: How strategic is technology spending?

Please take some time to complete this technology survey. Results will be published here. I encourage you to forward this survey to your colleagues as I would like as many responses as possible.  Additionally, if you have email lists and/or newsletters and want to include a link to the survey you can also use this link:  https://www.surveymonkey.com/r/PubTechSpend  THANK YOU!

Thursday, May 03, 2012

Corporate Data Strategy and The Chief Data Officer

There were several discussion points around data at today's BISG Making Information Pay session and I was reminded of a series of posts I published last September about the importance of having a data strategy. Here is the first of those posts, with links at the bottom to the other articles in the series.

Corporate Data Strategy and The Chief Data Officer

Are you managing your data as a corporate asset? Is data – customer, product, user/transaction – even acknowledged by senior management? Responsibility for data within an organization reflects its importance; so, who manages your data?

Few companies recognize the tangible value of the data their organizations produce. Some data, such as product metadata, are seen as problematic necessities that generally support the sale of the company's products; but management of much of the other data (such as information generated as a customer passes through the operations of the business) is often ad hoc and creates only operational headaches rather than usable business intelligence. Yet a few data-aware companies are starting to understand the value of the data their organizations generate and are creating specific business strategies to manage their internal data.

Establishing an environment in which a corporate data strategy can flourish is not an inconsequential task. It requires strong, active senior-level sponsorship, a financial commitment and adoption of change-management principles to rethink how business operations manage and control internal data. Without CEO-level support, a uniform data-strategy program will never take off because inertia, internal politics and/or self-interest will conspire to undermine any effort. Which raises a question: “Why adopt a corporate data strategy program?”

In simple terms, more effectively managing proprietary data can help a company grow revenue, reduce expenses and improve operational activities (such as customer support). In years past, company data may have been meaningless insofar as businesses did not or could not collect business information in an organized or coordinated manner. Corporate data warehouses, data stores and similar infrastructure improvements are now commonplace and, coupled with access to much more transaction information (from web traffic to consumer purchase data), these technological improvements have created environments where data benefits become tangible. In data-aware businesses, employees know where to look for the right data, are able to source and search it effectively and are often compensated for effectively managing it.

Recognizing the potential value in data represents a critical first step in establishing a data strategy, and an increasing number of companies are building on this to create a corporate data strategy function.

Businesses embarking on a data-asset program will only do so successfully if the CEO assigns responsibility and accountability to a Chief Data Officer. This position is a new management role, not an addition to an existing manager's responsibilities (such as the head of marketing or information technology). To be successful, the position carries responsibility for organizing, aggregating and managing the organization's corporate data to improve communications with supply chain partners, customers and internal data users.

Impediments to implementing a corporate data strategy might include internal politics, inertia and a lack of commitment, all of which must be overcome by unequivocal support from the CEO. Business fundamentals should drive the initiative so that its expected benefits are captured explicitly. Those metrics might include revenue goals, expense savings, return on investment and other, narrower measures. In addition, operating procedures that define data policies and responsibilities should be established early in the project so that corporate ‘behavior’ can be articulated without the chance for mis- and/or self-interpretation.

Formulating a three-year strategic plan in support of this initiative should be considered a basic requirement that will establish clear objectives and goals. In addition, managing expectations for what is likely to be a complex initiative will be vital. Planning and then delivering will enable the program to build on iterative successes. Included in this plan will be a cohesive communication program to ensure the organization is routinely made aware of objectives, timing and achievements.

In general terms, there are likely to be four significant elements to this plan: (1) the identification and description of the existing data sources within an organization; (2) the development of data models supporting both individual businesses and the corporate entity; (3) the sourcing of technology and tools needed to enact the program to best effect; and then, finally, (4) a progressive plan to consolidate data and responsibility into a single entity. Around this effort would also be the implementation of policies and procedures to govern how each stakeholder in the process interacts with others.

While this effort may appear to have more relevance for very large companies, all companies should be able to generate value from the data their businesses produce. At larger companies the problems will be more complex and challenging but, in smaller companies, the opportunities may be more immediate and the implementation challenges more manageable. Importantly, as more of our business relationships assume a data component, data becomes integral to the way business itself is conducted. Big or small, establishing a data strategy with CEO-level sponsorship should become an important element of corporate strategy.

The following are the other articles in the series:

2: Setting the Data Strategy Agenda
3: Corporate Data Program: Where to Start?

Wednesday, February 15, 2012

File Under "Bleedin' Obvious": Good Data Drives Sales

Nielsen Bookdata recently released a white paper/sales sheet on metadata enhancement which presents some real data on the direct link between deep, accurate metadata and increased sales and long-term revenue. Unsurprisingly, the document finishes by noting that BookData provides enhanced metadata services for a fee; assuming publishers don't have the wherewithal to handle this very basic activity themselves, they would be well advised to contract with Nielsen (or someone similar).

It occurs to me that there's a circular logic to working with a third-party data enhancement provider: if, as a publisher, I don't have the means to provide this deep information in the first place, how will I know whether the deep metadata services provided by a third party are accurate and optimal? Nielsen will say "increased sales," and they'd be correct based on their own analysis, yet it is always going to be the author, editor and marketing person at the publisher who are best placed to define and optimize their metadata. Contracting this function out is not only likely to be sub-optimal but might also leave staff whose experience becomes removed from the realities of market dynamics. This is not to suggest that the third party will do a bad job, but that the benefits to the publisher of doing it themselves far outweigh outsourcing, in both the short term and the long term.

And what are the results of better metadata?  Nielsen's report is quite specific about this sample:
White Paper: The Link Between Metadata and Sales

Looking at the top selling 100,000 titles from 2011 we analysed the volume sales for titles where either the BIC Basic or image flag was missing, and compared these with titles where one of the flags was missing and titles where both the BIC Basic and image flags were present, indicating that the BIC Basic standard was met. Figure 1.1 shows the average sales per title for these four different sets of records.

The positive impact of supplying complete BIC Basic data and an image is clear. Records without complete BIC Basic data or an image sell on average 385 copies. Adding an image sees sales per ISBN increase to 1,416, a 268% boost. Records with complete BIC Basic data but no image have average sales under 437 copies, but when we look at records with all of the necessary data and image requirements, average sales reach 2,205. This represents an increase of 473% in comparison to those records which have neither the complete BIC Basic data elements nor an image. Figure 1.2 shows a direct comparison between all records with insufficient data to meet the BIC Basic standard, and those that meet the requirements.

The average sales across all records with incomplete BIC Basic elements are 1,113 copies per title, with the complete records seeing a 98% increase in average sales.

Titles which hold all four enhanced metadata elements sell on average over 1,000 more copies than those that don't hold any enhanced metadata, and almost 700 more copies than those that hold three out of the four enhanced metadata elements. In percentage terms, titles with three metadata elements see an average sales boost of 18%, and those with all four data elements 55% when compared to titles with no enhanced metadata elements.
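
For readers who want to see how the quoted percentages hang together, here is a quick arithmetic check of the uplift figures (a minimal sketch built only from the per-title averages in the excerpt above, not code from the white paper):

```python
# Quick arithmetic check of the uplift figures quoted in the Nielsen excerpt
# (average copies sold per title for each class of record).
NEITHER = 385          # no complete BIC Basic data, no image
IMAGE_ONLY = 1416      # image added, BIC Basic still incomplete
BOTH = 2205            # complete BIC Basic data and an image
INCOMPLETE_AVG = 1113  # average across all records missing something

def uplift_pct(better, baseline):
    """Percentage increase of `better` over `baseline`."""
    return round(100 * (better - baseline) / baseline)

print(uplift_pct(IMAGE_ONLY, NEITHER))      # 268 -> "a 268% boost"
print(uplift_pct(BOTH, NEITHER))            # 473 -> "an increase of 473%"
print(uplift_pct(BOTH, INCOMPLETE_AVG))     # 98  -> "a 98% increase"
```
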
In the still-early days of Amazon we were always throwing out the anecdotal data point that a book with a cover image was 8x more likely to sell than one without.  Sadly, we are still discussing much the same issue.

Monday, December 12, 2011

OCLC Report: Libraries at Webscale

OCLC have released a report they've been working on this year looking at the impact of the web on our rapidly changing information environment.  I was asked to participate as an interviewee, which I found intellectually stimulating, and I'm looking forward to reading the full report (79 pages).

Here is a summary of the purpose of the study and report:
The document examines some of the ways in which the Web has impacted information seeking, and how new cloud-based, Webscale services are now at the center of many users’ educational and learning lives. This document contains views of library leaders and insights from trend watchers who write about the future of the Web.
Included are short essays that express the views of:
  • Leslie Crutchfield, author, speaker and leading authority on scaling social innovation and high-impact philanthropy
  • Thomas L. Friedman, reporter and columnist, and author of The World Is Flat and That Used to Be Us
  • Seth Godin, Internet marketing pioneer and author of We Are All Weird
  • Professor Ellen Hazelkorn, Vice President of Research and Enterprise, and Dean of the Graduate Research School, Dublin Institute of Technology (DIT), Ireland
  • Steven Berlin Johnson, author of Where Good Ideas Come From: The Natural History of Innovation
  • Kevin Kelly, cofounder and Senior Maverick of Wired magazine
  • James G. Neal, Vice President for Information Services and University Librarian at Columbia University
  • Findings from The European Commission on Information Society and Media (ERCIM) on how cloud computing is impacting the Web
  • The OCLC Global Council on the challenges and opportunities facing libraries today and in 2016
We interviewed dozens of library leaders about the future of libraries and the key challenges and opportunities they face today and will face in 2016. Their ideas and quotes are presented and distilled in the report to provide specific thoughts on the need for “radical cooperation” in library services. Librarians from a wide variety of library types, across a worldwide geography, were consulted. Surprisingly, though, their top concerns and aspirations were often in agreement, regardless of library size, location and type.
Download the full report here.

Wednesday, December 07, 2011

BISG Policy Statement on ISBN Usage

The Book Industry Study Group, after long deliberation and incredibly astute consulting, has announced its policy recommendation for the use of ISBNs for digital products (Press Release):
This BISG Policy Statement on recommendations for identifying digital products is applicable to content intended for distribution to the general public in North America but could be applied elsewhere as well. The objective of this Policy Statement is to clarify best practices and outline responsibilities in the assignment of ISBNs to digital products in order to reduce both confusion in the market place, and the possibility of errors.

Some of the organizations which have indicated support of POL-1101 include:

  • BookNet Canada
  • National Information Standards Organization (NISO)
  • IBPA, the Independent Book Publishers Association
CLICK HERE to download
 Close readers of this blog will recall the work done by the identification committee of BISG:
In the spring of 2010, BISG's Identification Committee created a Working Group to research and gather data around the practice of assigning identifiers to digital content throughout the US supply chain. "The specific mandate of the Working Group was to gather a true picture of how the US book supply chain was handling ISBN assignments, and then formulate best practice recommendations based on this pragmatic understanding," said Angela Bole, BISG's Deputy Executive Director. "Around 60 unique individuals and 40 unique companies participated in the effort. It was a truly collaborative learning process."

Noted Phil Madans, Director of Publishing Standards and Practices for Hachette Book Group and Chair of the Committee in charge of developing the Policy Statement, "It was quite a challenge to bring some measure of consistency and clarity to what our research revealed to be so chaotic and confused that some even reported thinking ISBN assignment should be optional--a 'nice to have'. This, clearly, would not work."
The initial consulting report was discussed publicly about 12 months ago and I summarized that presentation in this post from January 17, 2011.

These were the summary conclusions from that presentation:
There is wide interpretation and varying implementations of the ISBN eBook standard; however, all participants agree a normalized approach supported by all key participants would create significant benefits and should be a goal of all parties.

Achieving that goal will require closer and more active communication among all concerned parties and potential changes in ISBN policies and procedures. Enforcement of any eventual agreed policy will require commitment from all parties; otherwise, no solution will be effective and, to that end, it would be practical to gain this commitment in advance of defining solutions.

Any activity will ultimately prove irrelevant if the larger question regarding the identification of electronic (book) content in an online-dominated supply chain (where traditional processes and procedures mutate, fracture and are replaced) is not addressed. In short, the current inconsistency in applying standards policy to the use of ISBNs will ultimately be subsumed as books lose structure, vendors proliferate and content is atomized.

Wednesday, September 28, 2011

McKinsey Report on BIG Data

Recently, the consulting services firm McKinsey released a report that looks at what BIG data is and what the potential gains could be from its use. The types of benefits that could be realized are quite stunning:
MGI studied big data in five domains—health care in the United States, the public sector in Europe, retail in the United States, and manufacturing and personal location data globally. Big data can generate value in each. For example, a retailer using big data to the full could increase its operating margin by more than 60 percent. Harnessing big data in the public sector has enormous potential, too. If US health care were to use big data creatively and effectively to drive efficiency and quality, the sector could create more than $300 billion in value every year. Two-thirds of that would be in the form of reducing US health care expenditure by about 8 percent. In the developed economies of Europe, government administrators could save more than €100 billion ($149 billion) in operational efficiency improvements alone by using big data, not including using big data to reduce fraud and errors and boost the collection of tax revenues. And users of services enabled by personal location data could capture $600 billion in consumer surplus. The research offers seven key insights.
The executive summary is located here (pdf); however in their press release they also summarized some of the findings:

1. Data have swept into every industry and business function and are now an important factor of production, alongside labor and capital. We estimate that, by 2009, nearly all sectors in the US economy had at least an average of 200 terabytes of stored data (twice the size of US retailer Wal-Mart's data warehouse in 1999) per company with more than 1,000 employees.

2. There are five broad ways in which using big data can create value. First, big data can unlock significant value by making information transparent and usable at much higher frequency. Second, as organizations create and store more transactional data in digital form, they can collect more accurate and detailed performance information on everything from product inventories to sick days, and therefore expose variability and boost performance. Leading companies are using data collection and analysis to conduct controlled experiments to make better management decisions; others are using data for basic low-frequency forecasting to high-frequency nowcasting to adjust their business levers just in time. Third, big data allows ever-narrower segmentation of customers and therefore much more precisely tailored products or services. Fourth, sophisticated analytics can substantially improve decision making. Finally, big data can be used to improve the development of the next generation of products and services. For instance, manufacturers are using data obtained from sensors embedded in products to create innovative after-sales service offerings such as proactive maintenance (preventive measures that take place before a failure occurs or is even noticed).

3. The use of big data will become a key basis of competition and growth for individual firms. From the standpoint of competitiveness and the potential capture of value, all companies need to take big data seriously. In most industries, established competitors and new entrants alike will leverage data-driven strategies to innovate, compete, and capture value from deep and up to real time information. Indeed, we found early examples of such use of data in every sector we examined.

4. The use of big data will underpin new waves of productivity growth and consumer surplus. For example, we estimate that a retailer using big data to the full has the potential to increase its operating margin by more than 60 percent. Big data offers considerable benefits to consumers as well as to companies and organizations. For instance, services enabled by personal location data can allow consumers to capture $600 billion in economic surplus.

5. While the use of big data will matter across sectors, some sectors are set for greater gains. We compared the historical productivity of sectors in the United States with the potential of these sectors to capture value from big data (using an index that combines several quantitative metrics), and found that the opportunities and challenges vary from sector to sector. The computer and electronic products and information sectors, as well as finance and insurance, and government, are poised to gain substantially from the use of big data.

6. There will be a shortage of talent necessary for organizations to take advantage of big data. By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.

7. Several issues will have to be addressed to capture the full potential of big data. Policies related to privacy, security, intellectual property, and even liability will need to be addressed in a big data world. Organizations need not only to put the right talent and technology in place but also structure workflows and incentives to optimize the use of big data. Access to data is critical—companies will increasingly need to integrate information from multiple data sources, often from third parties, and the incentives have to be in place to enable this.
Read the executive summary (PDF - 924 KB)
Read the full report (PDF - 1.91 MB)
Download eBook as ePub for Apple iPad, Barnes & Noble Nook, Sony Reader and other devices
Download eBook for Amazon Kindle

Tuesday, August 23, 2011

BISG BookStats - Live Webcast

BISG has announced two webinars to discuss the recent release of BookStats:
During each Webcast, representatives from the BookStats data team will provide a comprehensive look into how BookStats was developed and what trends were discovered. The presentation will include analysis of several top line data points as well as a tour through the interactive BookStats Online Data Dashboard.


Wed, August 31, 2011
1:00 p.m. to 2:00 p.m.
Wed, September 7, 2011
1:00 p.m. to 2:00 p.m. Eastern

Spanning 2008-2010, BookStats offers data and analysis of the total industry and the individual Trade, K-12 School, Higher Education, Professional and Scholarly markets. Produced jointly by the Association of American Publishers (AAP) and the Book Industry Study Group (BISG), its highlights include:
  • Overall U.S. publishing revenues are growing
  • Overall U.S. publishing unit sales are up as well
  • Americans, young and old, are reading actively in all print and digital formats
  • Education publishing holds steady and, in some segments, shows solid growth
  • Professional and Scholarly publishing shows gains

Monday, January 10, 2011

Findings Meeting for the Identification of e-Books and Digital Content Project

As you may know, in May 2010 the Identification of E-Books Working Group of BISG’s Identification Committee began a systematic review of the International ISBN Agency recommendations for the identification of e-books and digital content. As a result of this review, BISG hired me to "conduct an objective, research-based study that would describe, define and make recommendations for the best case identification of e-books in the U.S. supply chain."

During September and October, I conducted over 50 interviews with 70 industry personnel from across the spectrum of the publishing industry and subsequently reported my findings to the working group during November. After further internal discussions about the findings, BISG has scheduled a meeting on January 13th to discuss the findings with the wider BISG community. Participants may register here.

This was a challenging engagement given the complexity of the issue, the varying points of view and the compressed time frame required to complete the project; however, we believe that by conducting this study we have established an unequivocal baseline that will allow the industry to address the core issues for content identification as we migrate from physical to digital products.

BISG expects results from the new Identification of E-Books Research Project will directly influence the consensus-driven process of developing best practices for the identification of e-books. This work will happen within ongoing meetings of BISG’s Identification Committee and is expected to result in recommendations to the International ISBN Agency for the further development of this important standard.

BISG and I look forward to your participation in the review of these findings, and we hope you will encourage as many of your staff as possible to attend this meeting.

Wednesday, November 10, 2010

Books about Presidents

These USA Today snapshots were the best and most effective marketing and PR we did at Bowker when I was there. We got more mileage out of these than anything else we did. At the time it was Andrew Grabois who did the stats and this time it is Roy Crego.

Thursday, November 04, 2010

Fake Reviews: Does Amazon have a sense of Humor?

Take a look at how numerous Amazon reviewers have taken the time to write reviews for a book that is either a test page (why Amazon would be doing that at this stage is questionable) or whose metadata is someone's idea of a sick bibliographic joke. Since the record appears on other booksellers' web pages it may be the latter. Anyway, the reviews are funny and, in case the page is removed, here are images:

Friday, October 22, 2010

Repost: Publishing and Global Data Synchronization

Originally published on March 19, 2007


Over the past several months, I have commented on issues related to the publishing supply chain and the need to revamp the relationships between supply chain partners to create a more efficient business environment. Closer integration across the publishing supply chain will result in better efficiency and effectiveness leading to higher revenue and profitability. While sharing information between partners is growing in frequency, there has been only limited movement towards a more holistic approach to addressing supply chain issues.

The potential evolution of metadata requirements for the industry is also a theme I have addressed, and I have suggested that base-level bibliographic information has become a commodity. In recent years BISG has proactively reviewed the potential for a single “(Global) data synchronization (Network)” application for the publishing business. This GDSN application would sit as a central hub of base-level bibliographic information that would be accessible to all industry participants. Fees would be assessed to participants but would be fairly modest given the experience with similar applications in other industries. (In fact, some large publishers are already subscribers to GDSN data pools due to their business operations in markets outside the book trade, such as big-box retailing.)

The management of a GDSN data pool is not an insignificant task, and were BISG to sponsor the development of a publishing data pool it is logical to believe that a manager of this data pool would be required. It is unlikely that BISG would want to manage this themselves; they would be more likely to issue an RFP for the application. A fully implemented GDSN solution for the publishing industry could significantly improve data flow and data accuracy across the supply chain, not least because there would be one central location for reading and writing product details.

Having said that, there is an assumption that the internal data provided externally to the GDSN data pool would be accurate. Unfortunately, I know from my experience at Bowker that this is not generally the case. Since the launch of Amazon and the subsequent adoption of ONIX, publisher data is significantly better than it was in the past, but once you move beyond the top 200-300 publishers, data quality is a real problem. Businesses must continue to focus on product information, and it is likely that service providers such as Netread will continue to play a large role in data supply.

The implementation of GDSN will lead to other integrated applications that will support further improvements in the supply chain. These include RFID; collaborative planning, forecasting and replenishment; vendor-managed inventory; and scan-based trading, all of which have been implemented in other industries and, in publishing, would begin to mimic my suggested Intelligent Publishing Supply Network (IPSN). Among the benefits other industries have proven are possible:
  • Increased order accuracy
  • The ability to identify and correct internal data discrepancies
  • Fewer retailer out-of-stocks
  • More timely replenishment orders
  • A reduced likelihood of retailer returns
  • A foundation for adopting RFID
RFID tagging could create the opportunity for significant gains in operational efficiency within the book world. The RFID tag carries only minimal data, the Electronic Product Code (EPC), sometimes referred to as the electronic barcode, and the tag can be either read-only or read/write. Even in isolated (non-networked) applications within a bookstore or a distribution center the efficiency gains would be profound. Within a full RFID implementation across a supply chain, product-level data is accessible from a central location (database) at any point in the supply chain, providing details on what the product is and where it came from. In tandem, the location of that item is made available centrally so that tracking details can be analyzed and acted upon. Widespread implementations will be limited until tag costs decrease and some operational issues, such as pallet-level accuracy (some tags can't be read if they are buried in a pallet), are addressed.
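
As a rough illustration of that central-lookup pattern (a hypothetical sketch, not an actual EPC or EPCIS implementation; every name, identifier and record here is made up), the tag itself carries only the EPC while item details and the read history live in a shared database:

```python
# Hypothetical sketch: the RFID tag carries only an EPC; everything else
# about the item and its movement lives in a shared, central store.
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class ProductRecord:
    epc: str      # Electronic Product Code read from the tag
    isbn: str     # the bibliographic identity of the item
    title: str
    origin: str   # where the item entered the supply chain

@dataclass
class TrackingStore:
    products: dict = field(default_factory=dict)   # epc -> ProductRecord
    reads: list = field(default_factory=list)      # (epc, location, time)

    def register(self, record: ProductRecord):
        self.products[record.epc] = record

    def record_read(self, epc: str, location: str):
        self.reads.append((epc, location, datetime.utcnow()))

    def where_is(self, epc: str):
        history = [r for r in self.reads if r[0] == epc]
        return self.products.get(epc), history

store = TrackingStore()
tag = "urn:epc:id:sgtin:0614141.812345.6789"   # illustrative EPC only
store.register(ProductRecord(tag, "9780000000001", "Example Title", "Printer A"))
store.record_read(tag, "Distribution Center 3")
store.record_read(tag, "Store 117 receiving dock")
print(store.where_is(tag))
```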

The future for an efficient book supply chain is playing out in other industries such as consumer products, hardware and the grocery business. As publisher revenues become spread across multiple distribution networks, making physical distribution more cost-effective becomes critical: less revenue will be generated from physical distribution, reducing scale effects and squeezing profit margins. It remains for the industry, via BISG, to lead in the direction of a more efficient and effective supply chain.

Links:
Qualified Metadata
Supply Chain 1/2

Friday, September 17, 2010

Repost: 'Qualified Metadata' - What Does it All Mean?

Originally posted on 2/22/2007. I was speaking to someone this afternoon about this topic and it reminded me a little of this post.


Earlier this month I spoke about how data providers may be able to carve a place for themselves as the single provider of catalog information for particular industries. This data, representing 'base level' descriptive information (in the book world we call it bibliographic data), would be widely disseminated across the Internet to facilitate trade of products, materials and services and would be provided by one data supplier. Other data suppliers, one layer up if you will, would also make use of this base-level information but add to it value-added data elements which would be particularly important to segments of the supply chain. The most obvious example in books would be subject and categorization data which aids in discovery of the item described. Another set of data elements could reflect more descriptive information about a publisher over and above basic address and contact details. In the second of my series, I take a look at the library environment.

In a recent article in D-Lib (January 07), Karen Markey of the University of Michigan looks at how the library online catalog experience needs to change in order for users to receive more relevant and authoritative sources of information to support their research needs. She goes on to quote Deanna Marcum of Library of Congress "the detailed attention that we have paid to descriptive cataloguing may no longer be justified...retooled catalogers could give more time to authority control, subject analysis, [and] resource identification and evaluation." Markey proposes redesigning the library catalog to embrace three things:
  1. post-Boolean probabilistic searching to ensure the precision in online catalogs that contain full-text
  2. subject cataloguing that takes advantage of a user's ability to recognize what they do and don't want
  3. qualification cataloguing to enable users to customize retrieval based on level of understanding or expertise
New search technologies such as MarkLogic, FAST and the search tool behind WorldCat offer some of these capabilities but are generally not accessible to the average user. For example, some of these tools enable flexibility in the relative importance given to elements within a record; manipulating the weight of Audience level in a WorldCat search would 'skew' the result set toward higher- or lower-comprehension titles, depending on the bias given to one or the other.
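
To make the field-weighting idea concrete, here is a toy sketch (not how WorldCat, MarkLogic or FAST actually rank results; the records, scores and weights are all hypothetical) showing how biasing one element, such as audience level, skews a result set:

```python
# Toy illustration of field weighting: the same keyword relevance,
# biased toward a target audience level by an adjustable weight.
records = [
    {"title": "Quantum Mechanics: A Graduate Text", "audience": "expert",   "keyword_hits": 3},
    {"title": "Quantum Physics for Beginners",      "audience": "general",  "keyword_hits": 3},
    {"title": "Introduction to Quantum Ideas",      "audience": "juvenile", "keyword_hits": 2},
]

AUDIENCE_SCORE = {"juvenile": 1, "general": 2, "expert": 3}

def rank(records, audience_weight=0.0, target_level=3):
    """Score = keyword relevance plus a bias toward the target audience level."""
    def score(rec):
        closeness = -abs(AUDIENCE_SCORE[rec["audience"]] - target_level)
        return rec["keyword_hits"] + audience_weight * closeness
    return sorted(records, key=score, reverse=True)

# With no audience bias the two three-hit titles tie; with a strong bias
# toward expert-level material, the graduate text wins outright.
for rec in rank(records, audience_weight=2.0, target_level=3):
    print(rec["title"])
```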

Perhaps the most compelling point Markey raises in her article supporting increased attention to "qualification metadata" is the 30-to-1 'rule'.
The evidence pertains to the 30-to-1 ratios that characterize access to stores of information (Dolby and Resnikoff, 1971). With respect to books, titles and subject headings are 1/30 the length of a table of contents, tables of contents are 1/30 the length of a back-of-the-book index, and the back-of-the-book index is 1/30 the length of a text. Similar 30-to-1 ratios are reported for the journal article, card catalog, and college class. "The persistence of these ratios suggests that they represent the end result of a shaking down process, in which, through experience, people became most comfortable when access to information is staged in 30-to-1 ratios" (Bates, 2003, 27). Recognizing the implications of the 30-to-1 rule, Atherton (1978) demonstrated the usefulness of an online catalog that filled the two 30-to-1 gaps between subject headings and full-length texts with tables of contents and back-of-the-book indexes.
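
The staged-access arithmetic behind the rule is easy to work through; assuming a hypothetical 150,000-word monograph (my number, not Markey's), each access layer is roughly a thirtieth of the layer below it:

```python
# Staged access implied by the 30-to-1 rule, for a hypothetical
# 150,000-word monograph: each layer is ~1/30 the size of the next.
text_words = 150_000
index_words = text_words / 30      # back-of-the-book index   ~ 5,000 words
toc_words = index_words / 30       # table of contents        ~ 167 words
heading_words = toc_words / 30     # title + subject headings ~ 6 words

for label, words in [("full text", text_words), ("index", index_words),
                     ("table of contents", toc_words),
                     ("title/subject headings", heading_words)]:
    print(f"{label:24s} ~{words:,.0f} words")
```
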
Once I read this it was obvious to me that we may not have thought through the implications of projects such as Google Print on retrieval. These initiatives will result in huge (big, big, big) increases in the amount of stuff researchers and students will have to wade through to find items that are even remotely relevant to what they are looking for. In the case of students, unless appropriate tools and descriptive data are made available, we will only compound the 'it's good enough' mentality and they will never see anything but Google Search as useful.

Markey's article is worth a read if you are interested in this type of stuff, but I think her viewpoint is a starting point for any bibliographic agency or catalog operation in defining their strategy for the next ten years. Most bibliographers understand that base-level data is a commodity. The only value a provider can supply here is consistency and one-stop shopping, and the barriers to entry are lowered every day. I am of the view (see my first article on this subject) that the agency that can demonstrably deliver consistent data should do so as a loss leader in order to corner the market on base-level data and then generate a (closed) market for value-added and descriptive (qualification) metadata. There are indications that markets may be heading in this direction (Global Data Synchronization, which I will address next) with incumbent data providers reluctantly following.

Providing relevancy in search is a holy grail of sorts, and descriptive data is key to this. In the library environment, if the current level of resources were reallocated to building the deeper bibliographic information we need, then the traffic in and out of library catalogs would be tremendous. If no one steps in to provide this needed descriptive data, then the continuing explosion of resources would be irrelevant because no one would be directed to the most relevant stuff. Serendipity would rule. The data would also prove valuable and important to the search providers (Google, etc.) because they also want to provide relevance; having libraries and the library community execute on this task would be somewhat ironic given the current decline in use of the online library catalog.

Friday, July 23, 2010

Repost - Data Sync: The Next Coming of Biblio Data

Originally posted June 22, 2007

A number of years ago while President of Bowker I attended a conference organized by our EDI provider General Electric (GXS) where they discussed the application of a budding industry product information process referred to as data synchronization.

In contrast to the publishing industry, most industries do not have standard, industry-wide product catalogs. Books benefit in this respect from the universal acceptance of the ISBN; few if any other industries have a standard numbering system that supports product databases like booksinprint, Nielsen bookdata, titlesource and IPage. Data synchronization is an attempt to keep standardized item and location information common, up to date and harmonized between trading partners. In plain English, trading partners have access to the same 'data pool' of item information, which is continuously up to date and enables harmonization between the data pool and the respective item databases at each trading partner. From the GS1 site:
Global Data Sync Network is an automated standards based global environment that enables secure and continuous data synchronization which allows partners to have consistent item data across their systems all the time. It ensures that all the parties in the supply chain are working with the same data – allows for simplified change notification and saves time and money for all organizations by eliminating steps to correct inaccurate data.
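
As a rough sketch of the pattern the GS1 description implies (illustrative only, not the GDSN protocol; the classes, ISBNs and records are hypothetical), a central pool holds the item record of truth and pushes every change out to subscribed trading partners:

```python
# Hypothetical sketch of data synchronization: one central pool of item
# records, with trading partners subscribing to changes so everyone is
# always working from the same data.
class DataPool:
    def __init__(self):
        self.items = {}        # ISBN -> item record (the source of truth)
        self.subscribers = []  # trading partners to notify on every change

    def subscribe(self, partner):
        self.subscribers.append(partner)

    def publish(self, isbn, record):
        self.items[isbn] = record
        for partner in self.subscribers:
            partner.sync(isbn, record)

class TradingPartner:
    def __init__(self, name):
        self.name = name
        self.catalog = {}

    def sync(self, isbn, record):
        # Overwrite the local copy so it never drifts from the pool.
        self.catalog[isbn] = dict(record)
        print(f"{self.name}: updated {isbn} -> {record['title']} at {record['price']}")

pool = DataPool()
for partner in (TradingPartner("Retailer"), TradingPartner("Wholesaler")):
    pool.subscribe(partner)

pool.publish("9780000000001", {"title": "Example Title", "price": 24.99, "format": "Hardcover"})
pool.publish("9780000000001", {"title": "Example Title", "price": 19.99, "format": "Hardcover"})
```
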
While my participation at this particular meeting was pooh-poohed by my boss at the time, it had me worried. The BIP database is licensed to many entities in the publishing industry, and if trading partners in the publishing business got together to exchange data via data synchronization then our business could be in jeopardy. In recent years, a number of companies in the grocery, soft goods and hardware businesses have implemented data synchronization with substantial numbers of their trading partners. The process is complicated and certain standards and formats govern the implementation; however, the benefits can be substantial, including less re-keying of data, better in-stock positions, better marketing promotions and fill rates, and many other benefits documented in the following presentations (1,2,3).

While I was worried about the impact the development of a publishing data pool could have on the Bowker business, the irony is that the BIP database is the ultimate data pool, the like of which doesn't exist in any other industry. No doubt that is the thinking of BookNet Canada, which has embarked on a project that may ultimately result in the creation of a data pool for the North American publishing industry. BookNet Canada has the remit to improve the publishing supply chain in Canada, and Bowker (while I was President) helped them establish an industry EDI service and sales reporting tool. For data sync they are working with Comport Communications, the only certified data pool provider in Canada. The successful implementation of data sync in Canada could become a (the) prototype for a subsequent larger implementation in the US and/or UK. Interestingly, the Canadian books in print database is a hybrid of the US BIP and UK BIP, which BookNet Canada also looks to develop. (Bowker has the current incumbent product.)

The implications for BIP products are fundamental but not catastrophic (although I will leave it to them to figure out why) but the larger issue is the potential radical shift in the traditional use of book product information and the ensuing significant improvement in supply chain information. We are a few years off yet but the benefits will come none too soon.

Friday, June 18, 2010

Metadata Everywhere

An interesting article in OCLC's NextSpace publication about the increasing importance of metadata. Music to bibliographers' and catalogers' ears. (OCLC):
“Metadata has become a stand-in for place.”

So says Richard Amelung, Associate Director at the Saint Louis University Law Library. When asked to expand on that idea he explains, “Law is almost entirely jurisdictional. You need to know where a decision occurred or a law was changed to understand if it has any relevance to your subject.

“In the old days, you would walk the stacks in the law library and look at the sections for U.S. law, international law, various state law publications, etc. Online? Without metadata, you may have no idea where something is from. Good cataloging isn’t just a ‘nice-to-have’ for legal reference online. It’s a requirement.”

Richard’s point is one example of a trend that is being felt across all aspects of information services, both on and off the Web: the increasing importance and ubiquity of metadata. In a world where more and more people, systems, places and even objects are digitally connected, the ability to differentiate “signal from noise” is fast becoming a core competency for many businesses and institutions.

Librarians—and catalogers more specifically—are deeply familiar with the role good metadata creation plays in any information system. As part of this revolution, industries are increasing the value they place on talents and the ways in which librarians work, extending the ever-growing sphere of interested players.

Whether we are tracing connections on LinkedIn, getting recommendations from Netflix, trying to find the right medical specialist in a particular city or monitoring a shipment online, metadata has become the structure on which we’re building information services. And no one has more experience with those structures than catalogers.

Concluding:

“It is clear that metadata is ubiquitous,” Jane continues. “Education, the arts, science, industry, government and the many humanistic, scientific and social pursuits that comprise our world have rallied to develop, implement and adhere to some form of metadata practice.

“What is important is that librarians are the experts in developing information standards, and we have the most sophisticated skills and experience in knowledge representation.”

Those skills are being put to good use not only in the library, but in nearly every discipline and societal sector coming into contact with information.

Bibliographers Shall Inherit...Data Monopolies - Repost

I recently heard Fred Wilson speak and it reminded me of this post from February 5th, 2007:


Fred Wilson is a founder of Union Square Ventures, a venture capital firm located in NYC. He was also part of Flatiron Partners until he left to start Union Square. He was the keynote speaker at Monday's SIIA Previews meeting and spoke about content, specifically that content "wants to be free."

He ended the session with a potentially more interesting theme related to tagging and content descriptions. In answer to a question about the potential power of social networks and the attendant tagging possibilities, he suggested that we shouldn't have to tag information at all; that is, content should be adequately described for us. The questioner stated that 'publishers are good' at describing their content. Wilson disagreed, confirming (to me) that publishers are definitely not good at tagging or classifying their content. His comments confirm for me a belief that intermediaries that insert descriptors, subject classifications and other metadata to improve relevance and discovery will play an increasingly important role. Personally, I do not think the battle has yet been joined that will determine a single provider of standardized metadata within specific product or content categories. (Some players have clear positioning; take, for example, Snap-On Tools' purchase of Proquest's Business Solutions unit, which opens many intriguing opportunities, if you like car parts.)

You may think that books are effectively categorized by Amazon.com and that Amazon is therefore the standard. This is untrue: in fact there are several bibliographic book databases and none of them are compatible across the industry. Additionally, while Amazon allows great access to their data, they are not a good cataloguer of bibliographic information; their effort is enough to serve their purposes. As a seeker of books and book (e)content, I want to be able to search on a variety of data elements (publisher, format, subject, author) and find what I am looking for regardless of the tool I am using. In my view a single source of quality bibliographic information, distributed at the element level, will solve this problem. Suppliers of content are beginning to understand that the description of the content (metadata) is as important as the content itself.

It is really quite simple: A database provider needs to spend time standardizing their deep bibliographic content, distribute it to anyone who wants it and then figure out how they can make money doing that. Historically, a vendor had to create their own product catalog because either one didn’t exist or they preferred to build it themselves. Look at office products or mattresses. It is nearly impossible to compare items across vendors. Books and other media products are slightly easier but the legacy of multiple databases continues to reduce efficiency. Management of a product database/catalog should never be a competitive advantage unless it is your business.
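
To illustrate what distribution "at the element level" might look like in practice (a hypothetical sketch; the record fields and consumer profiles are illustrative inventions, not a real ONIX or BIP schema), a single standardized record could feed each consumer only the subset of elements it cares about:

```python
# Hypothetical illustration of element-level distribution: one standardized
# bibliographic record, with each consumer pulling only the elements it needs.
master_record = {
    "isbn": "9780000000001",
    "title": "Example Title",
    "contributors": ["A. Author"],
    "publisher": "Example Press",
    "format": "Hardcover",
    "subjects": ["Engineering / Structural"],
    "audience": "Professional",
    "price_usd": 65.00,
}

# Different parts of the supply chain subscribe to different element sets.
ELEMENT_PROFILES = {
    "retailer": ["isbn", "title", "contributors", "format", "price_usd"],
    "library":  ["isbn", "title", "contributors", "subjects", "audience"],
}

def extract(record, profile):
    return {element: record[element] for element in ELEMENT_PROFILES[profile]}

print(extract(master_record, "retailer"))
print(extract(master_record, "library"))
```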

Fred Wilson asked: if information wants to be free, then where is the value in information? Unsurprisingly, it is in attention. To quote, "there is a scarcity of attention and narrowing users' data 'experience' to mitigate irrelevance is the future." Furthermore, the 'leverage points' in the attention-driven information model are Discovery, Navigation, Trust (ratings around content; PageRank is a good example), Governance, Values and Metadata (data about the data). The likes of Google, Yahoo and Microsoft have the first couple of these items well in hand, but they will all increasingly need good metadata that describes the content they are serving up. This is where aggregators/intermediaries step in, whether the content be tools, TV programs and movies, advertising or books.

He has provided a link on his web site to the presentation from this meeting.

Friday, March 05, 2010

Is There a Future for Bibliographic Databases? - Repost

The following was originally posted on April 2nd, 2007 and since John and I met for dinner in NYC last night I thought I would re-post his article.


I have commented a number of times on what I view is the future of bibliographic databases - particularly those similar to Books in Print and Worldcat - and in keeping with that theme I asked John Dupuis (Confessions of a Science Librarian) what his views were on the same subject. The following article is written by John Dupuis, Science Librarian, Steacie Science & Engineering Library, York University. He told me to mention he is on sabbatical.


A week or so ago, Michael asked me to do a guest post here on Personanondata about bibliographic databases, based on some of the speculations I've made on my own blog, Confessions of a Science Librarian, about the future of Abstracting and Indexing databases.

Here's how he put it in his email:
I have read your posts on the future of information databases and bibliographies etc. over the past several months and I was wondering whether you had a specific opinion of the future of bibliographic databases such as worldcat and booksinprint? ... [O]n my blog I have skirted around the idea that the basic logic of these types of databases is beginning to erode as base level metadata is more readily available and of sufficient quality to reduce the need for these types of bibliographic databases. Assuming that is increasingly the case then these providers need to determine new value propositions for their customers. So what are they?
How could I resist? I'm not sure if I exactly answered his questions or even talked about what he'd hoped I'd talk about, but at least I've probably provoked a few more questions.

In my blog post on the future of A&I databases, I basically came to the conclusion that in the face of competition from Google Scholar and its ilk, the traditional Abstracting & Indexing databases would be increasingly hard-pressed to make a case for their usefulness to academic institutions. Students want ease of use; they concentrate on what's "good enough," not what's perfect. Over time, academic libraries will find it harder and harder to justify spending loads of money on search and discovery tools when plenty of free alternatives exist. Unless, of course, the vendors can find some way to add enough value to the data to make themselves indispensable. I used SciFinder Scholar as an example of a tool that adds a lot of value to data. I think we'll definitely start to see this transition from fee to free in the next 10 years, with considerable acceleration after that.

Now, I didn't really talk about bibliographic/collections tools like Books in Print (BiP), WorldCat (WC), Ulrich's or the Serials Directory (SD). Why not? I think it's because those tools are aimed at experts, not end users. Professionals, not civilians. While a freshman may only want a couple of quick articles to quote for a paper due in a couple of hours, we librarians and publishing professionals are looking for good, solid, quality information and we're willing to pay for it. This distinction would seem to me to be quite important, leading to quite a different kind of analysis, one I wasn't really aiming at originally. So, I didn't really think about it at the time.

So, now it's time to put the thinking cap back on and see what my crystal ball tells me.

In my professional work as a collections librarian, I am a frequent user of all the tools I mention above. I think that BiP is the one I use the most. Over the last 5 or 6 years I've built up a specialized engineering collection mostly from scratch so I've needed a lot of help and BiP has been an enormously useful tool. I use keyword searches. I also use the subject links on the item records a lot to take me to lists of similar books.

WC I use less frequently, mostly only when I want to look beyond books that are in print and identify older and rarer items that I'll end up having to get on the used book market. I've used this to build up various aspects of our Science and Technology Studies collection on topics like women in science. On the other hand, WC seems to have already found a big part of its value proposition with non-experts. Look at its partnership with Google Book Search. Also look at the really innovative things it's doing with products like WorldCat Identities. It's not perfect by any means but you can see the innovative spirit working.

Ulrich's and SD I mostly use to identify pricing issues for journals I might want to subscribe to, so I don't use them that often. With the ease of finding journal homepages, this function is probably falling fast in its usefulness. As for identifying the journals in a particular subject area, that's still a useful function, but I wonder what the future is if that's all they offer.

For our purposes here, I'll concentrate on the one I use most: BiP. I presume a lot of what I have to say will also more or less apply to the other specialized tools aimed at pros.

So, I definitely need quality information on books to do my job, now and in the future. But if I need quality information, what will the source be? Although of course I use BiP, I also use Amazon quite a lot to find information on books I want to order; the features of theirs that I like best and use most come out of the kind of data mining they can do with their ordering and access logs. When I'm looking at an interesting item, Amazon can quickly tell me what other books are similar, what other books people who have purchased the one I'm looking at have also purchased. I find this to be an extremely important tool for finding books, a great time saver and an incredibly accurate way of finding relevant items. Also, when I search Amazon, I'm actually searching the full text of a lot of books in their database. This feature gets me inside books and unleashes their contents in a way that can't be duplicated by being able to view or even search tables of contents.
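
The "also purchased" feature John describes comes down to counting co-occurrences in order data; here is a tiny, hypothetical sketch of the idea (not Amazon's actual algorithm, and the order data is invented):

```python
# Toy co-purchase recommender: count how often other titles appear in the
# same orders as the title being viewed (illustrative data only).
from collections import Counter

orders = [
    {"Bridge Engineering", "Structural Analysis", "Steel Design"},
    {"Bridge Engineering", "Structural Analysis"},
    {"Bridge Engineering", "Concrete Structures"},
    {"Steel Design", "Concrete Structures"},
]

def also_bought(title, orders, top_n=2):
    co_counts = Counter()
    for order in orders:
        if title in order:
            co_counts.update(order - {title})
    return co_counts.most_common(top_n)

print(also_bought("Bridge Engineering", orders))
# [('Structural Analysis', 2), ('Steel Design', 1)] -- Steel Design and
# Concrete Structures tie at one co-purchase, so either may appear second.
```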

I also very much like the user-generated lists and reviews. On more than one occasion I've appreciated multiple user reviews of highly technical books, especially when there are negative reviews to warn me away from bad ones. The "Listmania" and "So you'd like to.." lists are great sources of recommendations. On the other hand, it has some significant problems that keep me from going to it exclusively. For example, most any search returns reams of irrelevant hits. The subject classifications that Amazon displays at the bottom of the page I also find next to useless as they are often far too broad.

For BiP, the features I appreciate the most, the ones that draw me back from Amazon, include very good linkable subject classification and good coverage of non-US imprints. When I do keyword searches, the results seem more focused and less cluttered with irrelevant items. I also like that it gives me very complete bibliographic information, including at least part of a call number. While Amazon isn't geared to let you mark then print out a bunch of items (why would they want you to be able to do this?), I appreciate being able to generate lists and print them out using BiP. On the other hand, BiP has been slow to make their interface as quick and easy to use as Google or Amazon, to make use of the tons of data they have, to mine it to find connections, to harness user input and reviews in a massive way to compete with the Amazon juggernaut. When for-fee is competing with for-free, the one that costs money has to be very clearly the best.

Another threat to BiP is Google Book Search. As I've recounted in a story on my blog, Google Book Search is an incredible tool for research, reference and even collections. Once again, the ability to search the entire text of books is an incredible tool for revealing what they're really about, to surface them and make me want to buy them. As Cory Doctorow has said, the greatest enemy of authors (and publishers) is not piracy, it's obscurity. Google Book Search is an amazing tool for a book to get known and, ultimately, to get bought. As more and more publishers realize this (and even book publishers are smart enough to realize this eventually), they'll make darn sure all their new books are full-text searchable by Google (and, presumably, Amazon and others). How can BiP compete with that?

I think it's safe to say, it wouldn't take much for me to completely abandon the use of BiP and only use free tools such as Amazon and Google. What could BiP do to keep in the game? What is their value proposition for me? What is the value proposition for all bibliographic tools hoping to market themselves to library professionals now and in the future?

Some issues I've been thinking about:
  • The changing nature of publishing. What's a book? What's a journal? What does "in print" mean? Print journals vs. online? Ebooks vs. paper books? Fee vs. Free. Open Access publishing. Wikis. Blogs. To say that bibliographic databases have to be ahead of the curve on all the revolutionary changes going on today in publishing is an understatement. Look at all the trouble newspapers are in, the trouble they're having adjusting to a new business model. Well, the book world is changing as well, especially for academic customers. The needs of academic users are quite different from those of regular users. They don't necessarily need to read an entire book, just key sections. Search and discovery are incredibly important to these users, almost more important than the content. They also really don't care about the source of their content; what they really care about is having as few barriers between the content and themselves. How will BiP and other bibliographic databases help professionals like me navigate this mess? Easy. By continuing to provide one-stop-shopping, only for a much wider range of items. Paper books from traditional publishers, for sure, but how about all those Print on Demand publishers? Sifting through the chaff to get the rare kernel of wheat is an important task, one I know that they're already doing to some degree. But how about digital document publishers like Morgan & Claypool? O'Reilly's Digital PDFs? White papers and other documents from all kinds of publishers? How about the incredible amount of free ebooks out there? And other useful digital documents and document collections, both free and for sale (The Einstein Archives is an example)? And breaking down the digital availability of the component parts of collections like Knovel, Safari, Books 24x7 and all the others. Any tool that could help me evaluate the pros and cons of those repositories would be greatly appreciated. The landscape out there for useful information is clearly far larger than it used to be.
  • Changing nature of metadata. Never underestimate the value of good metadata; never underestimate the value of the people that produce that metadata. It seems to me that one of the core issues is who should create metadata for books and other documents and how should that metadata be distributed to the people that want it, be it commercial search engines or library/bookstore catalogues. It would be great if all content publishers created their own metadata and that it was of the highest quality and free to everyone. There's a role for bibliographic databases to collect and distribute that metadata, maybe even to create it. The library world has a good history of sharing that kind of data, but I'm not sure how that model scales to a bigger world. It seems to me that there's an opportunity here.
  • Changing nature of customers. I've publicly predicted that I will hardly be buying any more print books for my library in 10 years. Libraries are changing, bookstores are changing. Our patrons and customers are the ones driving this change. As my patrons want more digital content, as they use print collections less, as they rely on free search and discovery tools rather than expensive specialized tools, I must change too. As my patrons' needs and habits change, the nature of the collections I will acquire for them will follow those changes -- or I will find myself in big trouble. Anybody that can make my life easier is certainly going to be welcome. And that will be the challenge for the various bibliographic tools -- making it easier for me to respond to the changes sweeping my world. A good bibliographic service should be able to help me populate the catalogue with the stuff I want and my patrons need. I think a lot of progress has been made on this front in products like WC, but I think to stay in the game the progress will have to be transformative. There's lots of opportunity here.
  • What's worth paying for. In other words, BiP, WC and their ilk have to be better than the free alternatives. And not just a little better. And not just better in an abstruse, theoretical way; if it takes you 20 minutes to explain why you're better, the margin may be too slim. Better as in way better on 80% of my usage rather than just somewhat better on 20%. Better as in saving time, saving effort, saving more money than they cost, making my life easier.
To conclude, I can only say one thing. In times of intense change and uncertainty, evolutionary pressure is extremely intense. Only those products and services that can find an ecological niche, a way to satisfy enough customers, will survive. To thrive is another story. To thrive requires a redefinition of products and services, a way to jump ahead of competitors and to win new markets with something new and exciting. It's hard to tell where bibliographic databases will find their place: will they be dodo birds, or will they find a way to survive or even thrive in the coming decade? There's certainly a window to change. Nobody is going to cancel any of these core tools any time soon. But the window will close sooner rather than later.

John can be reached at the following email address: dupuisj@gmail.com