Thursday, June 30, 2011

3M's eBook Cloud Library Didn't Come Out of Nowhere!

When the Douglas County Libraries in Colorado installed self check-in stations a while ago, they realized that they had an opportunity to restructure their space. The circulation desk that dominated the main entrance was no longer needed. It seemed obvious to Library Director Jamie LaRue what to put in its place. Libraries need to greet their visitors with displays of books available for immediate checkout. 80% of Douglas County's adult circulation is generated by visual displays of books, so the best way to entice visitors to read is to show them great books to read.

When Douglas County began investigating how to put ebooks into county residents' computers, they wanted to do something similar. A user looking for ebooks should be greeted with a virtual bookshelf of books waiting to be checked out. LaRue was not satisfied with the offering of industry leader Overdrive because he couldn't do such a simple thing.

Public libraries that offer ebooks are frequently faced with problems posed by the strong demand for ebooks. Their users are frequently disappointed that the ebooks they want are always checked out. Overdrive has not yet implemented a programming interface that would allow library catalogs to check on an ebook's availability before showing it to a user, so the process of finding an available ebook can involve a lot of tedious clicks.

To address these needs, Overdrive has announced the "Overdrive WIN" service, which promises better integration with library automation software along with a host of other improvements and service innovations.

I spoke with a number of library automation vendors at this past weekend's American Library Association meeting in New Orleans. eBook integration is high on their customers' wish lists, but I couldn't find any vendor that could tell me when they would be implementing better Overdrive integration, though many of them were in "discussions".

A new vendor worth mentioning was Toronto-based BiblioCommons, whose EC2-cloud-based OPAC service has been implemented by Seattle Public Library and is in beta with New York Public Library. I'd been hearing about BiblioCommons for long enough that I'd had my doubts as to their reality. At ALA, they demoed a clean, modern web interface with plenty of social features- go take a look at Seattle Public. Given NYPL's status as a prominent Overdrive customer and BiblioCommons' actively developing codebase, I had hoped to see some preview glimpses of Overdrive WIN in BiblioCommons, but had no such luck.

Back in Douglas County, Jamie LaRue wasn't satisfied with the available options, so around the end of 2010, he had his team approach their auto-check-in vendor, 3M, to see if they could do something about ebooks. As luck would have it, they could. And they did.

Although 3M's entrance into the library ebook platform business came as a complete surprise to many in libraries and publishing, it seems obvious in retrospect. 3M's RFID tag, self-checkout/checkin, and detection businesses were already integrated with library automation systems, so much of the code needed to integrate to library systems was already written. 3M licensed ebook reader and DRM systems from Adobe, and in the space of six months, with the advice and help of customers such as Douglas County, was able to assemble a strong set of services it is branding as the "3M Cloud Library". These include reader software for iOS and Android, as well as spiffy "3M Discovery Terminals", electronic kiosks "with an intuitive touch-based interface". (pictured) 3M is even going to sell "white-label" eReader devices with software tweaked to meet the needs of libraries that want to lend devices.

While 3M is arguably breaking new ground in integration of ebooks with library systems, 3M is far behind Overdrive in the area of publisher relations, which can't just be switched on in a mere 6 months. Overdrive has announced expansions of its offerings in the school and academic markets. Meanwhile, 3M is going in publishers' back doors as it helps the State of Kansas withdraw from an awkwardly drafted Overdrive contract, which Kansas says allows them to move purchased content from Overdrive to other platforms. It's in publishers' interests to have a library ebook channel that competes with Overdrive, but they do SO like to be asked permission first.

For his part, LaRue just wants to be able to tailor his library service to the needs of his community. "I want to provide a quality, integrated experience with a local focus" is what he told me. That doesn't seem to be asking so much.

Update 6/30/11: At The Digital Reader, Nate Hoffelder reported in May that a lot of 3M's reading platform was sourced from txtr, a German start-up they'd invested in. I wasn't able to confirm this at ALA, but have since done so. The Adobe DRM implementation, reading software, apps, presentation interfaces all originated in txtr. I'm also told by multiple sources that 3M has been talking to publishers since at least December 2010.

Monday, June 27, 2011

Four Times Around the Library World

The New Orleans Convention Center is sandwiched between the warehouse district and some railroad tracks, and as a result, it's a kilometer long, end to end. This past weekend, it has hosted the American Library Association Annual Meeting. I've walked the length of the convention center about 10 times over the past 4 days. It's another kilometer from my hotel to its near end, so add another 8 km to my total. Bourbon Street is 1.3 km down and back; so add 3 km there. There were 2.5 km of exhibits on the show floor at ALA; I make it a point to look at every one, at least briefly. So my ALA pedometer racked up about 25 km (over 15 miles, for the metrically challenged).

It's not over yet, but the ALA conference twitter feed says this week's attendance is over 20,000, including exhibitors. (Update: the final totals are 14,969 attendees and 5,217 exhibitors.) Their mileage may vary, but my estimate is that on average, an ALA attendee walked about 5 miles in total. So the grand total of walking at ALA should be about 160,000 km. That's 4 times the circumference of the earth.
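For anyone who wants to check my back-of-the-envelope math, here is a small Python sketch. The attendance and per-person mileage are the rough figures from the paragraph above; the earth's circumference (about 40,075 km at the equator) and the variable names are my own additions.

```python
# Rough figures from the post; the circumference value is an added assumption.
ATTENDANCE = 20_000              # attendees plus exhibitors, rounded
MILES_PER_ATTENDEE = 5           # my guess at the average distance walked
KM_PER_MILE = 1.609344
EARTH_CIRCUMFERENCE_KM = 40_075  # approximate equatorial circumference


def total_walking_km(attendance, miles_each):
    """Total distance walked by everyone, converted to kilometers."""
    return attendance * miles_each * KM_PER_MILE


total_km = total_walking_km(ATTENDANCE, MILES_PER_ATTENDEE)
laps = total_km / EARTH_CIRCUMFERENCE_KM
print(f"{total_km:,.0f} km, or about {laps:.1f} trips around the earth")
```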

All that walking is good for us. I replenished many of those calories at Cochon, where I hosted some lunches to tell librarians about Gluejar. But the Buttermilk Pecan Tart I had on Friday was worth the whole trip to New Orleans. The pleasure capital of Louisiana has moved a mile south as far as I'm concerned!

Cochon brought back memories of 5 years ago, when ALA was the first big convention to come to New Orleans after Hurricane Katrina. Cochon had opened just a week before, and I raved to friends after having oven-roasted oysters there. By the end of ALA 2006, the place was packed.

The meeting five years ago was a special one; the city was far from having been repaired or rebuilt, and many workers had been bussed in and bunked in temporary housing just so we could come. Everyone was just so happy to see us, it brings tears to my eyes just thinking about it. In New Orleans they still remember the weekend that librarians brought the city of New Orleans back to life.

Wednesday, June 22, 2011

EPUB 3 Beefs Up Metadata, but Omits Semantic Enrichment

Ironic amusement fills me when I hear book industry people say things like "metadata has become cool", or "context is everything". Welcome to the 20th century and all that. Meanwhile, in the library industry, metadata has been cool long enough to coat everything with a thick rind of freezer burn.

There's good news and notsogood news for ebook metadata. The revision to the EPUB standard, published just a month ago, includes metadata tools that could eventually lead to a new era of metadata cooperation between publishers and the entire book supply chain, including libraries. At the same time, the revision fails to take advantage of ready-made vehicles for semantic enrichment of content, a move that could still provide new types of revenue for publishers while giving libraries new opportunities to remain relevant as books become digital.

Since I'm incurably optimistic, I'll start with the half-full glass: Publication-level metadata. EPUB 3 includes a whole bunch of ways to include publication-level metadata in an EPUB container. As an example, imagine an EPUB 3 file for "Emma" with this mark-up in its package document (essentially the navigation directory for the book):
<metadata>
  ...
  <meta property="dcterms:identifier" id="pub-id">urn:uuid:A1B0D67E-2E81-4DF5-9E67-A64CBE366809</meta>
  <link rel="marc21xml-record" href="http://www.archive.org/download/cihm_29722/cihm_29722_marc.xml" />
  <link rel="marc21xml-record" href="/cihm_29722_marc.xml" />
  <link rel="foaf:homepage" href="http://openlibrary.org/books/OL24234129M/Emma" />
  ...
</metadata>

In this example, the first link element points to a MARC 21 xml record (MARC 21 is a blattarian standard for library metadata (look it up)) at the Internet Archive. The second link element points to the same record included in the EPUB container itself. There is also built-in vocabulary that allows the link element to point to ONIX, MODS, and XMP metadata records.
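To make this concrete, here is a minimal Python sketch of how a library system might pull the linked MARC records out of a metadata block like the one above. The function name `linked_records` is my own invention, and the fragment leaves out the XML namespace that a real package document would declare, so treat this as an illustration rather than a working EPUB reader.

```python
import xml.etree.ElementTree as ET

# The metadata fragment from the example, without namespaces (a simplification).
METADATA = """<metadata>
  <meta property="dcterms:identifier" id="pub-id">urn:uuid:A1B0D67E-2E81-4DF5-9E67-A64CBE366809</meta>
  <link rel="marc21xml-record" href="http://www.archive.org/download/cihm_29722/cihm_29722_marc.xml" />
  <link rel="marc21xml-record" href="/cihm_29722_marc.xml" />
  <link rel="foaf:homepage" href="http://openlibrary.org/books/OL24234129M/Emma" />
</metadata>"""


def linked_records(metadata_xml, rel="marc21xml-record"):
    """Return (href, location) pairs for link elements with the given rel."""
    root = ET.fromstring(metadata_xml)
    records = []
    for link in root.iter("link"):
        if link.get("rel") != rel:
            continue
        href = link.get("href", "")
        # An absolute http(s) URL points outside the container; anything
        # else is assumed to be a file packaged inside the EPUB.
        is_remote = href.startswith(("http://", "https://"))
        records.append((href, "remote" if is_remote else "in container"))
    return records


for href, location in linked_records(METADATA):
    print(location, href)
```

A catalog loader could prefer the in-container copy and fall back to the remote one, but that policy is a design choice, not something the spec dictates.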

The example also shows that other vocabularies (such as FOAF) can be added for use in metadata elements. So, if you're a believer in RDA, you can put that in an EPUB file as well.

The meta element can also be used in the EPUB package document's metadata block. It's defined quite differently from HTML5's empty meta element, with an about attribute and allowed text content. In principle, it can be used to encode arbitrary RDF triples, thanks to a prefix extension mechanism borrowed from RDFa which allows EPUB authors to add vocabularies to their documents.
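Here is a rough Python sketch of how meta elements and declared prefixes could be read as triples. Several things here are my assumptions rather than quotations from the spec: the sample package document is made up, the prefix attribute is read as whitespace-separated "name: IRI" pairs, `dcterms` is treated as a built-in prefix, and a meta element without an about attribute is taken to describe the publication itself (shown as an empty subject).

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"

# Prefixes assumed to be available without declaration (an assumption).
ASSUMED_PREFIXES = {"dcterms": "http://purl.org/dc/terms/"}

# A hypothetical package document, invented for this illustration.
PACKAGE_DOC = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0"
    unique-identifier="pub-id"
    prefix="foaf: http://xmlns.com/foaf/0.1/">
  <metadata>
    <meta property="dcterms:title">Emma</meta>
    <meta about="#pub-id" property="foaf:homepage">http://openlibrary.org/books/OL24234129M/Emma</meta>
  </metadata>
</package>"""


def meta_triples(package_xml):
    """Turn prefixed meta elements into (subject, predicate, object) tuples."""
    root = ET.fromstring(package_xml)
    prefixes = dict(ASSUMED_PREFIXES)
    # Read "name: IRI name: IRI ..." pairs from the prefix attribute.
    tokens = root.get("prefix", "").split()
    for name, iri in zip(tokens[0::2], tokens[1::2]):
        prefixes[name.rstrip(":")] = iri
    triples = []
    for meta in root.iter("{%s}meta" % OPF_NS):
        prop = meta.get("property", "")
        prefix, sep, local = prop.partition(":")
        if not sep or prefix not in prefixes:
            continue  # skip unprefixed or undeclared properties in this sketch
        subject = meta.get("about", "")  # empty string stands for the publication
        triples.append((subject, prefixes[prefix] + local, (meta.text or "").strip()))
    return triples


for triple in meta_triples(PACKAGE_DOC):
    print(triple)
```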

These capabilities, on their own, could support major changes in the way that books are produced, delivered and accessed. In a publisher workflow, the EPUB file could serve as the carrier for all the components and versions of a book, even bits that today might be left out or lost in the caverns of so-called "content management systems". A distributor would no longer need to match up content files with records in a separate metadata feed. EPUB books for libraries could be preloaded with cataloging and enrichment data, greatly simplifying the process of making the ebooks accessible in libraries.

Given the great advances for "package-level" metadata, it's a bit disappointing that semantic mark-up of content documents missed the EPUB 3 boat. The story is a bit complicated, and it's far from over. Imagine that you want to add mark-up to a book's citations- perhaps you want to embed identifiers to support library linking systems. Or perhaps you're a medical publisher and you want to embed machine readable statements about drugs and diseases in a pharmaceutical textbook. Or perhaps you want to publish a travel guide and you want search engines to pick out the places you're describing. These applications are not really supported by the current version of EPUB 3.

EPUB content documents have a feature that you might think would do the trick, but doesn't really. The epub:type attribute supports "semantic inflection" of elements. This attribute can be used to mark a paragraph as a bibliographic citation, for example, and supports many of the requirements imposed by conversion of content from legacy or specialized formats into the HTML5 dialect used by EPUB. It's an important feature, but not enough to support semantic enrichment.

Part of the problem is EPUB 3's dependence on HTML5, which is not yet a stable spec and is enmeshed in some surprisingly raw W3C politics. W3C has been the home of HTML standards development since the very early stages of the web, and has also been the home of semantic web standards development. HTML5 started outside of W3C in the WHATWG, an initiative to develop HTML in a way that would be backwards compatible with good old-fashioned non-XML HTML. W3C was convinced to fold WHATWG into its development efforts because of WHATWG's corporate backing. Even so, the WHATWG version of the HTML5 spec drips with sarcasm towards W3C HTML Working Group decisions.

During part of the development of EPUB 3, the HTML5 draft included "Microdata", a method of embedding semantic mark-up in HTML. RDFa, a standard that competes with Microdata, was developed through W3C channels, and within W3C, it was decided in February of 2010 to move Microdata out of the HTML spec so as to give it equal footing with RDFa. Some participants in the EPUB working group wanted to include RDFa in the standard; others thought this would impose too much of a complexity burden on publisher-implementers. The EPUB draft ended up being released without either RDFa or Microdata.

The recent endorsement of Microdata by the Google-Yahoo-Bing cooperation has changed the competitive landscape for embedded semantics. It's now apparent that Microdata will get priority implementation in HTML development tools, leaving RDFa as a niche technology. For most use cases of EPUB semantic markup, the differences between RDFa and Microdata are small compared to the advantages of piggybacking on the technology investment supporting website creation.

According to members of the EPUB working group, it is expected that a dot release will follow relatively quickly behind EPUB 3.0. It seems to me that picking a semantic markup technology for content documents should now not be so hard. If you work for a publishing company that has ever mentioned semantic markup in a product plan, you should probably be making sure that the EPUB working group is aware of your needs. If you are a librarian who can imagine the possibilities of a semantically enriched EPUB collection, you should similarly be making your concerns known.

Although the EPUB working group includes representatives from tools vendors that might conceivably benefit from the adoption of EPUB-only constructs, the group's track record for adopting wider web standards has been very encouraging. By adopting HTML5 as a stack component, the group has ensured that cheap or free tools to produce and author EPUB 3 content will be readily available.

Once semantic enrichment of ebooks becomes routine, libraries will play a vital role in their use. Libraries provide a copyright-friendly DRM-free community commons in which users can access and build on the information contained in licensed content. (Of course, I see "unglued" books as playing an equally important role in the library commons.)

The EPUB metadata glass is half full, and there's more wine in the bottle!

Note: This is one thing I'll be talking about on Saturday at the American Library Association meeting in New Orleans. (The program is somewhat inaccurate; the program will end at 10:30 AM at the latest.) Ross Singer from Talis will lead off with an overview of semantic web technologies in libraries; I'll follow with discussions of RDFa, the Facebook "Like" button and of course, EPUB.

Friday, June 17, 2011

We Need a Name for Our Ungluing Books Service

In 1996, the company I worked for was briefly named "Company B". The day a new name for the company was to be unveiled was really quite exciting. The company that had been AT&T was being split into three pieces. The "A" piece was to keep the AT&T name and the long-distance phone service. The "C" piece was NCR, which had been acquired a few years before for reasons no one really understood. The "B" piece was going to sell telecommunications equipment and would be centered around our beloved Bell Labs. That's what the "B" stood for.

Nobody was excited by the name "Lucent Technologies" when it was announced. Millions had been spent on a brand consultancy, and more had been spent trying to get Judge Greene to allow the use of "Bell" in the name. And none of us could comprehend that with all that money, they hadn't even bothered to secure the "lucent.com" domain name.

Our management struggled to convince us of the benefits of a meaningless name. They told us the name was an "empty vessel" which meant it would only acquire the meaning that we put into it with our "Bell Labs Innovations". The name had been field tested with focus groups, and nobody hated it. The domain name was acquired.

In late 2005, I needed a name for my company, because I was selling the business along with the name to a non-profit. The company itself was to become an empty shell that would hold money until I had something to do with it. I asked my son for name suggestions and we came up with "Gluejar".

Last year, as Gluejar began serious work on a new business, I back-formed the term "ungluing ebooks" from the company name. I didn't really expect to still be using the term eight months later, but its "empty-vessel" quality has proved to be useful. People have no idea what it means to "unglue" an ebook, so we have the opportunity to fill the word with meaning. The downside, of course, is that we have to explain what it means. "Crowd-funding the relicensing of ebooks with Creative Commons" is a mouthful, and most people don't understand that either!

The language surrounding ebooks can be tricky. "Free" is an immediate turn-off for publishers and authors who want to earn a living from their books; "unlock" suggests breaking DRM, "liberate" suggests that the books were in prison. So we keep on using "unglue".

We've brainstormed a bit about a name for the ungluing-ebooks website we're building. Our working name for the site is "BookPatrons.org". It's a name that describes with reasonable accuracy the activity that we want to occur on the site, and it has some library flavor to it. From Wikipedia:
Patronage is the support, encouragement, privilege, or financial aid that an organization or individual bestows to another. In the history of art, arts patronage refers to the support that kings or popes have provided to musicians, painters, and sculptors.
"BookPatrons" is a bit boring, though. Will ordinary people want to think of themselves as "patrons" of books? Does the "pater" root make it seem a bit male? (Will we get competition from "BookMatrons.org"?)

Maybe we should "stick" to Gluejar or some other "unglue" related name. What do you think? This is your chance to be consulted!




If you think that "ungluing books" or "BookPatrons" are really stupid names, please say so now, in the comments. If you think they work well, tell us that too. You can contact us privately, too, with your great ideas.

It can be fun to come up with really awful names, too. My worst effort: "Biblerty.com".

Wednesday, June 8, 2011

Our Metadata Overlords and That Microdata Thingy

On June 2, our Metadata Overlords spoke. They told us that they'll only listen when we tell them things using a specialized vocabulary they've now given us at the schema.org website. Although we can still use our stone tablets if that's what we're using now, we're expected to migrate to a new Microdata Thingy, assuming that we really want them to pay attention to our website metadata supplications.

There are among us believers, who, led by druids enraptured by the power of stone tablets to carry truth, will shun the new thingy, but most of us will meekly comply with the edicts of the overlords. We're not able to distinguish the druidic language of the tablets from the new liturgy of the state church. Many things are difficult to articulate in the new vocabulary, but gosh, those tablets were heavy to carry around. And the new thingy doesn't seem so awful, although it's difficult to tell with the mumbled sermons and hymn singing and all.

I hope the overlords don't try to take our pagan rituals of Friending and Liking away from us, though. The incantations used to invoke and bless the Like ritual also use the druidic language, and the help scrolls tell us we might confuse the overlords if we use more than one language in our prayers.

My soul remains troubled, however, at the thought that the Overlords care not for truth and for justice. Sometimes it seems as though the overlords want only for our offerings of attention and seek only to feed our lust for food, drink, entertainment, debauchery and money. Yes, there are new words for our books and learning, but we can say so little about these in schema.org language that our wizards and mages will be mute if they ever choose to enter that realm.

I myself was present at a conclave of such mages and wizards dedicated to the entwinement of data from libraries, museums and archives in full openness. When tweet of the new order came, we endeavored to learn more of schema.org and its thingy. We questioned whether the thingy was an abomination against openness, or whether we might exploit its Overlord endorsement to make our own spells more powerful. We agreed to teach each other our new thingy spells, even as our colleagues elsewhere figured out how to chisel the new vocabulary into stone. Word came from other lands that the new vessel would founder trying to cross the seas.

We then visited the temple of the archive and found the servers cool to the touch. We heard words from a past oracle, ate as they never ate in Rome, drank cool drafts, and returned home emboldened with an enlarged appreciation of intermingled bits.

So it was said, so shall we do.

Notes:
  1. Google's blog post on adopting microdata was signed by R. V. Guha who had a bit to do with the creation of RDF.
  2. It's not really a surprise that Google doesn't care about RDFa. In my article on RDFa from 2009, I pointed to mistakes that Google made in their RDFa documentation. They never fixed it.
  3. Schema.org can't even list all of its schemata- the web page, chock full of non-breaking spaces, is truncated!
  4. The current microdata spec is in an odd state where it's confused about how to define an itemtype. In fact, the mechanism for defining new itemtypes is gone! Here's what it says:
    The item type must be a type defined in an applicable specification.

    Except if otherwise specified by that specification, the URL given as the item type should not be automatically dereferenced.

    A specification could define that its item type can be dereferenced to provide the user with help information, for example. In fact, vocabulary authors are encouraged to provide useful information at the given URL.
    Apparently, stuff was removed for some sort of political reason- it's there in the WHATWG version; note that Google links to the W3C version, which is not fully baked.
  5. The Schema.org terms of service are creepy when you get to the part about patents.
  6. The big selling point for RDFa was that Google, Yahoo and Bing supported it for Rich Snippets and the like. But Microdata's inability to easily support complex markup turned out to be a key feature for the search engines. The moral of the story for standards developers: your best customers are always righter than the others.
  7. In the video, Brewster Kahle reads from the last page of A Manual on Methods of Reproducing Research Material by Robert C. Binkley (1936). OCLC Number 14753642. Peter Binkley, a meeting participant, donated a copy of his grandfather's book to the Internet Archive, along with permission to make it free to the public.
  8. Henri Sivonen has written a very readable and informed discussion about Microdata, RDFa, Schema.org and the process of making standards that you should read if you are interested in why things are the way they are in HTML5.
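
For readers who haven't yet seen the thingy in the flesh, here is a minimal sketch of microdata markup and a toy Python extractor for it. The sample HTML is my own; it uses the schema.org Book type with its name and author properties. The parser is deliberately flat: it ignores nested items and most of the rules in the microdata spec, so it only illustrates how itemtype and itemprop attributes sit in ordinary HTML.

```python
from html.parser import HTMLParser

# Hypothetical sample markup, written for this illustration.
SAMPLE = (
    '<div itemscope itemtype="http://schema.org/Book">'
    '<span itemprop="name">Emma</span> by '
    '<span itemprop="author">Jane Austen</span>'
    '</div>'
)


class FlatMicrodataParser(HTMLParser):
    """Collects item types and (property, text) pairs, ignoring nesting."""

    def __init__(self):
        super().__init__()
        self.item_types = []
        self.properties = []
        self._pending = None  # itemprop name waiting for its text content

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "itemtype" in attributes:
            self.item_types.append(attributes["itemtype"])
        if "itemprop" in attributes:
            self._pending = attributes["itemprop"]

    def handle_data(self, data):
        # Attach the first non-blank text after an itemprop to that property.
        if self._pending is not None and data.strip():
            self.properties.append((self._pending, data.strip()))
            self._pending = None


parser = FlatMicrodataParser()
parser.feed(SAMPLE)
print(parser.item_types, parser.properties)
```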

Thursday, June 2, 2011

EPUB Really IS a Container

"It's OK for libraries to put things in their EPUB books." That's what Bill Kasdorf, a member of the EPUB Working Group, told me last week at the IDPF Digital Book 2011 Meeting. He checked with EPUB Revision Co-Editor Markus Gylling to make sure. I had been curious if libraries could put all their cataloging information inside an EPUB file instead of siloing it in their catalog system.

It may seem an odd question if you don't know a few things about EPUB. EPUB is a standard format for ebooks. It's used by Apple, Barnes and Noble, Kobo, Overdrive and many others not named Amazon. EPUB is near the end of a revision process that will result in EPUB 3.0.

The EPUB specs define a lot more than just a file format. Both EPUB 2 and EPUB 3 define a container format (in EPUB 3 it's called the EPUB Open Container Format (OCF) 3.0), and then go on to define a number of file formats for files that go inside this container. These files are the resources- texts, graphics, etc. that make up the ebook.

OCF uses the ubiquitous ZIP format to wrap up all a book's resource files into a neat, transportable package. That's pretty much standard these days: Java ".jar" and ".war" files use the same mechanism. (MacOS' ".app" bundles follow a similar packaging idea, though they are directories rather than ZIP archives.) As a consequence of the ZIP wrapper, you can use any unzip utility to look inside an EPUB file and manipulate its contents.
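
You don't even need an unzip utility; a few lines of Python will do. The sketch below builds a tiny stand-in for an EPUB in memory and lists what's inside it. The mimetype file and META-INF/container.xml are part of OCF; the OEBPS folder name, the file contents, and the helper names are placeholders I made up for the demonstration.

```python
import io
import zipfile


def list_epub_entries(epub_file):
    """Return the names of all files inside an EPUB (any ZIP, really)."""
    with zipfile.ZipFile(epub_file) as archive:
        return archive.namelist()


def build_demo_epub():
    """Build a toy, incomplete EPUB-like archive in memory for demonstration."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mimetype", "application/epub+zip")
        archive.writestr("META-INF/container.xml", "<container/>")  # placeholder content
        archive.writestr("OEBPS/chapter1.xhtml", "<html/>")         # placeholder content
    buffer.seek(0)
    return buffer


for name in list_epub_entries(build_demo_epub()):
    print(name)
```

To try it on a real book, pass a file path such as "emma.epub" (a hypothetical name) to `list_epub_entries` instead of the demo buffer.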

There's even a reserved name for a file to contain book level metadata in OCF: META-INF/metadata.xml, as well as another file for rights information, META-INF/rights.xml. Another file, META-INF/signatures.xml can be used to prove who made parts of the file and determine whether anyone has mucked with them. When Gluejar issues Creative Commons editions of newly relicensed works, we'll use the rights.xml file to make sure the CC declaration is explicit.
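
As a quick illustration, this Python sketch reports which of those three reserved files a given EPUB actually contains. The function name and the demo archive are mine, and the rights.xml content is only a placeholder; I'm not showing any real rights vocabulary here.

```python
import io
import zipfile

# The reserved OCF file names mentioned above.
RESERVED_FILES = (
    "META-INF/metadata.xml",
    "META-INF/rights.xml",
    "META-INF/signatures.xml",
)


def reserved_files_present(epub_file):
    """Map each reserved META-INF file name to whether the archive contains it."""
    with zipfile.ZipFile(epub_file) as archive:
        names = set(archive.namelist())
    return {name: name in names for name in RESERVED_FILES}


# Demo: a toy archive that carries only a placeholder rights file.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("mimetype", "application/epub+zip")
    archive.writestr("META-INF/rights.xml", "<rights/>")  # placeholder content
demo.seek(0)

report = reserved_files_present(demo)
print(report)
```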

The new EPUB revision is coming fast. Last Monday, Bill McCoy, Executive Director of the International Digital Publishing Forum (IDPF) announced the release of the full EPUB 3 proposed specification. My guess is that when we look back on this event 10 years hence, we'll recognize this as the moment EPUB began to revolutionize the world of information, and with it, the book industry.

Although Amazon still uses the aging MOBI format on its Kindle devices, it seems only a matter of time before the infrastructure accumulating behind EPUB pushes them into the embrace of the IDPF. Already, most of the content flowing into the Amazon system is being produced in EPUB and converted to MOBI. Don't expect this shift to happen soon though; in his IDPF presentation, Joshua Tallent of eBook Architects described rumors that this would happen soon as "bunk"- but it will happen sometime.

EPUB 3 comes with lots of goodies. The revision adds several modules of sorely needed capability. It includes MathML, SVG and JavaScript over a substrate of HTML5 and CSS2.1. While MathML and SVG are essential for education and technical markets, JavaScript has been somewhat controversial because of the difficulty of making sure things work securely and without connections. Most of the reading systems inherit JavaScript capability from the WebKit rendering engine they're based on, so a lot of JavaScript functionality will work in ebook readers regardless.

Photo: Autography founder and author T. J. Waters
All this capability will remain latent unless people find compelling uses for it. I'm not worried. As the BookExpo itself got started, I met two different companies who were manipulating ebook files to solve the same problem: how can an author sign a book when the book is digital? Both companies, Autography and InScribed Media, create personalized experiences that leave artifacts of an author-consumer interaction inside ebook container files. Both of these companies have compelling solutions; they differ in their business models. Autography is structured as an author-focused bookstore; InScribed is developing partnerships with existing bookstores.

Photo: InScribed Media founder and author Alivia Tagliaferri
To some extent, InScribed and Autography are forced to be a bit convoluted in the way they deliver their product because they need to live inside DRM green zones; users don't have access to the files inside books without cracking the DRM (which is rather easy, by the way!). It's unfortunate, because personalization of ebooks could be a good way to encourage responsible use. I certainly don't want that picture of me torrenting around the world!

Libraries face a similar dilemma. The insides of an EPUB file could be greatly enriched by libraries, which have every motivation to enhance discovery both of the book and the information inside of it. But DRM gives the publisher and its delivery agents the exclusive ability to build context inside ebook containers. Libraries and readers are locked out. I think that for DRM systems to survive they will need to accommodate a more diverse set of user manipulations; author signatures are just the tip of the iceberg.

Coming soon, I'll report on EPUB 3 metadata.