Showing posts with label Hackathon. Show all posts

Friday, January 17, 2014

EPUB has a Steep Road Ahead (Notes from Open Book Hack 2014)

People I talk to about ebook technology belong to one of two camps.
  1. EPUB3 is the future of ebooks. 
  2. EPUB3 has lost the ebook format war because no one is supporting it. 

Here I'm helping Fortitude fix a bug.
(photo by Ray Schwartz)
So it was very enlightening to join a group of developers at New York Public Library last weekend at Open Book Hack 2014. Sponsored by NYPL Labs and the Readium Foundation, the event was convened to examine the challenges of the "Open Book" on the Web. Since I had written that another publishing hackathon "pretty much ignored ebooks", I felt compelled to attend, and write a report.

Open Book Hack was the first event I've been to where developers were actually working on EPUB, the format that is emerging as the underpinning of the digital book industry. This was both encouraging and discouraging. Encouraging because you could start to glimpse possibilities that EPUB will enable, and discouraging because the road ahead looks so steep.

Open Book Hack attracted a very stimulating mix of developers from around the world (who were in New York for Digital Book World) and local developers looking for interesting problems. Notably, there was a very impressive contingent of students from New York's Flatiron School. The projects are listed on a github wiki.

I was very eager to meet the folks from Berkeley's iSchool that have been working on epub.js, a javascript package that lets you read EPUBs from the web in your browser. One of them, Jake Hartnell, is also on the team building Hypothes.is, a tool designed to let everyone annotate the web (and one of the sponsors of the event). (The others were AJ and Fred.) Sooner or later you'll see these tools added to Unglue.it. But there's still a lot of work left to do.

For example, the project at Open Book Hack with the highest ratio of usefulness to impressive-sounding-ness was the project to add scrolling to epub.js (check it out, it works!). That's right, they added an option (mostly) to make chapter 2 come below chapter 1. You will soon be able to use ebooks on the web using an open-source 9th century reading interface.

There was one prize awarded. The winning project, "See and Read", allowed two people with two screens to interact via an ebook. Very nicely executed and well-deserved, but if I told you that I had invented something that allows two people to interact with a book, you would say "oh?" because you don't really need 100 billion transistors and two glowing screens to interact with a book.

Another project, "Breadcrumbs and Beanstalks", extended Harvard's StackLife interface to enable ebook browsing similar to that found in a physical collection of books, with 2-dimensional browsing (the 2 axes being publication date and the general-to-specific axis of subject headings). It looked a bit clunky, but thar's gold in there somewhere.

Hugh McGuire, Max Fenton, Jean Kaplansky, and Fendi M were doing something brave with linking in PressBooks, but having already done too much linking in my day, I decided not to understand it. A team from Sophia University in Japan was doing something clever with textbooks and Readium. There was work on converting PDF to EPUB and a project to use Phonegap to make apps from EPUBs.

The project I joined up with focused on making a book club application around a shared EPUB reader. (It works, too!) We based it on epub.js and Hypothes.is. I wasn't very useful to the effort, because the three Flatiron-trained Ruby-on-Rails developers in our group were too awesome for my plodding-but-powerful python to compete with. So I helped with exploration and documentation of the Hypothes.is API, and finding bugs and deployment gotchas in epub.js. I now know how to configure CORS for buckets in S3. Yeah that was my weekend. I also fixed a bug by staring, in an intimidating way, over someone's shoulder. Ah, good times.

What became apparent to me in working with these tools was that the freshly trained developers got everything to work by un-EPUB-ing everything. The web platform just works, with the one exception being that centering text blocks in CSS just doesn't, unless you look away from the screen. The EPUB platform always throws something in your way, for reasons that even StackOverflow doesn't explain. Ruby Zips won't unzip. Cross Sites won't request. Java Scripts won't bind.

EPUB's competition isn't Amazon and KF8 fixed layout, it's the web and HTML5 and its huge gravitational pull. For 90% of ebooks, the benefits of EPUB over HTML are scant (because EPUB is based on HTML!) and the development barriers are significant. It's been years, and still EPUB authoring tools aren't mature or mass market. Deployment tools are barebones.

Don't get me wrong. I'm still betting big on EPUB, but dammit, Publishing Industry, for an $80 billion pillar of modern society, you're investing a nanoscopic amount on your basic infrastructure (i.e. EPUB), despite the herculean efforts of the people I met last weekend.

Notes:
  1. I was really impressed with the Flatiron School students I worked with. If the rest of the students are anything like Edina, Tiff, and Dan (Ivan helped a bit, too), they are going to have a huge impact on the New York area economy. Maybe I should learn RoR.
  2. Bill McCoy has done an amazing job bringing people together under the IDPF and Readium umbrellas. Imagine what he could do with financial support commensurate to his task. Perhaps he should take up bootlegging.
  3. Jake wrote up his impressions, too.
  4. As have Virginie Clayssen and Camille Pène, in French.
  5. Would have posted sooner, but MILESTONE IN UNGLUE.IT.


Thursday, September 19, 2013

Booksmash's Lust-O-Meter Shows How Innovation Happens

When HarperCollins decided to sponsor a hacking competition called BookSmash, they probably expected the participants to be a rag-tag collection of smart students, hungry young startups, and underemployed misfit coders. It's very unlikely that they expected Nobel Prize winners or seasoned tech entrepreneurs to show up. But, as I pointed out in June, they had made some interesting and fun resources available as part of the competition: 196 full-text books from some popular authors. I'll let you in on a secret: despite what you may hear elsewhere, it's fun, more than anything else, that drives innovation.

The results of the competition were unveiled yesterday. Some of the teams I was already familiar with: I met the BookCities, Coverlist and LibraryAtlas teams at Publishing Hackathon. ReadUp, from the great folks at ReadSocial, is a neat idea definitely worth checking out. But Text Textures was the submission that popped out at me. The Text Textures team is Mira and Frank Wilczek, a father-daughter team. Frank is a Nobel Prize-winning physicist, Mira is an ethical-coding serial tech entrepreneur. (Lyric Semiconductor and Red Panda Security. A new project is BookGobble.)

Text Textures starts out by imagining how fun it would be if you could just skip to the "juicy parts" of a book. It turns out that with access to the full text of a book, a pretty simple combination of weighted word counts supplemented with pacing heuristics allows a text analysis engine to measure things like lustiness (hence the "Lust-O-Meter"), affection, violence and occult themes. By graphing each of these attributes versus page number, it's easy to see where the "juicy bits" of a book are. But that's not where the fun ends. You can density-plot one attribute versus another. And so we find out that "the lustiest scenes in For A Few Demons More appear to have almost no affection". You can compare multiple books, and use the measures to decide what sort of book to read next.
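To make the weighted-word-count idea concrete, here is a minimal Python sketch. It is not the Text Textures code: the word weights, the fixed words-per-page split, and all the function names are assumptions made up for illustration, and the pacing heuristics are left out entirely.

```python
import re
from collections import Counter

# Hypothetical weights; a real "Lust-O-Meter" would need a larger, tuned lexicon.
LUST_WEIGHTS = {"kiss": 2.0, "desire": 1.5, "embrace": 1.0}


def split_pages(text, words_per_page=250):
    """Split text into pseudo-pages holding a fixed number of words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [words[i:i + words_per_page] for i in range(0, len(words), words_per_page)]


def score_pages(text, weights, words_per_page=250):
    """Return one weighted word-count score per pseudo-page."""
    scores = []
    for page in split_pages(text, words_per_page):
        counts = Counter(page)
        scores.append(sum(weight * counts[word] for word, weight in weights.items()))
    return scores


def juiciest_page(scores):
    """Return the 1-based number of the highest-scoring page."""
    return max(range(len(scores)), key=scores.__getitem__) + 1
```

Feeding the list returned by `score_pages` to any plotting tool gives the attribute-versus-page graph; running it with a second set of weights gives the other axis for a density plot.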

I asked Mira about the genesis of Text Textures. She responded:
I've always been neural-net-curious. So when I found myself with a nice nest egg and some free time, I took the opportunity to round out my education. My dad (Frank) has conveniently also been curious about neural nets -- although he was more intrigued by the analogy to human cognition -- so we decided to work through Hinton's Machine Learning lectures on Coursera together. We've been doing fun technical projects together for as long as I can remember. When I was seven, we built a foot-stomping robot using Lego MindStorms. When I was sixteen, we used genetic algorithms to solve N queens.
As we went through the Hinton course, we started to think about real-world problems it might be interesting to tackle using some of those mathematical tools. Eventually we started playing with tracking characters through Sherlock Holmes .... then finding the action scenes where those characters appear ... then looking at other ways to classify scenes ... and thus the underlying idea of Text Textures was born.

The Lust-O-Meter in Text Textures is a fun toy. Which is to say that I would like to be able to play with it myself. I would build a snark-o-meter. I'm not sure if a "Skip to Good Bits" button is something people want in the reader applications, and even if they wanted it they might not admit it. But eBooks don't have inherent page numbers, so new ways to navigate ebooks would be really useful. It's rather a shame that today's prevailing ebook environment of walled-garden DRM-encumbered marketplaces is hostile to innovations such as Text Textures. Even libraries are prohibited from doing textual analysis of most of the ebooks they buy. And because lustiness data, for example, is not protectable by copyright, rightsholders such as HarperCollins typically deploy restrictive terms of use on anyone they allow to access the full text of their works. It's not enough to open up just a crack for a hacking competition.

Everyone should be able to have fun with their books.

Note: you can vote for Text Textures or any of the other BookSmash submissions until September 27 at 5:00pm EDT by going here.
Update: @skyberrys notes that the Illuminate entry also has roots in #pubhack. I note that it's yet another contribution to the book world by a physicist!



Wednesday, June 19, 2013

Book Metadata Under a Bushel

Full story at the Verge
They don't allow witnesses, spectators or journalists to carry cell phones or Kindles or iPads into the Federal Courthouse in New York. But books are OK. So every publishing executive at the iBookStore antitrust trial carries a book with them instead. For example, The Verge spotted Penguin's David Shanks sporting Robert B. Parker's Wonderland. The press takes a picture, and the next day the book, which just so happens to be an exciting new release, gets its cover onto the front page of the business section, not to mention Go To Hellman.

This opportunistic book publicity reminded me of the biblical parable:
No man, when he hath lighted a candle, putteth it in a secret place, neither under a bushel, but on a candlestick, that they which come in may see the light. Nor doth a scroll seller speak its name so no man canst hear. Nay, he shouteth from high mountain tops the holy numbers of the scroll.
- Luke 11:33 (more or less).
So you would think that book publishers would also be spreading metadata for their books far and wide, and would make it as easy as possible for developers to propagate the word. But the tyranny of "the way we've always done things" still holds sway in that world. And so, the HarperCollins OpenBook API and the BookSmash developer competition, which I ranted about in my last post, need to be understood as the positive steps they are. They are opportunities for publishers and developers to engage in ways that aren't chiseled in stone.

For my part, I've been engaging with some very helpful people at HarperCollins. Together, we found some documentation issues that had me unsure about the resources being offered to challenge participants.

First of all, the entire text of the 196 books listed in the resources spreadsheet is being made available. This is very cool. Also, 20% samples of all EPUB books in the HarperCollins catalog are available through the standard API.

Hints:
  • If you're participating in the challenge, you need to use a different endpoint than the one offered by the API demo tool to get un-truncated text. Yes, you copy the url it gives you (host name "diner") and replace the endpoint url with one reported in the text on the demo tool (host name "api").
  • If you want to use the catalog API to get ISBNs to use in the content API, note that only books/ISBNs with Sub_Format='EPUB' have preview content associated with them.
  • The API does request throttling in a funny way. If you make too many requests in a short period of time, the API tells you "Developer Inactive". That result seems to get stuck in a server-side cache.
  • The HC people seem eager to improve the API, so don't hesitate to report issues in their forums. If you've ever developed an API, you know that you have to whack at it a bit to get things right.
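The first three hints can be sketched in Python. Treat this as an illustration under stated assumptions, not as the OpenBook API's documented behavior: the full host names are placeholders you would supply yourself, the "ISBN" field name and the `fetch` callable are hypothetical, and since the "Developer Inactive" result seems to get cached server-side, waiting and retrying may or may not clear it.

```python
import time
from urllib.parse import urlsplit, urlunsplit


def swap_host(url, new_host):
    """Return url with its host replaced, keeping scheme, path and query."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(netloc=new_host))


def epub_isbns(catalog_records):
    """Keep only the ISBNs of records whose Sub_Format is 'EPUB'."""
    # "ISBN" is an assumed field name; Sub_Format comes from the hint above.
    return [r["ISBN"] for r in catalog_records if r.get("Sub_Format") == "EPUB"]


def fetch_with_backoff(fetch, url, retries=4, first_delay=1.0, sleep=time.sleep):
    """Call fetch(url), waiting longer after each 'Developer Inactive' reply."""
    delay = first_delay
    for _ in range(retries):
        body = fetch(url)
        if "Developer Inactive" not in body:
            return body
        sleep(delay)  # back off before trying again
        delay *= 2
    raise RuntimeError("still throttled after %d attempts" % retries)
```

Passing the HTTP call in as `fetch` keeps the sketch independent of any particular HTTP library and makes the throttling logic easy to try out without hitting the server.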
If you play with this API a bit, it'll be pretty obvious to you that "building an API" is not the way things have always been done in the book industry. Here's how things are done: Publishers cause ONIX XML files that describe their books to come into existence. These files are shipped to "trading partners". The reason, more or less, that the publishers do this is because way back when, Amazon forced them to do it that way instead of the horrible old ways they used to do things.

So the reason that the HarperCollins API, and others like it, are significant, is not because they'll be useful in their current form. It's because big publishers have realized that getting bossed around by Amazon might not be the smartest thing to do, and maybe having more direct relationships with developers would be a good idea.

Saturday, June 8, 2013

Publishing Hackathon a BookSmashing Success

Thursday, HarperCollins announced its BookSmash Programming Challenge. The book industry is nothing if not trend-driven, and after the success of the Publishing Hackathon, the BookSmash announcement qualifies "hacking" as a book industry trend.

The Hackathon turned out to be more significant than I expected. We should never underestimate the power of juxtaposing people with non-overlapping ignorance. I had the chance to talk to some of the other hacking teams last week, and they feel they learned a whole lot about the publishing industry. I also talked with Rick Joyce, one of the drivers of the event and Chief Marketing Officer at Perseus Books Group. He wrote me that his two revelations were "1) the importance of putting 'shareable data' (i.e. metadata) into a form developers want to work with (i.e. API's) vs. the feeds [publishers] traditionally supply to [their] trading partners. The world of developers are not going to incorporate you into their brilliant new lego creations if you don't give them lego-bricks to build with. And 2) this whole Open Innovation model is pretty mind expanding."

The organizers of the Publishing Hackathon got a lot of things right. The space was wonderful, the food was publisher-quality, and the publicity was excellent. (I admit that even the hype-laden website blurb that I criticized did its job well.) The variety of sponsors lent an open and collaborative atmosphere to the event. Even libraries were represented. It was a good decision to set a theme of "book discovery" for the event; this helped focus the participants and created a set of discussions that are likely to continue. Having the final presentations on the floor at BEA was brilliant. The party afterwards was fantastic.

The projects that were created at the hackathon won't solve the book discovery problem. The winning project, Evoke, won because it's both plausible and totally out of left field. But it's likely the knowledge gained by hackers and publishers during the process will advance the state of the art.

As with anything new, there are a number of things that could be improved on in future hackathons. Here's my list:
  1. Everyone is a VIP. During the presentations, three rows of chairs in the front were set aside for "VIPs". No one sat in them. Next time, make the hackers the VIPs.
  2. More prizes, more fun prizes. The gift economy of hacking and the cash economy of startups both need nurturing and cross-pollinating. Having one cash prize of $10,000 is less motivating than 5 $1000 prizes, and how do you split it if you have a big team? A prize consisting of dinner at a nice restaurant or some theater tickets might be a stronger motivation for participation.
  3. Hacker Judges. None of the 10 judges for the 2013 Publishing Hackathon actually do any hacking. Only 3 of the 10 qualify as technologists. None of them are designers. (As far as I know.) If you want to send a message that design, technology, and code are important to publishing, then build that dialogue into the judging process as well.
Now about BookSmash.

At first, I was seriously underwhelmed by the BookSmash challenge. It seemed to be a way for HarperCollins to prop up the sad, desolate ghost towns that are the OpenBook API and the OpenBook Content API. (The OpenBook API was launched in April of 2012 with the support of Mashery; the forums had attracted exactly one developer in the last year.)

But perhaps I judged prematurely. The competition website claims that a number of authors, including Peter Drucker, Eloisa James and C. S. Lewis, will be "making their full works available via the BookSmash Challenge version of the OpenBook API." This could be really exciting, but as far as I can tell, it seems to be a bit of an exaggeration. I checked James' Desperate Duchesses; the content API returns the first 20% of the work. I tried Prince Caspian and got this result:

epubFetch unable to display this book

Sorry, we have not loaded this book into the system as yet.
We are loading books on a regular schedule, so please check back.

Still, I can imagine some interesting things that might be done with this data.

Update June 18: Have been working with the friendly people at HarperCollins to iron out documentation issues that had been blocking my access. I'll put some hints in a new post.

/START RANT/ In 2013, Metadata APIs like Harper's are NOT enough. The metadata is not very good, and there's not enough of it. Why would a sane developer go to HarperCollins for product metadata when she could go to Amazon or Google Books and not have it limited to HarperCollins products, not have it limited by HarperCollins Terms and Conditions (which forbid any commercial use), AND have the selling price included, too??? Also Prince Caspian Movie Tie-in Edition (digest) is NOT the title of a book! If you want interesting things to happen with your metadata, let developers download THE WHOLE DATASET! That's how you get the data to Amazon and BN, and that's how you should get it to developers! /END RANT/

It's like Rick Joyce said. If you want people to build cool things, you have to give them lots of cool bricks.

Sunday, May 19, 2013

Publishing Hackathon Pretty Much Ignores eBooks

The "First Annual" Publishing Hackathon was this weekend. As advertised, I participated and worked on an EPUB backmatter project. My awesome team consisted of me, Javascript/Ruby developer Max Jacobson (who's going to be even more highly sought-after when he finishes Rails school this summer), and TLC librarian Dianne Coan.

Here's our demo video:

 

Here's how we described the project:

Book Discovery INSIDE the eBook

When is a reader most receptive to reading suggestions? Right when they’ve finished a book of course! That’s why printed books have information about other books by the same author, the first chapter of the next book in the series and similar material at the end as part of the back matter.

Back matter has existed pretty much as long as books have. This includes the appendix, glossary, index, and bibliography. Back matter for digital books needs to be optimized to serve the needs of the digital reader. An informal survey by @suw indicates the most popular endmatter desires were other books by the same author and some information about the author.

Digital back matter for ebooks is not constrained by having to precede the publication; unlike print, digital back matter can be kept up to date with the release of new content. For instance, if an author publishes a sequel, that title could be included in previously published ebooks.

It’s easy to insert a page listing an author’s other books at the end of an ebook, but how do you keep that list up-to-date? What if you’ve developed a great recommendation system to do “if you liked Pride and Prejudice, you’ll like X”? (or maybe “if you hated...”!)

The answer is to make use of the javascript capability of emerging ebook environments. Our project explores means of connecting to APIs from within an EPUB for the purpose of suggesting the user’s next read.

An existence proof is the “widget” capability of the iBooks Author platform. It allows the insertion of html snippets into extended EPUB. Unfortunately, the javascript capability of ebook reading platforms, like the future, is unevenly distributed.

For this demo, we tested three EPUB reading environments: Readium, Readmill, and iBooks. We modified the Project Gutenberg EPUB version of Pride and Prejudice to include hooks and data to other books by Jane Austen.

Readium, which has been built as an EPUB3 reference environment, is the most capable for our purposes. It supports both javascript and connections to external web resources. In Readium, our EPUB displays the set of books by Jane Austen returned by the Readmill API.

Apple iBooks has full javascript capability, but doesn’t allow connections to external resources (except perhaps via iBooks Author hooks; this deserves further investigation). In iBooks, our EPUB displays a result page that we generated and embedded based on Jane Austen works published in 1813, when Pride and Prejudice was released. We imagine that such embedded resources could be inserted at download time in a future production bookstore or library environment.

The Readmill environment does not support javascript at all at this time, so ironically, we’re not able to display the Readmill API results, or the iframe embedded resource.

Offline reading in Readium displays the resource embedded in the EPUB, similar to the iBooks version.
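
The embedded fallback page is the easy part to illustrate. Here is a minimal Python sketch of generating such a static back-matter page at download time. It is not the code from our demo: the function name and the (title, url) input format are assumptions, and the step of adding the page to the EPUB's package manifest and spine is left out.

```python
from html import escape


def back_matter_page(author, books):
    """Build a static XHTML page listing other books by the same author.

    books is a list of (title, url) pairs, taken from whatever catalog or
    API the bookstore or library has on hand when the EPUB is downloaded.
    """
    items = "\n".join(
        '      <li><a href="%s">%s</a></li>' % (escape(url, quote=True), escape(title))
        for title, url in books
    )
    return (
        '<?xml version="1.0" encoding="utf-8"?>\n'
        '<html xmlns="http://www.w3.org/1999/xhtml">\n'
        "  <head><title>More by %s</title></head>\n"
        "  <body>\n"
        "    <h1>More by %s</h1>\n"
        "    <ul>\n%s\n    </ul>\n"
        "  </body>\n"
        "</html>\n"
    ) % (escape(author), escape(author), items)
```

Because the page is plain XHTML with no script, it should display even in a reading environment with no javascript support or no network connection.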
There were 30 projects in total presented at the end. Here's the list, along with my one-sentence summary of each.
Banned Books in America
Website that maps book banning incidents and links them to Openlibrary
Book Discoverability: A Graphical Solution
Concept for browsing books as nodes on a graph.
Book Discovery INSIDE the eBook
This was us! Our demo crashed and burned. The popup screens from the wifi messed up the ebook reader display of embedded dynamic content.
BookCity Finalist!
Website that recommends books by connecting them to cities.
BookieGoer
Website that helps you lend the books you've borrowed from the library.
Booklvrs: Read. Discover. Meet.
App that advertises the ebook you're reading to the people around you.
bookmatchup
Website that multi-factor-matches you to books.
BookMob
Website that aggregates book recommendations from your twitter followers.
bookshelf.me
Website that displays books as if they were on a bookshelf. I'm pretty sure there was more to it.
Publy.io
Website that recommends books to users based on books they've liked.
Captiv Finalist!
App and Website that uses machine learning algorithms and your tweet about last night's party to combat the short attention span of Today's Readers. I may not have understood this one.
Coverlist Finalist!
Website that believes in judging books by their cover.
Evoke Finalist and clear judging favorite!
Pinteresty website that recommends books based on emotions categorization.
Happy Chapter
App that recommends books based on tags you click.
I read your Brain
Brain-sensing rabbit ears that wiggle depending on your response to a book from a website.
IGNITE
Website that lets users rate romance novels for steaminess.
KooBrowser Finalist!
Browser plugin that analyses what you read to better sell you books.
Library Atlas Finalist!
Mobile app that sends you geographically appropriate quotes depending on where you are. My favorite.
Literary Trinket with Book Wish
3D printed QR-ish code baubles. Cooler than it sounds.
Meadows
Website that turns reading into a game where you earn points.
Meme a book
Website that turns books into lolcats. (I may not have described this accurately.)
MovieReader
Website that recommends books connected to the movie you just saw.
NYPL Reinvent
Analysis of NYPL metadata advocating a divorce of the library from its classification system.
OkLetsRead!
Website offering crowd-funded serial fiction (ebooks).
Quiply
Website that recommends books based on a user's video viewing.
Reading Tollbooth: A Gateway to Book Discovery
Website to match kids to books.
Something2Read
Website that recommends books based on tags you click.
Valerie's Baby App
App that promotes literacy to a girl named Valerie by making sliding block puzzles and defining words at her.
Visibrary
Website that uses library data to make graphical book circles.
Vookstore
Website that turns ex-bookstore owners into book curation engines.
Interestingly, only 3 of the 30 projects addressed ebooks at all, which seems a bit odd to me, considering the industry's ongoing transition from print to digital. The emphasis on apps (7) and websites (21) is partly due to the Hackathon's theme of book discovery, but it also says something about the tech industry. Apps and websites are what the NY tech industry is doing in 2013, not ebooks. Clearly, the publishing community developing ebooks and ebook standards needs to do more outreach to developers; the hackathon was a good first step.

It's also worth noting the growing importance of geo-tagging and other non-traditional metadata. In the new world of publishing discovery, readers want books that fit their mode right where they want to be. Neither MARC nor ONIX knows enough to help.

My library friends should rest assured that the hackers did not at all ignore libraries. Although the $1000 prize from NYPL was a factor, the ease of connecting to NYPL and OpenLibrary helped a lot. The RDA prize, it should be noted, went unclaimed.

Update: Sorry, Coverlist, I omitted your finalist status. Corrected!

Tuesday, May 14, 2013

Hack the Publishing Hackathon

Why a publishing hackathon?
Book discovery needs innovation. It’s never been easier to get a book into a reader’s hands—just one click. But, with over 10,000 books published each year on every topic imaginable, how do people find out about them? There are fewer bookstores to help readers discover exciting new authors and ideas. There’s currently no digital experience that replicates the serendipity of browsing bookshelves. Recommendation engines are fairly primitive – they know what you bought, but they don’t know why. It’s a disruptive opportunity that hasn’t been explored.
Seriously, the sponsors of this event don't think book discovery has been explored? I guess they were too busy suing Mr. Google to notice that Google Books is a pretty good discovery tool. I suppose they never thought to ask Mr. Wikipedia how many books are published every year.

All in all, I find the description of this hackathon INSULTING to just about every developer that's worked in the general vicinity of the book industry.

Umm. Mr. Steinberger. If you and Perseus really want to promote discovery innovation, then perhaps you have heard of Goodreads? They're driving some decent discovery of books. Maybe it doesn't count if Mr. Amazon is buying them. Perhaps you've heard of Amazon? They popularized the "If you liked this, maybe you'll like..." feature that everyone in the publishing industry tries to copy. If you don't like Goodreads, maybe I can introduce you to LibraryThing, which has been driving valuable book discovery in more ways than I can list here. I know that "library" in their name is a big turnoff for your big 6 colleagues, but libraries are huge book discovery machines. I don't suppose you want them to disrupt anything. And umm DP.LA????

People mostly discover books by word of mouth. Some innovators promoting social reading include Readmill (who had their own publishing hackathon) and (giving props to the NYC home team) ReadSocial and the stuff Bob Stein has been exploring. And Kobo, Copia and Zola are doing some amazing things to integrate book discovery with ebook selling and reading environments. I've written previously about Jellybooks' fresh approach to discovery.

And some more on libraries. When I was at OCLC, we worked on real simple problems like "how do you discover the other editions of the same book?" and we found that publishers had NO CLUE what they'd published 5 years previous. So yeah, we did our bit.

But I'm coming to the hackathon anyway, because despite the ridiculous framing, this event has some clueful backers. NYPL for one. Small Demons for two. And they're even wasting prize money on a new age library metadata thingy. (I might be wrong about the wasting part.)

I'm hoping that some people will be interested in rethinking ebook front matter. Unglue.it needs books to work better all by themselves. The best discovery instrument for a book is the GDMF book, to my mind. So let the book do some work. With a little javascript. And no more DRM, thank you very much!

Sunday, August 26, 2012

The "I Used This" Button


I frequent-flyered off to San Francisco this weekend to surprise my Ph. D. Advisor, Jim Harris, for his 70th Birthday. I was the first of his students to graduate, and he's up to 105. One thing I learned helping to start his group was the immense value of being thrown together with a group of smart people with a variety of experience. I met members of Jim's current group, which includes a student from Gunn High School, visiting scholars from around the world, and Ph. D. students bursting with ideas.

I manufactured some business-related meetings for the trip, some of which I'll relate in a to-be-written post, but I also lucked into a hackathon for Open-Access hosted by PLoS. I spent the day with a group of smart people with a wide range of experience, including software developers, product managers, film-makers and a librarian or three.

The group I ended up working with included Greg Grossmeier from Creative Commons, Cameron Neylon from PLoS, and Ana Nelson, the developer-entrepreneur behind dexy. We were interested in counting open-access things. Counting things can be harder than you think, because you have to define the things and identify them; you need to be able to tell whether a thing is the same thing as another thing, or perhaps it's three things. Counting bananas is one thing, but have you ever tried counting ideas?

Creative Commons (CC) is interested in knowing how much its licenses are used. When an Unglue.it ebook edition is released (of course under Creative Commons!), how often is it used? Does a single license apply to the entire book, or can we apply different licenses to the different resources inside the book? For example, an author may want to use a CC-BY license for the text of a book, which might contain figures that are used under CC BY-ND. And the metadata should be CC0. How should these licenses be expressed?
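
One possible shape for an answer, sketched in Python: a default license for the book plus per-resource overrides. This is only an illustration of the question, not a Creative Commons or EPUB convention; the dictionary layout, the file paths, and the function name are all hypothetical.

```python
# Hypothetical manifest: one default license for the book, with overrides
# for individual resources inside it.
BOOK_LICENSES = {
    "default": "CC BY",
    "overrides": {
        "images/figure1.png": "CC BY-ND",
        "metadata.opf": "CC0",
    },
}


def license_for(resource_path, licenses=BOOK_LICENSES):
    """Return the license for one resource, falling back to the book default."""
    return licenses["overrides"].get(resource_path, licenses["default"])
```

With a structure like this, counting usage per license means counting resources rather than books, which is exactly where the counting gets hard.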

After some discussion, we settled down to work on some specific projects. My project turned out not to be code at all, but rather a description of a scheme for measuring Creative Commons usage, i.e. the rest of this blog post.

Creative Commons has thought about ways to measure the usage of its licenses. For example, it can track the display of its license "badges", such as the one right here. When a browser fetches the badge, it sends a referrer header naming the web page that displayed it, and the request itself reveals the user's IP address to the image server. But there are problems. Many web sites use their own copy of the badge. In an ebook, the badge would be embedded in the ebook file. If the page is served over a secure socket and the badge is not, the referrer won't be set. And do you really want to tell Creative Commons about everything you're reading?

Speaking of which, have you clicked on a Facebook "Like" button this week? Was it good for you too?

Suppose there was a button on Creative Commons licensed documents that allowed the user to express their delight at the creator's enlightened choice of license. Would you click it? I call it the "I Used This" (IUT) button, but maybe you can think of a better name.

  • The IUT button would send a signal to a Creative Commons server about usage of the resource. These signals would be compiled and reported.
  • The IUT button would also send the attribution url.
  • Pressing the button would display an amusing animation to the user. Perhaps every button would have a different animation to avoid button fatigue.
  • The button would be at the center of an advocacy campaign for open licenses.
  • Unlike the Facebook Like button, the IUT button would respect a user's privacy. A signal would be sent only when initiated by the user, and would be optional.
  • An IUT button packaged as a javascript would work in epub, html, etc.
  • IUT signals would be evidence of the resource's status as a CC licensed work. A licensor attempting to revoke a CC license (you can't do that!) would have to overcome a verifiable usage trail.
  • Users could create accounts at CC to provide a retrospective record of the user's Use.
  • Clicking the IUT Button would put the attribution url on the clipboard to ease correct citations.
  • Usage information for each resource would be public; creators could easily track the usage signals for their works.
  • We might need anti-ballot-stuffing measures if CC usage rankings become commercially important.
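
The server side of this scheme could start out very small. Here is a Python sketch of a tally that counts signals per attribution url and applies the simplest possible anti-ballot-stuffing rule, one signal per user per resource. The class, its methods, and the idea of a "user token" are assumptions for illustration; nothing like this exists at Creative Commons as far as I know.

```python
from collections import defaultdict


class UsageTally:
    """Count 'I Used This' signals per resource, at most one per user."""

    def __init__(self):
        # attribution url -> set of user tokens that have already signalled
        self._users = defaultdict(set)

    def record(self, attribution_url, user_token):
        """Record a signal; return True if counted, False if it is a repeat."""
        seen = self._users[attribution_url]
        if user_token in seen:
            return False
        seen.add(user_token)
        return True

    def count(self, attribution_url):
        """Return how many distinct users have signalled for this resource."""
        return len(self._users.get(attribution_url, ()))
```

A user token could be an account id for people who sign up at CC, or something weaker for anonymous clicks; that choice is where the privacy and ballot-stuffing goals pull against each other.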

If efforts like Unglue.it are to succeed, people who appreciate the benefits of Creative Commons licensing need to stand up and be counted. We need to make it a mass movement in the minds of every lover of books, everywhere.

Sometimes you need to do more than just consume. Sometimes you need to do some SHOUTING.
