Friday, December 31, 2010

2010 Summary: Libraries are Still Screwed

In mathematics, catastrophe theory is the study of nonlinear dynamical systems which exhibit points or curves of singularity. The behavior of systems near such points is characterized by sudden and dramatic changes resulting from even very small perturbations. The simplest sort of catastrophe is the fold catastrophe.

When a fold catastrophe occurs, a system that was formerly characterized by a single stable point evolves to a system with no stability. The point where stability disappears is known as the tipping point.

One of my goals for this past year was to raise awareness of the tipping point for libraries that will accompany the obsolescence of the print book. In January, I noted that Hal Varian's equation describing the economic value of libraries also predicts that libraries of the current sort won't exist for ebooks.

In March, I put the question directly to John Sargent, Macmillan's CEO. His response, that ebooks in libraries were a "thorny problem", got quite a bit of notice. Unfortunately, the big trade publishers have yet to actually do much to address the thorns.

In May, I was pleased that the editors of Library Journal were putting together an "eBook Summit" virtual meeting to address some of these issues, and even more pleased to be invited to write a series of articles to help frame issues for the Summit. The event ended up being titled ebooks: Libraries at the Tipping Point. For me, the highlight of the summit was Eli Neiburger's talk on How eBooks Impact Libraries. This talk is destined to be known forever as the "Libraries are Screwed" talk, and if you've not viewed it I urge you to do so forthwith.

Several other contributions raising awareness of the library-ebook catastrophe are worth noting. Emily Williams' commentary on Eli's talk is worth reading, and her attention to the issue has been consistent. Library Journal's Heather McCormack is another persistent voice; I particularly loved the story she told in a column written for O'Reilly Radar. Tim Spalding's post on "Why are you for killing libraries" is another favorite.

So at the end of the year, what have we accomplished? One disappointment for me was that although Library Journal's eBook Summit was quite popular with librarians, it appears that very few publishers took notice. On the rare occasions when publishers took notice of the role libraries could play in the ebook future, they tended to be depressingly reactionary, such as when the UK's Publishers Association set out their plan to marginalize libraries while apparently thinking they were boosting them.

Similarly, Amazon announced yesterday the addition of a lending feature for the Kindle. This feature seems designed to compete with a similar feature in the Nook, but nowhere in the announcement is there any mention of libraries as being anything other than the books on a user's Kindle.

Meanwhile, adoption of ebooks and ebook readers has accelerated. Amazon announced that the third-generation Kindle is the bestselling product in Amazon's history. Barnes & Noble fired back, reporting that the NOOKcolor is the bestselling product in its history. In comparison, this month's announcement by Overdrive that it (finally!) has released apps for reading its library ebooks on Android devices and iPhone seems a bit too-little-too-late (sorry, no iPad version!).

Perhaps the time is over for raising awareness about the catastrophic future of libraries. In 2011, let's build things that change the system dynamics.

Tuesday, December 28, 2010

Why does Sarah Palin Charge for eBooks?

A neighbor of mine, a professor at Rutgers, has several published books to his name. A few years ago, the publication costs for one of his books were partially underwritten by an environmental organization, because they thought it was important work which ought to see the light of day, or at least the light of a few libraries.

Digital publishing makes possible new ways for works like my neighbor's to be seen by as many readers as want to read them. Since the cost of making digital copies is almost zero, it makes sense for them to be given out for free once the first copy costs have been paid for, especially if that cost is being borne by an advocacy group.

Which brings me to Sarah Palin, and her big-time new book, America by Heart: Reflections on Family, Faith, and Flag, published by HarperCollins (a division of Rupert Murdoch's NewsCorp). It costs $12.99 for the Kindle version and the same on iBooks, Nook, and Kobo, thanks to agency pricing. If Palin is running for president, as is widely assumed, you'd think that she'd want as many people as possible to read the book. So far, sales have been disappointing, but for argument's sake, let's assume that it's a brilliantly persuasive thrill of a read. Why bother charging for the ebook, since it costs almost nothing to make copies?

Well, there's the small matter of the big advance (she got $1.25 million for Going Rogue). It all boils down to money. Most of the people buying the book are probably already political supporters, so writing (or ghost-writing, as the case may be) a book like America by Heart is as much a mechanism to monetize political support as a genuine contribution to our nation's political discourse. For the publisher, a Sarah Palin book is a win-win-win. Not only can the publisher make money if the book turns out to be a blockbuster like Going Rogue, which sold over 2 million copies, but the prestige of a popular figure rubs off on the publisher's brand. And when access to the corridors of power comes along with the advance, how can the publisher lose?

Bill Clinton, speaking to the AAP in March 2009
I don't mean to be partisan; the argument applies equally to Dreams from My Father and The Audacity of Hope; the Obama administration has so far been very friendly to the book publishing industry on its "special issues". And don't think that Bill Clinton would have been speaking at the Association of American Publishers Annual Meeting, as he did last year, if he weren't also an author with a big deal.

It's surprising that these arrangements don't get more scrutiny. It would be illegal for a politician to accept money into a personal bank account from a big company with major political issues before him, but it's OK for a publishing company to write 7-figure checks as advances to those same politicians if there's a book being written to serve as a fig leaf.

Publishing books has always been a double-edged sword for politicians and advocacy groups; a good book bolsters a cause both financially and intellectually. The publicity tour surrounding a book is as much about publicizing the cause as it is about selling the book. But looking to the future, what sort of digital business models will be most effective for authors whose primary goal is to advance a cause?

While the current business model offers many advantages such as access to a publisher's marketing and distribution channels, the author receives a rather small percentage of the money paid by book purchasers. In 2008, the year he was elected, Barack Obama's books sold about a million copies; he reported income from these of $2.5 million. Most authors get a significantly smaller percentage of sales.

If you've been reading this blog, you probably won't be surprised at the alternative I suggest: bounty markets for open-access ebooks will be the ideal way to accomplish the twin aims of advocacy and fundraising. By first raising money from core supporters, and then releasing an ebook for free, a cause-oriented author can use the open-access ebook to gain new converts.

Perhaps ebook bounty markets will even need to put in safeguards to avoid being misused as a naked money conduit to launder campaign contributions.

A guy can dream, can't he?

Friday, December 24, 2010

Christmas and Cooking

In Lalbagh Garden, in Bangalore, I found this plaque establishing the link between Christmas (tree) and the Cook (pine):
"Christian community treat this tree as sacred one & worship during Christmas."
Note the Muslim women having a photo snapped, with "Christmas Trees" in the background.
In my house, many holiday rituals surround cooking. One of my most precious possessions is this copy of Stora Kokboken, which I haul out every December to help with my liver pâté, my award-winning Swedish coffeebread and whatever else sparks happy memories. I say hello to my mom's notations and feel her presence.
(Stora Kokboken, Edith Ekegårdh and Britta Hallman-Haggren, Wezäta Förlag, Göteborg, 1957)
Opposite the title page is this:
My parents were married in December of 1957; the cookbook was a Christmas present from my mom's father, stepmother and stepsister. The note says "En god jul önskar vi er!" which translates as "We wish you a Merry Christmas".

What they said.

Tuesday, December 21, 2010

Crossing the Street in India

It sounds funny to say, but the thing I'll remember longest about my two weeks in India is learning to cross the street. When I first arrived, I didn't dare. Not only do they drive on the wrong side of the street, but they also drive on the right side of the street, the middle of the street, and on various surfaces that would not be considered streets here in New Jersey.

The protocol for pedestrians and motorists to coexist was not apparent to me. Pedestrians seemed to cross the street with minimal regard for traffic; the cars unaccountably seemed to miss them at high speed. After a few days of watching this dance, I screwed up my courage and crossed in the wake of some elderly women in saris. By the weekend, I was crossing on my own; the key seemed to be steadiness- a sudden move could fool a three wheeled "auto" or motorcycle carrying a family of six into your path. Motorized vehicles in India always have to be on the lookout for the vegetable cart, cow, goat, dog or camel that might need to share the roadway.

I learned a lot about other things, too. Last Wednesday, I gave a lecture for the Bangalore chapter of the Society for Information Science. My talk was titled "Why Libraries Exist: Transitioning from Print to eBooks." I've been working on this talk for a while (based partly on this post), but this was the first time I've given it publicly; I'll be giving versions of the talk twice in February.

There were lots of questions and much discussion. There are so many differences between conditions in the US and in India regarding ebooks. Not only are adoption rates very different, but there are potential ebook applications in India that had never occurred to me.

For example, while e-reader adoption is negligible in India, it may well be that textbook distribution via e-readers will happen sooner in India than in the West. In India, textbooks are a government-run enterprise. Government agencies produce, print and distribute textbooks throughout the country; owing to the country's huge population, the number of textbooks printed is quite large. Getting the right textbooks to remote locations can be a real logistics challenge. If e-readers could be made cheaply enough, there would be many advantages and potential cost savings in their use for textbook distribution. Compared to laptops and desktop computers, e-readers place smaller demands on electrical infrastructure; the demand for mobile phones even in rural areas has already spurred the introduction of portable chargers (hand-powered and solar-powered).

Public libraries in India are under-utilized compared to their counterparts in the west. One of the people attending my talk was Dr. M. S. Sridhar; he asked some very penetrating questions about the effects of ebooks on readership. He asked the audience to raise their hands if they were a member of a public library. Only a few hands went up. He later sent me a copy of a column he had written for the Deccan Herald (61 (24) 24 January 2008, DH Education, p II.)
During last four decades, the percentage of American population having public library cards has increased almost three fold from about 25 to 72. On the contrary, in a progressive state like Karnataka, after 40 years of enacting a comprehensive Public Libraries Act and establishing 5000 libraries, we have not been able to reach more than 2% of the population. A poor market penetration of public libraries over half a century of National Library Movement!
If my thesis about the economic forces behind "why libraries exist" is really true, then book sharing mechanisms ought to arise in any free-market society. If so, then why is there so little utilization of public libraries in India? Is it the lack of a "reading habit"? Given the stories I heard about even poor domestic workers making great sacrifices to send their children to expensive schools that teach English, I have my doubts about this. Perhaps it's due to a widespread availability of inexpensive books, but I didn't see that in the bookstores I visited. My guess is that informal book sharing mediated by family, clan, school and work relationships is filling the economic niche left by public libraries that fail to connect with their communities.

Having survived numerous street crossings and a 26-hour ocean and continent-crossing journey to get back home, I have a better appreciation of the many ways that societies will negotiate the print-to-digital book chasm.

Saturday, December 11, 2010

The Most Important e-Reader Company You've Never Heard Of

Imagine you're the head of a big print media company. Sales of your print products have been eroding steadily and you find yourself competing with internet businesses that have very different business models and cost structures. Your own website revenues have been growing steadily, but it will be many years before they match your print subscription revenue. You know in your heart that digital subscriptions will somehow be the answer, but your otherwise loyal reader base is resisting web subscription rates that would allow you to sustain your business.

What do you do?

One answer is something I've speculated about, based on cost reduction trends for tablets and ebook readers. Bundle a reader device into your subscriptions! The consumer gets a device for "free", and you've retained a premium subscriber. You could even bundle a shopping application and collect a commission on purchases of content made through your estore.

At first glance, the obstacles to a successful implementation of the bundled e-reader strategy might seem profound. Here's an incomplete list of what you'd have to do:
  1. You'd need an inexpensive device, of course. But it can't be something cheesy, because it will be carrying your brand.
  2. You'd need an operating system for your device. Lucky for you, Google seems to have a good solution with Android. But you still need people who can customize and adapt Android to make it shine.
  3. You'd need to figure out the logistics of getting your content onto the device. Maybe you've already started this process with Apps for iPad or Android.
  4. If you're serious about an estore, someone has to build that, too. It's not only an engineering project; it's also a big business development task to gather businesses willing to sell their content and goods through your estore.
  5. You'll need to build a customer support operation.
  6. You'll need some big marketing power and a sales channel.
Most big media companies have only the last item in their portfolio of competencies, so you might expect this isn't going to happen anytime soon.

But you'd be wrong.

On Thursday, my rounds in Bangalore took me to Ninestars Information Technologies Ltd. Ninestars specializes in back-end infrastructure for publishing companies. They started out doing newspaper backfile digitization, and today they work with a Who's Who of the newspaper and publishing world. It's Ninestars, an Indian company, that's actually doing much of the heavy lifting in the preservation of the entire world's history. Early on, Gopal Krishnan, the company's Chairman, made a crucial decision to steer away from the brute-force, labor-intensive approach to digitization in favor of developing technology and automation so that jobs could be done faster and with fewer people. That decision has paid off as publishing has become more and more dependent on advanced technology, whether it's website and e-commerce development, content delivery, or mobile and tablet apps.

Ninestars operates under a "white-label" business model. If you go to their website, you'll be surprised at its small size and inertness. But don't judge a book by its cover- Ninestars and Gopal are major players. Playing the white-label role to the hilt, Ninestars wants the spotlight to shine on their customers, as if Ninestars didn't even exist.

Ninestars' e-reader strategy is similarly white-label. Tablets and e-readers are being developed by Ninestars' associated company DisplayTronics Reader Devices Ltd. DisplayTronics CEO Dr. Somanath V S gave me a peek inside the DisplayTronics development laboratory. I saw a range of e-reader prototypes- both EPD and color LCD, all with touchscreens. These will be sold under brands that are already familiar to consumers.

DisplayTronics is also building a white-label "estore" that its customers will use to sell their content, whether on their branded e-readers, or through apps on other device platforms.

While I've written a fair amount here about business models for ebooks, I haven't really covered business models for e-readers. Everyone knows about Apple's hardware-oriented business model and Amazon's commerce-oriented business model, but there's not been much in the way of business models that put content at the center, yet. With e-reader pricing dropping at a relentless pace, big media companies are potentially in an excellent position to remake the e-reader market in ways that sustain their businesses. Companies that try to make money by selling devices might find it difficult to compete against free or subsidized devices bundled into media subscriptions.

We'll find out how well the bundled-e-reader business models work when they launch - in 2011.

Wednesday, December 8, 2010

Inside an Indian Bookstore

I'm in Bangalore, the tech capital of India, where even the Cook pines look like cell phone towers.

My favorite "tourist" activity in a country new to me is to go inside ordinary stores. A walk through a grocery store will give you a much better understanding of a nation's cuisine than their best restaurants. You'll see strange foods and smell strange smells, and you'll see familiar foods presented in unexpected ways.

So I just had to visit a bookstore here. I chose a medium sized one, situated on the second and third floors of a building in a busy shopping district. Like bookstores anywhere else, Indian bookstores find it profitable to stack lots of non-book items- in this store, the entire second floor is taken up with DVDs, toys, games, teddy bears and Ganesha figures.

In this bookstore, six of seven aisles are devoted to English-language books. The seventh aisle is devoted to books in the local language, Kannada. Kannada is spoken by a total of 60 million people in southern India, which is a lot more than speak many European languages, but in India it's a minority.

As you might expect from Bangalore's status as a tech capital, an entire aisle of the bookstore is given over to computer-related books. The prices of these books were higher than I thought they would be. For example, the latest edition of Programming Perl was 750 Rupees, or about 17 US dollars. In the US, it's 33 dollars at Amazon. (Wow, that's pretty expensive!) Books aimed at the Indian market are priced lower- 150-300 Rupees.

You might wonder how programming books can command such prices, given that many good ones are available in free HTML versions and that per capita income in the region is only 1/40th that of the US; even a software developer's salary is a fifth that of his US counterpart. Part of the reason may be that it's still unusual for people to have access to the internet at home. Print works better here.

By all reports, the ebook reader market in India is poised for an explosion. If you think about it, a cheap ebook reader would be a really good way for Indians to take that free-on-the-web ebook home from the office. The bookstore I visited had a single ebook reader for sale- it's displayed in the book section on top of other luxury items such as the plastic-sealed copy of Dan Brown's Lost Symbol.

Infibeam's "Pi" ebook reader sells for 10,000 Rupees, or about $225. It's made in Taiwan and looks like a stripped-down Kindle, with e-ink display, but no keypad and no wireless connectivity. It supports all sorts of ebook formats, and most of the 22 official languages in India. The marketing slogan for the Pi is "Read eBooks Anywhere!" and the availability of free ebooks, including the "top 100 from Project Gutenberg", is prominently noted. The Infibeam website (which exhibits sincere flattery of Amazon) offers a wide selection of eBooks. Stieg Larsson has a good share of the Pi at 433 Rupees (close to the $9.99 Kindle pricing); he was nowhere to be found in the Bangalore bookstore.

While the Pi is the first ebook reader targeted at the Indian market (Kindle is also available), there are a number of competitors waiting in the wings- I'll cover them in another article.

Tuesday, December 7, 2010

Lots of Markets, Lots of Business Models

The book industry is a lot like the Soviet Union. The Soviet Union consisted of fifteen ethnically divergent republics stitched together by a highly centralized government model. When that government model weakened, it turned out that there was little holding the republics together. The Soviet Union no longer exists.

Discussions of the ongoing transition from print to digital books frequently make reference to corresponding transitions that have occurred in the music industry and in the film industry. As I've learned more and more about how the book industry has worked in the age of print, it's been made clear to me that unlike music and film, the book industry has consisted of a large number of disparate industries thinly stitched together by a common delivery mechanism and shared supply chain.

Predicting what will happen to the book industry in its shift to digital by looking at music is like making predictions about the Soviet Union by looking at Germany.

Consider the different ways that books can be of value. The utility of a cookbook has very little in common with the utility of a romance novel. A dictionary and a travel guide are very different, even though you might take both on a trip. A manual on object-oriented programming and a Superman comic book might both be useful for squashing bugs, but the former should be open and the latter should be rolled up when doing so. I won't even mention coffee table books.

Compare the music industry. No matter the genre, most everyone uses music in the same way. Whether rap or raga, Beethoven or Roll Over Beethoven, the variations in consumption patterns are relatively small.

When utility variations of goods are small, the business models behind their distribution tend to converge. The film and television industries had very different pre-digital distribution models, but their business models are converging today because they are valuable in much the same way. Scholarly journals are another good example. Ten years ago, when journals began the transition from print to digital, publishers started out with a variety of business models for different fields. Today, however, the vast majority of journal publishers use roughly the same business model.

That's why I think the book industry of the past is fragmenting into many smaller industries based on business models that deliver the most value. We're already seeing this begin to happen. The old business model of the encyclopedia industry has already died; Wikipedia has almost wiped out the incumbent encyclopedia publishers with a public charity business model.

Whenever O'Reilly Media and their willingness to do free-on-the-web versions of their books come up in a conversation among publishers, one of them will say something like "well that's a very different market segment from ours". And it's true. The utility of a book such as Programming Perl is of a sort that free-on-the-web + ebook + print works well.

While I was in London I had a chance to chat with Frances Pinter, whom I've mentioned on the blog. Her imprint, Bloomsbury Academic, is experimenting with another business model, one that resembles the "freemium" models common on the web. Basic ebooks will be available under a Creative Commons license, but print and enhanced digital versions will be available for purchase. Other publishers, such as Springer, are going with subscription-based models for institutions. The subscription model, without DRM, works just fine for ejournals, and Springer is confident that it will work just fine for eBooks in libraries. I've previously written about other business models for ebooks, such as PDA (Patron Driven Acquisition) and of course, "collective acquisition" and "bounty markets". Other models that are likely to work in at least some of the book industry fragments include advertising and what I will henceforth call the PIP (Pretend It's Print) business model.

As I learned at October's ICv2 Digital Comics Conference, digital publishers of graphic novels are finding that a serial mini-book business model works well for their industry. By pricing smallish chunks of content at under a dollar, a series can build readership momentum and do well financially without DRM.

Geography and language are also likely to define book industry fragments. At the London Online Information show, I met a group of entrepreneurs from Estonia. Their company, LibroCS OÜ, has developed an ebook delivery platform that includes a storefront and nifty color-video capable ebook reader devices manufactured in Shenzhen. They hope to provide infrastructure for retailers and publishers, especially those in smaller countries like Estonia.

I was interested to hear how the market dynamics in Estonia might be very different from those in a larger country. There are only about 1.1 million speakers of Estonian in the world, but it's a very well connected group. (The one Estonian I went to school with is brother to the President, so it must be true!) As a result, any Estonian is only one Facebook friend away from any other Estonian, or so it's claimed. If you ask an Estonian publisher to allow lending of an ebook, they immediately think they'll only sell 2 copies, 3 if the author's mother is living. To address these concerns, LibroCS emphasizes their use of Adobe Content Server DRM to enable PIP business models. Just because the print model is old doesn't mean it's dead.

Physics tells us that interacting systems eventually reach their lowest energy state, although there may be some heat required for them to reach it. The corresponding rule in economics is that a superior business model will sooner or later drive out its inferiors. In the context of the Former Book Industry, "superior" means that it offers the best value relative to its cost. Business model experimentation is what provides the heat. An English chemist friend of mine pointed out to me this week that when the experiment releases energy, you can get an explosion.

For the record, many Estonians refuse to acknowledge that Estonia was part of the Soviet Union. I don't know what that means for book publishing.

Friday, December 3, 2010

Biblio-Social Objects: Copia, Mendeley, LibraryThing and Mongoliad

The long awaited Copia e-reading platform finally launched, and the big surprise is that the previously announced devices have disappeared. Copia's innovation is that it smushes together a bookstore, reading environments that live across devices, and a social network. As such, it's interesting to look at.

Another recent arrival is the App for Mongoliad, the ambitious collaboration between Neal Stephenson, Greg Bear "and friends". It's a serial work built on a custom platform (including both website and apps) that supports multimedia and user-generated content, and it aspires to be a community inside a fictional world containing multiple narratives rather than a novel.

It's undeniable that books belong in our social networks, but it's far from obvious how social aspects should fit into the reading experience. Copia is designed with the point of view that the integration should be tight- it allows the sharing of annotations right within the reader application. Mongoliad does that and more: it invites and rewards reader contributions, even to the point of letting the community influence the narrative. In thinking about how books should fit into a social network, it's useful to look at two thriving social networks built around bibliographic objects, LibraryThing and Mendeley.

LibraryThing describes itself as a "social cataloging website". It helps you catalog your book collection, but it derives its vitality from the way it lets members use your personal catalog to connect with other LibraryThing members. A typical "Thingish" activity going on now is SantaThing, a sort of Secret Santa for Book Lovers. (Sorry it's too late to join for this year!)

Mendeley is surprisingly similar in function to LibraryThing, but it concentrates on a different set of bibliographic objects- journal articles. Like Copia, it includes a stand-alone application, but its core utility is managing references for scientists and scholars. If that's all it did, it wouldn't create much excitement- services such as RefWorks, EndNote, Zotero and many others do that job. It's the emerging community, which has a look and feel reminiscent of Facebook, that makes Mendeley special. Mendeley has recently added a public groups feature that helps researchers coalesce around topics defined by articles of interest. Very soon, they'll be rolling out a feature that connects users with other users based on their reference libraries.

It's a bit early to judge either Copia or Mongoliad- both show lots of rough edges and awkwardness, but you expect that in things so new and ambitious. (Random examples: at first, Copia didn't remember where I was in a book; now it does, and I'm not sure what changed. You have to log in twice because of DRM- once to Copia, and a second time to Adobe. This is soon to be fixed, according to Copia representatives. Mongoliad's web version has an annoying border trim that made the edges of the page rather hard to read, and the navigation is at times mystifying.) Instead, I'd like to point out two architectural issues that these products raise, and which aren't likely to change easily. The answers to these questions are likely to determine their ultimate success or failure.

1. Should reading environments and social activity be tightly coupled or loosely coupled?

Copia is betting that a better user experience will result from the tight-coupling philosophy. Comments and annotations live in the social graph of Copia members and are fed right into the Copia reading applications. It's the same philosophy that Apple has used for its Ping network with no great success - yet. YouTube's social networking features could be characterized as being tightly coupled to their video content, and if Copia achieved a fraction of YouTube's success I'm sure they'd be quite happy.

The alternative to tight coupling would be loose coupling. There's no technical reason that users of Kindle, iBooks, Nook, Sony and Kobo reading environments couldn't all share comments and annotations via Twitter, Facebook and Buzz messaging backbones. Technical frameworks for open annotation are beginning to emerge.
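
To make the loose-coupling idea a bit more concrete, here is a minimal Python sketch of an annotation that doesn't care which reading environment produced it. Everything in it is hypothetical: the `Annotation` fields, the `work_id` and `locator` conventions, and the JSON message shape are assumptions made up for illustration, not the format of any vendor or of any emerging annotation framework. The point is only that a small, self-describing message could ride on whatever messaging backbone a reader already uses.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Annotation:
    work_id: str   # hypothetical edition-independent identifier for the book
    locator: str   # hypothetical position scheme, e.g. "chapter-1/paragraph-1"
    quote: str     # the passage being annotated
    note: str      # the reader's comment
    author: str    # the reader's handle on whatever network carries the message


def to_message(annotation: Annotation) -> str:
    """Serialize an annotation to a JSON string that any backbone could carry."""
    return json.dumps(asdict(annotation), sort_keys=True)


def from_message(payload: str) -> Annotation:
    """Rebuild the annotation inside a different reading application."""
    return Annotation(**json.loads(payload))


# One reading app publishes the note; another app, on another device, reads it.
original = Annotation(
    work_id="moby-dick",
    locator="chapter-1/paragraph-1",
    quote="Call me Ishmael.",
    note="Hard to beat as an opening line.",
    author="reader-one",
)
received = from_message(to_message(original))
```

Because the message names the work and the passage rather than a particular store's file, neither the sending nor the receiving application needs to know anything about the other.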

Loose coupling is the way MLB has added social activity to their baseball Apps, and it works well there. Watching baseball is very much a social activity, but that doesn't mean that people want to build their social network around baseball games. My guess is that reading a book is more like watching a game in terms of the social interactions that work well around it.

Loose coupling to social features would allow users to combine their favorite reading device or application (for example, Stanza or Ibis on iPad), their favorite social network (say, LibraryThing or GetGlue), and their favorite shopping environment, which might be Amazon or Kobo. Loose coupling makes for more competition, resulting in a more challenging business environment for the provider of the social network. The types of web services present in both LibraryThing and Mendeley (and Facebook and Twitter, for that matter) allow coupling to other services, greatly increasing the footprint of the social network.

Mongoliad, by contrast, is content with extreme coupling to its social network, to the extent that you worry about scaling. While the Mongoliad platform supports only one narrative work (not sure if I should call it a book!) the company behind it, Subutai, intends it to be a platform for many different works. How will the community formed by one work interact with other communities on the same platform? Will there be narrative leakage?

2. Which comes first, objects connecting you to friends, or friends connecting you to objects?

This is a chicken and egg question, of course, but the interactions enabled by a social network are of one type or another. In a network like Facebook, the friends come first, and the stream of social interactions can bring along connections to many types of entities, books included. In networks like LibraryThing and Mendeley, it's the other way around- the books and articles create connections between you and other members of the network.

The reason this question is architecturally important for book- and article-oriented networks is that it determines whether the objects in your network are works or whether they are products. Products can live in a friend-first network, but they stick out like a sore thumb in book-first social networks, where they really need to be "works".

To understand what I mean, consider a book I mentioned in a previous post, Charlie Chan: The Untold Story of the Honorable Detective and his Rendezvous with American History. Copia has a product catalog, not a work catalog, and so there are two separate entries for this work, one an ebook, the other a hardcover. When a paperback comes out, there will be a third entry. It makes very little sense for my social interactions surrounding the hardcover to be separated from similar interactions surrounding the ebook. Other works are much worse. Try loading "Moby Dick" into a Copia library. Although it's a public domain work, free from Project Gutenberg, you have to pay for the versions available for download in the Copia store. It's not as bad as Kindle but it's not as good as Apple's iBooks; I WAS able to import my iBooks file of Moby Dick into the Copia Reader.

The "thing" that really separates LibraryThing from other book-oriented social networks is the emphasis it has put on the grouping of different book editions. (Full disclosure- one of my achievements at OCLC was managing the productization of OCLC's book-grouping web service, xISBN, which competes with a similar service from LibraryThing.) Similarly, Mendeley expends a huge computational effort determining which article instantiations are the same as other article instantiations in its network.
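To make the work-versus-product distinction concrete, here's a minimal sketch in Python. Everything in it is hypothetical: the identifiers, the `EDITION_TO_WORK` table and the `group_by_work` function are made up for illustration, and this is not how LibraryThing, xISBN, Mendeley or Copia actually do it. It only shows the idea that comments attached to separate editions end up in one conversation once each edition is mapped to its work.

```python
from collections import defaultdict

# Hypothetical lookup table: product (edition) identifier -> work identifier.
# Real grouping services build this mapping with far more effort; these IDs are invented.
EDITION_TO_WORK = {
    "charlie-chan-hardcover": "work:charlie-chan",
    "charlie-chan-ebook": "work:charlie-chan",
    "moby-dick-gutenberg": "work:moby-dick",
    "moby-dick-store-ebook": "work:moby-dick",
}


def group_by_work(comments, edition_to_work=EDITION_TO_WORK):
    """Collect (edition_id, text) comments under the work each edition belongs to.

    Editions missing from the table fall back to being their own "work",
    which is effectively what a product-level catalog does for everything.
    """
    grouped = defaultdict(list)
    for edition_id, text in comments:
        work_id = edition_to_work.get(edition_id, edition_id)
        grouped[work_id].append(text)
    return dict(grouped)


comments = [
    ("charlie-chan-hardcover", "comment made on the hardcover"),
    ("charlie-chan-ebook", "comment made on the ebook"),
    ("moby-dick-gutenberg", "comment made on the free edition"),
]
by_work = group_by_work(comments)
```

In this toy data the hardcover and ebook comments both land under `work:charlie-chan`. The hard part in practice is building the lookup table, not using it.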

Retailers naturally work at the product level, and the current version of Copia exposes the weaknesses of using product data as the basis of a social network. They'll eventually clean this up (as has Amazon) but it will be a slow and difficult process for them to re-engineer their backbone to address the complexities that libraries have long needed to deal with.

Mongoliad, though I'm sure it will cause the ISBN agency all sorts of headaches, is crystal clear about its status as a single, sprawling work. It will be interesting to watch its development as users begin to interact around the objects inside the work- characters, maps, places, etc., each of which is associated with a "'pedia" page.

Now what?

Once you've collected a network of people around a book, what happens next? Social networks are not unlike coffee shops or bars. If the business model for the network owner is to sell books (beer, coffee), the point is to get the network to buy their books (beer, coffee) through the network. Thus Copia's network will inevitably be slanted towards discovery of new things to read. If the business model for a social network is to collect some sort of membership fee, the point is to make the members so cozy they recruit more members. Hence the vibrant communities at LibraryThing and Mendeley, both of which use "freemium" business models. If the business model is to sell a subscription, the point is to get the reader hooked on characters and continuing narrative. Hence Mongoliad is likely to include a lot of cliffhangers.

There's one more thing a book-mob will be able to do, and that's evangelize the book. Large, evangelical groups of readers are exactly what Gluejar will need to gather the financial muscle to "unglue" books. Cooperation with all sorts of social networks will be a key to the success of this venture.

Even though it's very much a self-contained system, I'm really starting to get into Mongoliad. Thinking back on other Stephenson works, I'm realizing how ill-fitting they are in book form. The Subutai platform has unglued the narrative from the pages in a rather unexpected way.

I'd like to acknowledge invaluable conversations related to this article with Copia's Sol Rosenberg, LibraryThing's Tim Spalding, Ian Mulvany and Jan Reichelt of Mendeley (and all of Neal Stephenson's novels!)


Wednesday, December 1, 2010

Where English is Spoken

According to Wikipedia, the countries with the most English speakers are
  1. United States, 251 million English speakers
  2. India, 232 million English speakers
  3. Nigeria, 79 million English speakers
  4. United Kingdom, 60 million English speakers
  5. Philippines, 50 million English speakers
The US and UK, of course, are wealthy countries with strong publishing industries, but it would be a big mistake for English language publishers to ignore the possibilities for growth in the other three. India in particular already supports a billion dollar publishing industry.

The coming transition from print to digital books will cause significant upheaval for distribution of books from the US and UK in countries like India. This is because India and other developing countries are "price-sensitive". The idea of selling a popular western novel for $30 or so is seen as outrageous price-gouging from the Indian point of view. The book industry has historically worked around this issue by carving up rights to books into regional pieces. An Indian publisher can thus bring out an inexpensively manufactured edition of the book and sell it at 20% of the price it fetches in its home market. The regionality of the selling rights protects products being sold in the developed market from competition with products priced for sale in developing countries.

With digital products, this practice becomes complicated. The DVD industry for example, uses region coding to prevent DVDs sold in one region of the world from being played on players sold in another region of the world. They have managed to do this by building a type of DRM into almost every DVD player sold.

This promises to be much harder to do for digital books; that's not to say that the book publishing industry isn't trying to do it. [Insert here your favorite argument against applying DRM to books]

Nonetheless, certain aspects of ebooks seem ideal for developing countries with large populations. The cost of creating an additional copy of an ebook is almost zero; and the cost of reader devices, which has finally cracked the $100 barrier in the US, continues to fall. It's not far fetched to think of generating significant revenue by selling millions of copies of popular ebooks for 10-25¢ a copy. Libraries could play an important role in helping to enable this sort of distribution securely.

One possible stumbling block to realization of this fantasy is ebook piracy. While no reproducible research has been able to show that ebook piracy is a significant issue in the US and UK markets, all the available data indicate that demand for pirated material is quite strong in developing markets. Publishers that ignore these markets because of their small economic value today are throwing away huge potential future markets by training consumers that if they want to obtain ebooks they need to avoid legitimate markets.

Bounty markets for ebooks could play an important role in monetizing large numbers of people who can only pay small amounts for ebooks. Because bounties are posted for open access release of ebooks, a bounty doesn't work unless all global rights holders agree and get a share of the money. A credible market of this type would thus have to start out with a truly global presence. That's one reason I've just arrived in snowy London for a short visit and also a reason I'll be heading to balmy India this weekend.

Wish me luck.

Tuesday, November 23, 2010

On The Value of Things - at a Garage Sale

The proceeds
I spent this past Friday and Saturday pondering the value of all sorts of things- toys, games, books, furniture, household items, clothing. I had four colors of stickers. Green was one dollar, blue was two, and red was just fifty cents. Yellow dots were priced "as marked", which meant more than two dollars. Three dollars seems to be a lot to ask - at a garage sale.

We had a beautiful day on Saturday, and people came non-stop. The timing was great- people are starting to think about the holidays, and we had lots and lots of toys. The sale was a great success for us- about 80% of the stuff we put out disappeared, which is a good thing, because otherwise we'd have to figure out another way to get rid of it all. My colored dot assignments weren't about intrinsic value; it was more about how much we wanted to get rid of a thing.

Putting a price on things also meant that people had to value them. If 50¢ was too much for someone to pay for a kitchen knife, well, that person was unlikely to provide the knife a worthy home. Of course some people felt compelled to bargain, despite the dime-on-the-dollar pricing. So I bargained a bit, and they left happy. Others apologized for the low prices; they left happy, too.

The free items were my favorites. I had a bag of shoes in the garage, originally meant for discard. A gentleman asked if they were for sale, and I said they were free to our good customers. So he tried them on, and he was so happy that they fit. They were old shoes, but Rockports do last a long time.

Another fellow had selected a bunch of books, including several of my father's old particle physics books. Why my dad, an electronics engineer, had particle physics books, is one story; why I shipped them from Indianapolis to store them in my basement is entirely another. But anyone that interested in particle physics deserved to get those books for free!

Dad said: "Just go to bed!"
Most of the books we were selling were ones that our kids had grown out of. They were a dollar each, half off if you bought more than 10. Chapter books were 25 cents each, though I couldn't bear to part with Mercer Mayer's "Just Go to Bed" at any price and took it off the sale shelves.

One eleven-year-old beamed when she found out there were books for sale. Her mom had bought some furniture and was arranging to pick it up later. "You can look at the books when we come back," she said, herding the girl and her 8-year-old cousin to the car.

The Librarian from the Black Lagoon
It was dark and the sale was long over when they returned for the furniture. I had already packed the leftover books into my basement. After helping to load the furniture into the car, I told the mom that if the kids wanted to see the leftover books, I'd be happy to give them any books they wanted. So the four of us went down to my basement and looked at the books. "Oooh, I want!" said the girl at some age-appropriate books. "Ooh look, Shakespeare, Mommy! I want!" The mom and I looked at each other and smiled. The mom's smiles were understandable to any parent; mine were because it was the copy of "A Midsummer Night's Dream" that I had read in high school. The cousin did not go away empty-handed. I pushed some "Black Lagoon" books on him.

Reflecting on the joy I experienced in seeing kids excited to get some books, I got a better understanding of why so many librarians love what they do. Imagine if you could do the same thing for lots and lots of kids. It would be like taking that joy and multiplying it by thousands.

Maybe that's why I've been obsessed with "ungluing ebooks".

Personal Note: I believe that in life, when you discern a calling, you need to remove whatever obstacles there may be to answering that call. I hear this call to "unglue" ebooks quite personally and clearly. I also intend to have a "bounty market" ready to make it happen before Thanksgiving in 2011. You can consider that an announcement. For now, why not celebrate Thanksgiving by taking a book off your shelves and finding the person who is meant to read it next?

Thursday, November 18, 2010

Real Research Gets Reproduced

It's not often that I'm identified as a physicist, as Richard Curtis did in his commentary on my followups of Attributor's piracy "demand" report. But it's true, I worked in materials physics research at Bell Labs from 1988-1998.

Crystal structure of YBCO
Those were great years to be in materials physics. In 1986, two guys at IBM Zurich discovered some amazing new superconducting materials. By the end of that year, a team in Japan had reproduced their results; a group I was a part of at Stanford did the same after talking with the Japan group in December. By the following March, so many groups around the world had made exciting discoveries that the American Physical Society meeting in New York became known as "the Woodstock of Physics".

A blue semiconductor laser.
A few years later, a guy in Japan reported that he had made a semiconductor light emitting diode (LED) glow blue. His work was a lot harder to reproduce. Although he published many details, it took years of hard work for anyone to come close to what his team reported. I even sawed one of his LEDs in half to try to understand how it worked. Today, my kitchen (and the screen of my MacBook) is lit by white LEDs made from that same semiconductor.

Around that time, some chemists in Utah announced a truly amazing discovery: they saw fusion reactions occurring in palladium electrochemical cells. Since they were respected electrochemists, their results were taken seriously, and lots of people tried to reproduce the incredible results. The promise of a seemingly magical, unlimited power source seemed almost too good to be true. This time, however, nobody could reproduce the results. Some scientists saw odd things happen, but they were different in every lab. At Bell Labs, the scientists trying to reproduce so-called "cold fusion" became convinced that the guys in Utah were being led astray by their excitement.

In science, it's usual that a surprising result will only be accepted once it has been reproduced by someone else. My scientific training has sometimes gotten me in trouble in the world of libraries and publishing. When presented with something that seems surprising to me, I ask for the evidence. In cultures that are more comfortable assigning and recognizing authority, my questions have sometimes been seen as irritants.

It's been that way with my questions about the Attributor report. I was surprised at some of the findings, and I tried to reproduce them. My results can't reproduce some of the key findings reported by Attributor. It would be nice to better understand the factor of a hundred difference between my results and those of Attributor; much might be learned from such an analysis. Attributor is a company that sells anti-piracy services; one would hope that the data they report is somehow rooted in fact, even though they benefit from overestimates of piracy.

In Richard Curtis' article, Jim Pitkow, Attributor's CEO, is quoted:
Our study’s rigorous methodology ensured highly accurate results that align with actual consumer behavior. We analyzed 89 titles, using multiple keyword permutations per title, across different days of the week, with very high bids to ensure placement – each of which is fundamental in guaranteeing accuracy and legitimacy. Each of these variables impact the findings, and analyzing all variables together produce highly accurate results. We stand by our research, and we’re confident that the study addresses an accurate portrayal of the consumer demand for pirated e-books.
If Attributor really stands by its research, it will make it easier for people like me to reproduce their results. In particular, they should publish the complete list of the "869 effective keyword terms" used as keywords for their Google AdWords experiment. There are mistakes they might have made in permuting and combining search terms; they might also have thought of a class of effective search terms that my study totally overlooked. As it stands, it's impossible to know.

I can understand why Attributor might not want to release their search term list. First of all, they should expect people to try to tear it to shreds. The marketing department isn't going to like that. That's what happened to the superconductor guys, the blue LED guy, and cold fusion guys. They stood behind their work, and let the scientific community look for weaknesses and make their own judgments.

Cold fusion didn't pan out, and Pons and Fleischmann, the Utah guys, tried for years to figure out what it was they measured. Bednorz and Müller, the guys in Zurich, won the Nobel Prize. Shuji Nakamura, the LED guy, won a Millennium Prize and a lawsuit.

It may be easier to do a followup study without the worry of spurious searches for widely known terms. But at this point, Attributor customers and the book industry as a whole stand to learn a lot from understanding where the irreproducibility of Attributor's study is coming from. Publishers need that information to plan out a response to the threat of ebook piracy, and their needs should come first- no matter what the marketing department says.

Wednesday, November 10, 2010

Infochimps and the scaling of dataset value

Sure, a picture is worth a thousand words, but what is a thousand words worth? How about a million? If I had a dataset of the most recent trillion words spoken by humanity (anonymized and randomized of course!), would that be worth any more than the set of words in this blog post?

These are real questions. A Texas company called Infochimps has datasets quite similar to these, ready for you to use. Some of the datasets are free, others you have to pay for. More interesting is that if you have a dataset you think other people might be interested in, or even pay for, Infochimps will host it for you and help you find customers. (Infochimps just announced it had raised $1.2 million in its first round of institutional funding.)

One of the datasets you can get from Infochimps for free is the set of smileys used on Twitter in tweets sent between March 2006 and November 2009. It tells you that the smiley ":)" was used 13,458,831 times, while ";-}" was only used 1,822 times.

If you're willing to fork over $300, you can get a 160MB file containing a month-by-month summary of all the hashtags, URLs and smileys used on Twitter during the same period. That dataset will tell you that during September of 2009, the hashtag #kanyeisagayfish was used 11 times while #takekanyeinstead was used 141 times.

If you're a Scrabble player, you can spend $4 for a list of the 113,809 official words, with definitions. Or you can get them free, without the definitions.

courtesy of Infochimps, Inc. CC-BY-A
I had a great talk with Infochimps President and Co-Founder Flip Kromer a few weeks ago before his presentation to the New York Data Visualization Meetup. I fell in love with one of the visualizations he showed in his presentation, and he's given me permission to reproduce it here. (Creative Commons Attribution License) It's derived from the same Twitter data set you can get from Infochimps, and shows networks of characters that are found in the same tweet. So if ♠ and ♣ appear in the same tweet over and over again, the two characters will have a strong connection in the network of characters.

The character connection data was fed to a program called Cytoscape, which is an open source visualization program used in bioinformatics; Mike Bergmann has a nice article about its use for large RDF graphs. The networks are laid out using a force-directed algorithm (which is pretty much the simplest thing you can do). Coloring is applied arbitrarily.
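To give a feel for what "character connection data" means, here's a rough sketch in Python of how such an edge list could be built. It is only an illustration, not Infochimps' actual pipeline: the `cooccurrence_edges` function, the sample tweets, and the choice to look only at non-ASCII characters (counted once per tweet) are all my own assumptions.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(tweets):
    """Count how often each pair of distinct non-ASCII characters shares a tweet."""
    edges = Counter()
    for tweet in tweets:
        # Assumption: only non-ASCII symbols matter, each counted once per tweet.
        symbols = sorted({ch for ch in tweet if ord(ch) > 127})
        for a, b in combinations(symbols, 2):
            edges[(a, b)] += 1
    return edges


# Made-up sample tweets, purely for illustration.
tweets = ["♠ and ♣ again", "♠♣♥", "just text", "ツ"]
edges = cooccurrence_edges(tweets)

# A tab-separated edge list (source, target, weight), the general shape
# that graph layout tools accept as input.
edge_lines = [f"{a}\t{b}\t{weight}" for (a, b), weight in edges.most_common()]
```

Here ♠ and ♣ share two tweets, so that pair gets the strongest connection, while a character that always appears alone gets no edges at all. A force-directed layout then pulls strongly connected characters together, which is how clusters like the language networks emerge.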

As you might expect, the main character networks that show up are associated with languages, but there are some anomalies. For example, the katakana character ツ (tu) sticks out. Katakana is a set of phonetic characters used in Japanese for non-Japanese words. The reason "tu" is set apart from all the other katakana is that people use it on Twitter as a smiley.

The other anomalous character subnet is labeled "???" in the graph. A closer look reveals this to be the set of characters that look like upside down roman text.

Kromer has noticed that the price (or perhaps cost) of a partial data set follows a non-monotonic curve (see graphic). Small amounts of data are essentially free, but a peak value is reached when portions of the data set are extracted from the full data set. If we were discussing book metadata, for example, peak value might accrue for a set of the 100,000 top selling books.

There's much less value, according to Kromer, in having a large incomplete chunk of a data set. Data for 10,000,000 books, for example, would have less value than the 100,000 book data set, because it's not complete. Complete data sets become extremely expensive because of the logistics involved, and because of the value of having the complete set.

This pattern seems plausible to me, but I'd like to see some clearer examples. I've previously written about having too much data, but that article looked at the effect of error rates on data collection; Kromer's curve is about utility.

For me, the most interesting thing about Infochimps is the idea that the best way to make data flow in large volumes and create new types of knowledge is to provide the right incentives for data producers through the establishment of a market. This makes a lot of sense to me; however I'm not sure that the Infochimps market has also established incentives needed for data set maintenance; the world's most valuable and expensive data sets are ones that change rapidly.

Kromer contrasted the Infochimps approach to that of Wolfram, whose Alpha service is produced by "putting 100 PhDs and data in a lab". He also feels that much of the work being put into the semantic web is a "crock" because its technology stack solves problems that we don't have. Humans are pretty good at extracting meaning from data, given a good visualization.

We can even recognize upside-down text.