Monday, December 9, 2013

A Scientist Needs More

I visited my high school French teacher at the end of October. She's retired and lives in Alsace. My French stays with me, mostly unused, but something else I learned from her has stayed with me too: she showed me how to eat steamed artichokes. Whenever I eat and enjoy artichokes that way, my mouth remembers. So often, we give something to other people, and it lives on, having an impact out of proportion to the original gift. A girl in my Calculus class introduced me to Bruce Springsteen and The Wild, the Innocent and the E Street Shuffle. Wherever you are, Sarah Strong, thank you, whether you remember or not!

On October 29, I lost a friend from graduate school, Kanji Yoh (陽完治), and I've been reflecting on so many things I learned from Kanji that have remained with me over the years. But what I learned has been hard for me to articulate, because it's not like artichokes or the Boss. OK, I blame Kanji for taking me to the Opera my first and only time, so I suppose I can thank him for helping me cross that off my list. But there's a lot more.

Kanji and I both started graduate school at Stanford the same year. For me graduate school was getting back on a familiar academic track towards being a scientist. For Kanji, graduate school was a step off the conventional track. He had been an engineer at Hitachi, in Japan, at the sort of job where you were expected to work for the company your entire working life. And it was reasonable to expect the company to take care of you in return. Kanji was a rare, remarkable, courageous creature, and my advisor recognized him as such. Kanji needed something more.

Valuing that "more" is what I learned from Kanji.

Kanji and I shared a tiny, sloped-ceiling office with two other students in the attic of Stanford's McCullough Building for four years. We were among the first batch of students of a new professor, "Coach" Jim Harris, and we were charged with building a new research lab and launching a new research program. Kanji would ask me interesting questions, and I'd startle him with odd conjectures and amazing facts.

Kanji worked on a counterintuitive idea, to engineer p-channel transistors from layers of Gallium Arsenide instead of the usual n-channel transistors. It was a sort of semiconductor jiujitsu, flipping the weak characteristics of a material into strengths, so as to balance two complementary transistors and achieve both speed and low power. It was an elegant idea; the job of a researcher is to see whether physical reality has the same beauty as the mathematical description.

My wife and I spent a memorable Thanksgiving with Kanji, his wife and some friends. We rented a condo at Lake Tahoe, went skiing, and did a real Thanksgiving turkey. Then we did some more skiing; Kanji was so eager to get out on the slopes early, and there was a foot of fresh snow.

After a post-doc at IBM Watson Labs, Kanji got a job as an Assistant Professor in Osaka. It was a difficult time for him, because the strict Japanese academic system doesn't allow much freedom for junior Faculty. I had a chance to visit him once. We walked around his neighborhood and talked about cicadas, which were making a huge noise. I told him that New Jersey cicadas, unlike Japanese cicadas, come in broods that appear in 13 or 17 year cycles. Always a prime number to fool cyclical predators. Kanji's face lit up with wonderment at the beauty of this fact. "Really?" he said.

Kanji persevered in Osaka and got a position at Hokkaido University in Sapporo. His work there covered quantum dots and wires, graphene transistors, and spin polarized tunneling. His recent work has been on "spintronics", trying to find ways to use and manipulate electron spins to do calculations with quantum mechanics. When I left semiconductor physics 15 years ago, these things were just starting to be thought about and the idea seemed so beautiful and futuristic and impossible.

Kanji played both the violin and flute and performed with three different ensembles. He was just as serious about his music as he was about his semiconductors. He became a fan of soprano Anna Netrebko, and would fly around the world to go to her performances. I was a bit shocked when he first told me about this, especially the part about the protocol for delivering roses to the star after a performance. But it's inspired me to take more seriously some of the things I've long thought would be fun to do, but who has the time? Like visiting my high-school French teacher in Alsace and drinking Auxerrois with her friend the viticulteur.

I miss him. I'm sure Kanji's wife, a translator and prize-winning poet, and his two kids feel a great absence. They're not alone.

Saturday, December 7, 2013

Physics and Testosterone, Part 5: Missing Nobelists

Tomorrow morning at 9 AM CET, the Nobel Prize Lectures in Physics will be streamed live on the internet. François Englert will speak about The BEH Mechanism and its Scalar Boson. Peter Higgs's talk is entitled Evading the Goldstone Theorem. Robert Brout, the B in BEH, will not share in the Prize. Not because he didn't make a prize-worthy contribution, but because he didn't live long enough. Brout died two and a half years ago, at the age of 82. Higgs, 84, gets the name recognition because he noticed that the "mechanism" proposed by Brout and Englert implied the existence of a particle. Englert, at 81, is the youngster of the trio. Three other physicists, Tom Kibble, Gerald Guralnik and C. R. Hagen, did pretty much the same thing, but got their paper out a bit later, so they get to be "almost Nobelists".

Ironically, Brout might have won the prize if he were a woman. Women live longer than men. But it's not as if women haven't missed out on the Nobel Prize in Physics because they were women.

Two women have been awarded Nobel Prizes in Physics, Marie Curie (1903) and Maria Goeppert-Mayer (1963). 192 men are physics Nobelists; Higgs and Englert make 194.

From what I've come to understand, there is at least one case where "being a woman" is a probable explanation for a physicist's exclusion from a Physics Nobel Prize. Chien-Shiung Wu most likely deserved a share of the Nobel Prize awarded to T. D. Lee and C. N. Yang. It's also hard to explain why another physicist, Lise Meitner, didn't share the 1944 Chemistry Prize awarded to Otto Hahn.

It's not as if the landscape for prospective prizes looks very different. Of the 42 eligible "Nobel class" physicists on the Thomson-Reuters "citation laureate" list, only two are women (not that it's the best list or anything, but it's a list, and both Lene Hau and Vera Rubin are widely respected).

Reading through the list of 83 eminent 20th century women physicists who made their contributions before 1976, it's striking to me how many of these scientists made significant contributions despite NOT BEING ALLOWED to, or being accorded second class status of some sort.

Also I was surprised to find that many of the 83 are still eligible to win a Nobel (in addition to having made major contributions, they seem to be still around!):

This is my favorite discovery from reading about the 83 eminent physicists. It's a photo of Helen Megaw, a crystallographer whose work on perovskite minerals I had encountered in my past life as a physics researcher. It was taken on the occasion of being awarded an honorary degree a year or two before her death in 2002 at age 95, but as you can see by her expression, she totally won life.

I was unable to find information about two others on the list, Christiane Bonnelle and Janine Connes.

There may be two or three future Nobelists on this list, or there may be zero. A few of the women on the list are eminent for their contributions to society, and wouldn't be considered for a Nobel. Sometimes it takes a long time for pioneering work to be recognized as such; sometimes today's hot topic seems inconsequential or worse, wrong, 10 years later. But imagine what the list would look like without our history of discouraging women who felt the calling of physics.

In the earlier parts of this series, I've written about the culture of physics, defined by men for the personality norms of men. It shouldn't be surprising that only a minority of women have been overcoming the barriers it imposes to reach the highest levels. Looking forward, physicists need to recognize that there are more ways to nurture physicists with the diverse talents needed today. 

My own view is that physics, broadly defined as figuring out how physical things work, is a fundamental human impulse that gets expressed in many ways. Once you start, you can't stop doing it, even if you've left the profession to do it in non-physics places. It still needs to be done, and the rewards of doing it transcend prizes that stigmatize 194 men and 2 women.

Monday, December 2, 2013

Physics and Testosterone Part 4: Bang

True (but embellished) story:

Toni, a newly hired physicist at Bell Labs, decided to set up a new lab to do the most sensitive electrical measurements ever. So the young researcher spent hundreds of thousands of dollars building a metal cage with insulators and coatings to electrically isolate the lab from the rest of the world. When it was ready for testing, Toni's friends were summoned. They got inside the cage and Toni cranked up a high voltage power supply. "We're floating 50 thousand volts above the rest of the room!" Toni said, excitedly. "Wouldn't it be cool if we could get to a hundred kilovolts?" Toni's face had that evil grin that made them laugh, nervously. At sixty kilovolts, there was hardly a blip of noise on the oscilloscope, but the smell of ozone in the room was unmistakable. Toni didn't stop there; at 65 kilovolts there was a loud bang, the circuit breakers flipped, and the room exuded a smell of molten plastic. The cage was now arc-welded to the back wall; the expensive insulators instantly useless.

"Whoops," said Toni.

Here's a quiz for you: which of the following do you think happened to Toni?

A. Toni was fired for her stupidity and for wasting Bell Labs' money. She never worked in Physics again.

B. Toni was mortified and considered quitting immediately. But months of taunting by colleagues who would occasionally make "bang" noises at her was too much. She is now a very successful High School teacher in Northern New Jersey.

C. Toni suddenly developed an interest in a different type of experiment, which had surprising results, and today Toni is often mentioned as a candidate for the Nobel Prize in Physics.


Friday, November 29, 2013

Physics and Testosterone Part 3: Jumping off Cliffs

Much of the education of a physicist or an engineer revolves around problem sets. To do a problem set, the student needs to understand some concepts to "set up the problem". There will be some calculating machinery that the student must have mastered. Problems can be difficult in a variety of ways. Often, the path from concept to solution won't be obvious. It might not be obvious which principles might be applied to a particular problem. The calculations required might be very messy. Insight or intuition may be required: you might have to guess the answer to figure out how to get to it. But often, solving a problem requires a leap of faith. You might have to work for a half hour before you know whether you've chosen the right approach to the problem. So it helps to have some self-confidence, to be ok with not knowing whether you're on the right path or totally lost. Other times, you just need the right tool, and maybe you have to invent the tool.

Real math or science doesn't come with answers in the back of the book. A researcher might work for years without knowing whether their efforts are leading them down a blind alley. The exquisite feeling you get when you've solved a really hard problem is why people become physicists, mathematicians, and engineers. It's the feeling of having eyes where once you couldn't see.

Having her self-confidence assaulted by every problem set in grad school was a challenge for "K", the applied physics Ph.D. I wrote about in Part 2. But there was that one problem set in high school that stumped everyone else in the class, but which she solved. Once you've tasted that success, you don't forget it.

Learning to ski was a revelation for K. You'd take the ski lift up to the top of a mountain, and somehow you'd end up at the bottom.

There are a lot of ways to get to the bottom of any slope. Some people like to do traverses. Some people go straight down. My method was to swallow the sheer terror, point my skis downhill, and power through some turns and some slides. I've never been a great skier, so I'd get halfway down and land on my butt. Gradually, I figured out how to avoid falling.

This is apparently a typical male's approach to skiing. A touch of reckless self-confidence lends itself to this approach. Just watch some teenage boys on a ski slope if you doubt it.

K realized that she didn't have to ski like the guys. The part of skiing she enjoyed was carving turns. To carve a good turn, you have to put your weight downhill, which at first feels insecure, but in practice gives you more control. And having good technique gives you real confidence.

Realizing that she could approach problem sets her way really helped K get through those difficult problem sets. It was OK that she felt like she had no idea what to do while many of her male colleagues just pretended to know how to do them. There was nothing wrong with focusing on skills and carving away the difficulty. And not breaking anything.

Tuesday, November 26, 2013

Physics and Testosterone Part 2: Study Hall

In the fall semester of sophomore and junior years, my work-study job at Princeton was to tutor freshman engineers in a study hall sponsored by the engineering school. The study hall had been created a few years before to help freshmen survive the shock of learning calculus, physics and chemistry, all at the same time.

We had a variety of students seeking help. I began to notice a distinct pattern in the sort of help that was needed by women and by men.

The typical interaction with a woman at the study hall went like this:

Woman: "I have no idea how to do this problem!"
Me: "Tell me about the problem"
Woman: [[ detailed explanation of problem ]]
Me: "How do you think you should attack the problem?"
Woman: [[ detailed plan for solving the problem ]]
Me: "Sounds good"
Woman: "Oh thanks so much, you've been so helpful!"

The typical interaction with a man at the study hall went like this:

Man: "I can't get this problem to work"
Me: "Tell me about the problem"
Man: [[ bizarre, complicated, and wrong explanation of the problem ]]
Me: [[ detailed explanation of problem that woman student just told me ]]
Man: "Oh"
Me: [[ detailed plan for solving the problem that woman student just told me ]]
Man: "Really?"
Me: "Would it kill you to try?"
Man: "but [[botched calculation]]"
Me: "Might want to check your signs"
Man: "Hey, I knew it would work!"

Of course, not every student was typical. I remember one freshman woman in particular. She would come in with a male friend. They were taking the sophomore level physics and math courses. This posed a problem for me, as I was also taking the sophomore level physics course, albeit the physics major track, rather than the engineering track. I was barely a week ahead of them. I used my "tell me about the problem" strategy, which at first seemed to satisfy them. But after a few weeks, their questions got more difficult and I was having more difficulty. So I suggested to them that the study hall wasn't really meant to help with sophomore level courses.

A few years later, I found myself a classmate of this student in grad school at Stanford; she went on to get a Ph.D. in Applied Physics. She never quite forgave me for "kicking them out of study hall".  Her story tomorrow.

Monday, November 25, 2013

Physics and Testosterone Part 1: Captain Kirk

The New York Times Magazine had a really interesting article by Eileen Pollack about women in physics last month. And the week after, the Nobel Prize was announced, and the recipients were, no surprise, men. It got me thinking about gender and physics. I happen to know a lot of physicists, some Nobelists, and a fair number of women who are physicists. One or two of the women might win a Nobel one of these years, but the odds aren't so good.

Thinking back on my training in physics, I realized that I have some stories to tell that might shed some light on the effect of gender on the development of scientists, engineers, and technologists, and how to do better.

My sophomore year at Princeton, I took the physics-major track physics courses. For Electricity and Magnetism, we had a professor fresh out of Caltech, who we all called "Captain Kirk". As rumor had it, Captain Kirk borrowed his curriculum from a graduate course at Caltech which had a track record of producing Nobel Prize winners. The textbook was completely inscrutable and the problem sets were pretty much impossible.

Looking back on it, I'm pretty sure that if the curriculum ever produced Nobel Prize winners, it wasn't because it did a good job of teaching the material. More likely it was effective because it did a terrible job of teaching the material. Which had 2 consequences:

  1. All us smart-ass physics students quickly realized that we weren't nearly as smart as we thought we were. We were unaccustomed to the fear of failure, and it motivated us.
  2. We formed groups to work on problem sets together and taught each other the material.
  3. Except for the freshman who miraculously did all the problem sets on his own.

What struck me from Pollack's article was a quote from Meg Urry, a professor of Physics and Astronomy at Yale:

“Women need more positive reinforcement, and men need more negative reinforcement. Men wildly overestimate their learning abilities, their earning abilities. Women say, ‘Oh, I’m not good, I won’t earn much, whatever you want to give me is O.K.’ ”

Maybe Captain Kirk's course was really designed to discourage us. Filled with testosterone or conditioned by society, the guys among us were stupidly overestimating our capabilities and we needed to be brought down to earth. We had all been solo stars in high school, and we needed to be forced to work with our peers. We needed to be broken down so that we'd be more open to new ideas.

Probably the one woman in our study-group didn't need to learn those lessons. More positive reinforcement could have helped her more. (She ended up getting the physics degree just fine and went to med school.)

Despite Captain Kirk's hopes, no one from the class has won a Nobel Prize, yet.

Monday, November 18, 2013

Google Books and Black-Box Copyright Jurisprudence

Last week, eight years after the first lawsuit was filed to stop the Google Books Project, Judge Denny Chin finally ruled on the core merits of the case. The decision is being widely hailed on one side as a "tremendous victory for fair use" and on the other side as a "fundamental challenge to copyright". But these are short-term perspectives. I think that the long term impact of the decision may turn on the acceptance of Chin's approach to technology's transformation of copyright, which I would characterize as Black-Box Jurisprudence.

In my view, the core holdings about fair use were never in much doubt. The argument saying that indexing or lexical analysis or data-mining of books always requires the permission of a rights holder was never very defensible, or even seriously argued. A holding that display of snippets was not fair use would have made scholarly writing in the digital age impossible; a decision the other way on snippets would have been swimming up a judicial stream. But fair use is always a weighing of factors, and the untold story in the Google Books case is about the factors that didn't get weighed.

The reason that Google got sued in the first place was less about "what Google did" than about "how Google did it". Google made huge numbers of copies of books without permission of the rights holders. Judge Chin's ruling said, effectively, that all those copies were incidental to the fair use:

[I]f there is no liability for copyright infringement on the libraries' part, there can be no liability on Google's part.

In the end, it didn't matter how Google did what it did. In Judge Chin's analysis, copyright is concerned only with the ends, not the means. Copyright seems not to be concerned with what happens inside the black box.

Chin is not alone in this approach. His opinion follows Judge Baer's ruling in the HathiTrust case, which featured a ringing endorsement of the Library's fair use:

I cannot imagine a definition of fair use that would not encompass the transformative uses made by Defendants' [Mass Digitization Project] and would require that I terminate this invaluable contribution to the progress of science and cultivation of the arts that at the same time effectuates the ideals espoused by the [Americans with Disabilities Act].

But for me, the surprise in Baer's opinion was his transformation of the Arriba Soft case into a broad license for infringement. In that case, display of thumbnail images by a search engine was held to be fair use, and the copying of the images in the course of producing thumbnails was held to be necessary for the protected use. Judge Baer wrote that the fact that the images were on websites available for anyone anywhere to download was not relevant to the analysis, which he then applied to Google's scanning and OCR of physical books:

Although Plaintiffs assert that the decisions in Perfect 10 and Arriba Soft are distinguishable because in those cases the works were already available on the internet, Aug. 6, 2012 Tr. 19:2–4, I fail to see why that is a difference that makes a difference. As with Plaintiffs’ attempt to bar the availability of fair use as a defense at all, this argument relies heavily on the incorrect assumption that the scale of Defendants’ copying automatically renders it unlawful.

Baer thus reduces and equates Google's million-dollar scanning operation with Arriba Soft's one line of code because they're in a fair use black box.

The Black Box approach to copyright can cut both ways. In Chin's dissenting opinion in the Aereo case, he wrote that it didn't matter that Aereo had engineered a way to use completely legal technical means to stream television signals over the internet:

In my view, by transmitting (or retransmitting) copyrighted programming to the public without authorization, Aereo is engaging in copyright infringement in clear violation of the Copyright Act. [...] The system employs thousands of individual dime-sized antennas, but there is no technologically sound reason to use a multitude of tiny individual antennas rather than one central antenna; indeed, the system is a Rube Goldberg-like contrivance, over-engineered in an attempt to avoid the reach of the Copyright Act and to take advantage of a perceived loophole in the law.

In the Aereo case, Chin argued that since the end result of Aereo's engineering was a system with copyright infringing intent, the under-the-hood details of Aereo's system were not compelling. (Read James Grimmelmann for more on this case and copyright arbitrage in general.)

So when presented with cases where copyright law and technology collide, Chin has more or less adopted a consistent approach that isn't inherently pro-copyright or pro-fair-use.

If Chin's ruling had focused on the infringing means (i.e. massive copying) rather than on the fair-use ends in the Google Books case, Google could have gone back to the drawing board to devise a non-infringing means to accomplish the same ends. It would have been more expensive (à la Aereo), but the plain fact is that ten engineers can run technical circles around a thousand lawyers. In the end, Google would have lost the battle but would be far ahead in the war.

As the case now stands, while Google has a free hand to go back and improve and expand its scanning operations, it is still constrained in what it can deliver. For example, since Chin's decision cites the lack of advertising on snippet result pages in his fair use analysis, Google can't put advertising there without risking another $100 million lawsuit. Another innovator in the space can't go and do things differently without worrying about another judge's fair-use analysis.

The advantage of a black box legal approach is its practicality. Judges don't have to understand the intricacies of technology in order to decide legal questions. Technical processes are opaque for business reasons, too. But perhaps more importantly, a black-box approach to copyright law means that engineers can't use clever hacks to get around copyright.

The danger of the black box is that it pretends that technology doesn't matter, that code isn't law. Copyright law is rooted in technology, that of the printing press, and turning it into an abstraction that can also govern digital media while ignoring what goes on behind the curtain is a dubious project. A complex enterprise like Google Books is a long journey from inception to delivery. Imagine if highway safety was addressed by regulating total travel times. Does it make sense to regulate a new technology like airplane travel in the same way?

Perhaps there ought to be a fifth factor in fair use analyses of systems more complex than a printing press. In addition to the usual four factors, judges could also weigh whether the steps involved in accomplishing a fair use would stand up under their own four-factor analysis. In the Google Books case, the analysis could have incorporated a weighing of the scanning operation by itself. Similarly, Aereo's meticulous adherence to legal means could weigh in favor of a fair-use determination.

My worry is that in other situations, perhaps with technologies we haven't imagined yet, the black box legal approach will end up with very wrong technical results. And then we'll be stuck, waiting for Congress to fix things. Look at what's happening as digital surveillance collides with crypto-security. There, the courts have uniformly refused to look inside the black box of the NSA, and the results may end up being disastrous.

(Gary Price has a thorough opinion round-up at Infodocket.)


Friday, November 15, 2013

Blogifying a Book

On Flatland the Blog, I'm turning a book into a blog. It's Flatland.

The ostensible reason is to promote our test of Unglue.it's Buy-to-Unglue campaigns. It'll be a while before it's ready to launch for real, but I've found that there's no substitute for having real users try things out. One ungluer managed to find two different bugs within three minutes, partly by virtue of the 'ě' [LATIN SMALL LETTER E WITH CARON] in his username. Another user was the first to ever try changing their e-mail address to something invalid while having a username containing '@'. For some reason our unit tests didn't foresee these possibilities.
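For the curious, here is a minimal sketch in Python of the kind of edge-case tests that would have caught surprises like those two. The helper names (`normalize_username`, `is_valid_email`, `change_email`) and the validation rules are assumptions invented for illustration; this is not Unglue.it's actual code.

```python
import re
import unicodedata
import unittest

# Deliberately simple pattern for illustration; real email validation is looser.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def normalize_username(name):
    """Return the username in Unicode NFC form with surrounding spaces removed."""
    return unicodedata.normalize("NFC", name.strip())


def is_valid_email(address):
    """Check the address against the simple pattern above."""
    return EMAIL_RE.match(address) is not None


def change_email(user, new_email):
    """Return a copy of `user` with a new email, or raise ValueError.

    The username is left alone, even when it contains '@'.
    """
    if not is_valid_email(new_email):
        raise ValueError("invalid email: %r" % new_email)
    updated = dict(user)
    updated["email"] = new_email
    return updated


class EdgeCaseTests(unittest.TestCase):
    def test_username_with_caron(self):
        # 'e' + COMBINING CARON should compare equal to the precomposed 'ě'.
        self.assertEqual(normalize_username("e\u030c"), "\u011b")

    def test_invalid_email_with_at_sign_in_username(self):
        user = {"username": "me@home", "email": "me@example.com"}
        with self.assertRaises(ValueError):
            change_email(user, "not-an-email")
        # The stored record is untouched after the failed change.
        self.assertEqual(user["email"], "me@example.com")
```

Neither test is clever. The hard part is thinking of the case before a real user does.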

But really, I've been fascinated by the possibilities of the read-write book. We now have lots of ways to save and share annotations, but in most cases, this is done as a networked overlay on top of user-immutable texts. The annotation layers in Readmill, or in Kindle, live in their respective networks.

Another effort to spread annotations over the web is Hypothes.is, which is trying to use standards to break the annotation layer out of closed networks.

I don't think there's anything wrong with networked annotation layers, but there's another technical direction that's been largely unexplored. What if a user's annotations are stored in the digital file that packages the ebook? This has the effect of restoring the individuality to copies of a book. The annotations could then be shared by sharing the file, the same way that pencilled annotations in a printed book might be shared privately. An anti-facebook, if you will, for an era when everything in the network layer is sure to be scanned by the NSA. And it also changes the dynamic of sharing a file in a library.
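To make the idea concrete, here is a rough sketch of one way it might work. An EPUB file is a ZIP archive, so Python's standard `zipfile` module can add a file to it. The member name `annotations.json` and the shape of each annotation record are assumptions of mine, not part of the EPUB specification, and a real tool would probably also need to declare the new file in the package manifest.

```python
import json
import zipfile

# Hypothetical member name; the EPUB spec does not define an annotations file.
ANNOTATIONS_PATH = "annotations.json"


def add_annotations(epub_path, annotations):
    """Append a JSON file of annotations to an existing EPUB (a ZIP archive).

    Calling this twice appends a second member with the same name; a real
    tool would rewrite the archive instead of appending.
    """
    with zipfile.ZipFile(epub_path, "a") as book:
        book.writestr(ANNOTATIONS_PATH,
                      json.dumps(annotations, ensure_ascii=False))


def read_annotations(epub_path):
    """Return the annotations stored in the EPUB, or an empty list if none."""
    with zipfile.ZipFile(epub_path) as book:
        if ANNOTATIONS_PATH not in book.namelist():
            return []
        return json.loads(book.read(ANNOTATIONS_PATH))
```

Sharing the annotated copy is then just a matter of handing someone the file; no server has to see the comments.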

So what I want to do is collect comments and put them into the Flatland ebook that we're producing. I spent a fair amount of time producing a clean, attractive EPUB file from public domain scans by Google and Project Gutenberg, but I'd like to do more. The idea of turning the book into a blog occurred to me because Flatland's chapters were the right length, and, well, they're curious, and need comment in the modern context.

So read along with me on Flatland the Blog, leave comments and suggestions, and at the end maybe something interesting will come out of the experiment!

Tuesday, October 29, 2013

eBook Heaven

Do you believe in heaven?

Well, why not? There are many ways to think about heaven. Most people will admit that there's something of us that lives on after we die, even if it's just the memories that we leave in others or the impact of our lives on the material world. And whatever that something is, it doesn't need food to eat or even air to breathe. It certainly doesn't require a paycheck. Truth and beauty and wisdom, those qualities don't really die with our bodies, do they? If the bit that we leave behind has of its essence some truth or beauty or wisdom, doesn't that sound like heaven?

If you have a favorite library, you know what book heaven is like. Words can live on long after their creators have turned to dust. Libraries work each and every day to bring all the truth, beauty and wisdom in their collections to their communities, both present and future. They cooperate with each other, so that even if your library is missing the book you need, another will fill the void. The rules governing our society have recognized how important all this is, and allow us all to benefit from the labors of those whose existence has faded to memories.

I believe in ebook heaven. In ebook heaven, there are no royalties to pay to Herman Melville or William Shakespeare or Dante Alighieri.  There's even a slushpile in ebook heaven, where the weight of the world presses a diamond or two from unpublished graphene sheets.

The ebook heaven I believe in – some call it Open Access.

There's ebook hell, too, and that's what libraries live today. In ebook hell, books don't live forever, they disappear after a year. Or they're snatched into the kindles of eternal damnation by digital rights demons, lawyers and engineers. Every read must be monetized to feed some hungry monstrosity, and truth and beauty and wisdom are memories like the smells of leather bindings and musty paper.

Or maybe it's ebook purgatory. Dante imagined purgatory as a mountain that souls must climb before being admitted into paradise. In each circle around the mountain, the deadly sins that have stained the souls – envy, greed, lust, etc. – are purged by suffering, sanctified by fire and purified by agony. At last, the remains enter into the Garden of Eden, where everything has returned to its original perfection.


As we look to the future of ebooks, all we can see today is a long circle of purgatory. Our copyright theology posits that we must track millions, perhaps hundreds of millions of creators and their deaths into the far future so that we may say if a work has passed into ebook heaven. In more circles around Purgatorio mountain, we must track national boundaries, regional rights, governing laws, inheritance claims, contract disputes, international conventions, and perhaps even patent rights.

But still, I believe in ebook heaven.

Libraries still endeavor to create little circles of ebook paradise. Within the bubble of a library, ebooks can be free to read. Digital archivists see that some books really do outlast us. New generations of minds encounter all sorts of new knowledge and enlightenment.

Libraries still work with each other to connect their bubbles and make their ebook paradises bigger. We need to enlarge those heavens, book by book, year by year, library by library. And we can't restrict ebook paradise to academia, any more than a belief system can restrict spiritual paradise to its priesthood. We need to find ways to expand the boundaries of availability for every book, to build bridges between today's best sellers and the far future of the public domain.

eBook heaven is worth working towards, together.

Do you believe in it?

Monday, October 7, 2013

NYLSLR: The eBook Copyright Page is Broken

Somehow it slipped my mind that my article "The eBook Copyright Page is Broken" was published in the New York Law School Law Review in April. And I am still not a lawyer! Here's the meat of the article:
The traditional copyright statement is thoroughly and fundamentally broken. Consider the simplest possible case of a single copyright holder:
                         © Eric S. Hellman, 2013. All Rights Reserved. 
This is broken in the following ways:
  1. Since there currently are not any copyright formalities, the copyright symbol means nothing. The work is subject to copyright with or without the copyright symbol.
  2. The work may also not be subject to copyright, for example, if Eric S. Hellman is a government employee, a robot, or a non-creative compiler of factual information. In these cases there is no copyright even if there is a copyright symbol present. There is no legal duty for a publisher to put a copyright symbol only on a copyrightable work. How is the ebook user supposed to know the true copyright status of a digital work? 
  3. “Eric S. Hellman” is an uncommon name. But suppose the author is named “John Smith.” What use, then, is the copyright statement? It does not specify which Eric S. Hellman or which John Smith is the author.
  4. The asserted name of the copyright holder can’t be relied on because text in a digital file can be altered without a trace. It’s simple to take a digital copy of Merchants of Culture and change its asserted copyright holder to “John Smith,” then redistribute it. This is a negligible problem in the print world.
  5. The asserted date of publication may be unrelated to the date of the underlying copyright. For purposes of copyright (for example, when a work is produced as a work-for-hire), re-publication of a book does not change the copyright expiration date of the underlying text.
  6. There is no specification of the work being copyrighted. In print there’s not much ambiguity, but digital books are composite objects (text and graphics are always separate entities in a digital book file) and are frequently distributed in pieces. Some ebooks even have front matter distributed as a pdf file completely separate from the chapters. In other cases, an ebook may be displayed on a website that has a separate set of copyright statements.
  7. If the digital book is legally on your ebook reader, then, somehow, the rights holder has granted you some rights, perhaps under the terms of an explicit license or with the license implicit in its availability on a website. Either way, “all rights” have not been reserved. Licenses are not needed for printed books, but they may be needed for ebooks.
In February, I wrote about ebook front matter and back matter and there's more work to be done in this vein.

The last footnote deserves some glossing. In it, I assert that the ccREL submission for marking Creative Commons status of web pages is currently in conflict with the EPUB 3 standard for ebooks. While that's technically true, it's a bit misleading. A better way to say it is that developments in HTML5 and EPUB3 have made ccREL's approach archaic. The metadata machinery in EPUB3 and HTML5 is fully up to the task of expressing and applying Creative Commons licenses. What's lacking is consensus around which of the available mechanisms to use. Since the RDFa vs. Microdata in HTML5 controversy has not yet fully shaken out, you can't really follow ccREL as written, so we'll need to have some patience.

Wednesday, October 2, 2013

Internet Plumbing: Mixed Redirect Chains

There's a lot of plumbing that underlies a website like unglue.it. Some of the more complicated plumbing involves the connections and links to other sites. Unglue.it has connection plumbing for Google, Goodreads, Twitter, Amazon, LibraryThing, Readmill, Internet Archive, Facebook, MailChimp and Gravatar; we've worked on some more that have yet to see daylight.

The rest of this post is about plumbing surprises. If you're not interested in website plumbing, feel free to go watch a cat video.

I've written more than you want to read about redirection, a rather important bit of website plumbing. HTTP redirects enable things like link shortening (e.g. bit.ly), long term link maintenance (e.g. crossref.org and purl.org), and just-in-time linking (e.g. OpenURL). If you redirect to another redirector, you have what's known as a redirect chain.

You can easily imagine the kind of mischief that can go on with redirects, the redirect loop being the most obvious. The plumbing in your web software has to know how to avoid getting stuck in redirect loops or endless redirect chains, and for the most part it does.
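
To make the loop problem concrete, here's a minimal sketch of the kind of guard a client needs. It's not taken from any real client: the `redirects` dictionary is a hypothetical stand-in for the network, and the limit of 10 hops is an arbitrary choice for illustration.

```python
MAX_HOPS = 10  # arbitrary illustrative limit, not taken from any particular client


class RedirectError(Exception):
    """Raised when a chain loops back on itself or runs too long."""


def follow(start, redirects, max_hops=MAX_HOPS):
    """Follow redirects from `start` and return the list of URLs visited.

    `redirects` is a hypothetical stand-in for the network: a dict mapping
    a URL to the URL it redirects to. A URL that is missing from the dict
    is treated as the final destination.
    """
    chain = [start]
    seen = {start}
    while chain[-1] in redirects:
        nxt = redirects[chain[-1]]
        if nxt in seen:
            raise RedirectError("redirect loop at " + nxt)
        if len(chain) > max_hops:
            raise RedirectError("too many redirects")
        chain.append(nxt)
        seen.add(nxt)
    return chain
```

Remembering every URL already visited catches loops right away; the hop limit is the backstop for chains that never repeat but never end either.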

Security issues can also arise with redirects, especially with mixed redirect chains. A mixed redirect chain is one that includes both secure (HTTPS) and non-secure (HTTP) links. Here's an example trace for a shortened ebook download link on the unglue.it website (it's the latest unglued ebook, Feeding the City, about the amazing human "plumbing" that delivers lunches to workers in Mumbai). You can try it yourself: https://bit.ly/19Ncaz7

The first thing your web browser does is it sets up a secure connection to bit.ly. While doing this it checks bit.ly's X.509 certificate with bit.ly's OCSP responder, Digicert. OCSP stands for "Online Certificate Status Protocol", and the result is that you can be reasonably sure that your connection is to bit.ly and that no one but maybe the NSA can snoop on your communication with bit.ly. In particular, no one can see what link you ask to be resolved, and no one but you can see bit.ly's answer.

ask bit.ly:
(verify bit.ly, at http://ocsp.digicert.com/ )
https://bit.ly/19Ncaz7
GET /19Ncaz7 HTTP/1.1
Host: bit.ly
bit.ly's answer:
HTTP/1.1 301 Moved
Location: http://unglue.it/download_ebook/986/
In this example, bit.ly is redirecting to a non-secure URL, making the redirect mixed. Anyone between you and the destination can see what you're asking for if you follow the redirect. If you're in a Starbucks using wifi, Starbucks could conceivably send you a book about coffee instead. So the secure rigamarole you went through with bit.ly seems a bit wasted. But at least no one can see your bit.ly cookie and find out all the shortened links you've followed.

ask unglue.it:
http://unglue.it/download_ebook/986/
GET /download_ebook/986/ HTTP/1.1
Host: unglue.it
unglue.it's answer:
HTTP/1.1 302 FOUND
Location: https://archive.org/download/Feeding_the_City/9781909254039_Feeding_the_City.epub
Unglue.it still wants your ebook download to be secure, so it sends you to a secure archive where the file can be found. Ebooks increasingly contain JavaScript, and you really don't want to give bad guys the opportunity to insert malicious scripts in your ebook, even if most of today's reading platforms won't execute the scripts.

Since it's a different website, your web software needs to verify archive.org with their OCSP responder, GoDaddy.

ask archive.org:
(verify archive.org, at http://ocsp.godaddy.com/ )
https://archive.org/download/Feeding_the_City/9781909254039_Feeding_the_City.epub
GET /download/Feeding_the_City/9781909254039_Feeding_the_City.epub HTTP/1.1
Host: archive.org
archive.org's answer:
HTTP/1.1 302 Moved Temporarily
Location: https://ia801008.us.archive.org/4/items/Feeding_the_City/9781909254039_Feeding_the_City.epub
The Internet Archive operates jillions of servers, and to save itself the trouble of rebuilding its index whenever it moves a file, it uses a redirector to get you to the server where your ebook is living today. It's yet another server, so you have to check its certificate, too:

ask ia801008.us.archive.org:
(verify ia801008.us.archive.org at http://ocsp.godaddy.com/ )
https://ia801008.us.archive.org/4/items/Feeding_the_City/9781909254039_Feeding_the_City.epub
GET /4/items/Feeding_the_City/9781909254039_Feeding_the_City.epub HTTP/1.1
Host: ia801008.us.archive.org
ia801008.us.archive.org's answer:
HTTP/1.1 200 OK
And so we get our ebook. Since it comes on a secure connection, we can be sure it's the one that Internet Archive meant to give us. Since there was an insecure link in the redirect chain, however, we can't be sure that it's the one that bit.ly meant to send us to.
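
If you wanted your own plumbing to notice a chain like this one, a rough sketch might look like the following. It uses only Python's standard `urllib.parse`; the function names are made up for this illustration, and it only inspects the URL schemes, not the certificates.

```python
from urllib.parse import urlsplit


def schemes(chain):
    """Return the lowercase URL scheme of each link in the chain."""
    return [urlsplit(url).scheme.lower() for url in chain]


def is_mixed(chain):
    """True if the chain contains both secure (https) and non-secure (http) links."""
    found = set(schemes(chain))
    return "https" in found and "http" in found


def downgrades(chain):
    """Return the (from_url, to_url) pairs where https redirects to http."""
    steps = zip(chain, chain[1:])
    return [(a, b) for a, b in steps
            if urlsplit(a).scheme.lower() == "https"
            and urlsplit(b).scheme.lower() == "http"]


# The chain traced above, in order.
CHAIN = [
    "https://bit.ly/19Ncaz7",
    "http://unglue.it/download_ebook/986/",
    "https://archive.org/download/Feeding_the_City/9781909254039_Feeding_the_City.epub",
    "https://ia801008.us.archive.org/4/items/Feeding_the_City/9781909254039_Feeding_the_City.epub",
]
```

For this chain, `is_mixed(CHAIN)` is true and `downgrades(CHAIN)` picks out the single bit.ly to unglue.it hop as the place where the secure connection was dropped.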

You can see that there are a lot of steps in this chain. At every step of the way, your web plumbing needs to decide whether it's ok to send things like cookies or referers along with the request. For example, it should never be sending cookies received from a secure site to the insecure version of the same site. If a mixed redirect chain delivers JavaScript to you, you shouldn't mark a web page as secure, even if the web page is securely delivered and it uses only https links to retrieve its scripts.
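
Here's a deliberately simplified sketch of the cookie decision. Real cookie handling also involves domain and path matching and the cookie's own Secure attribute, all of which this ignores; the function name and the two-rule policy are assumptions made for illustration only.

```python
from urllib.parse import urlsplit


def may_send_cookie(cookie_origin, target):
    """Decide whether a cookie set by `cookie_origin` may go along to `target`.

    Simplified rule: the hosts must match, and a cookie that arrived over
    https is never sent over plain http.
    """
    origin, dest = urlsplit(cookie_origin), urlsplit(target)
    if origin.hostname != dest.hostname:
        return False  # a different site: not its cookie
    if origin.scheme == "https" and dest.scheme != "https":
        return False  # a secure cookie must not leak onto an insecure link
    return True
```

Under this rule, a cookie picked up from `https://unglue.it/` would not be sent along with a request to `http://unglue.it/download_ebook/986/`.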

A great example of how to implement this plumbing is the Requests module for Python. (Also a great example of clear, readable source code and documentation!)
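
If you want to watch a redirect chain go by, Requests keeps the intermediate responses around for you. The helper below is a small sketch (the name `describe_chain` is made up) that relies only on three attributes of a Requests response: `history`, `url` and `status_code`. The network call itself is left in a comment, since it needs the third-party `requests` package and a live connection.

```python
def describe_chain(response):
    """Summarize the redirect chain behind a Requests-style response.

    `response.history` holds the earlier responses, oldest first; the
    response itself is the final hop.
    """
    hops = list(response.history) + [response]
    return [(r.status_code, r.url) for r in hops]


# Typical use (requires the `requests` package and a network connection):
#
#     import requests
#     response = requests.get("https://bit.ly/19Ncaz7")
#     for status, url in describe_chain(response):
#         print(status, url)
```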

An example of a buggy implementation of this plumbing is the open-uri code in Ruby. From the source code:

# This test is intended to forbid a redirection from http://... to
# file:///etc/passwd, file:///dev/zero, etc.  CVE-2011-1521
# https to http redirect is also forbidden intentionally.
# It avoids sending secure cookie or referer by non-secure HTTP protocol.
# (RFC 2109 4.3.1, RFC 2965 3.3, RFC 2616 15.1.3)
# However this is ad hoc.  It should be extensible/configurable.
At least this code errs on the side of security. If you use Ruby to try downloading something via a mixed redirect chain, open-uri will raise an exception labeled "redirection forbidden". Perhaps it would be more accurate to label this a "too dicey for Ruby" exception.
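
For the curious, here is one reading of that comment, written in Python. It is a paraphrase of the policy the comment describes, not a translation of the Ruby source, and the function name is invented for this sketch.

```python
from urllib.parse import urlsplit


def redirect_allowed(from_url, to_url):
    """Paraphrase of the policy described in the open-uri comment.

    Allowed: a redirect that keeps the same scheme, or one that starts
    from plain http and lands on https. Everything else is refused,
    including https -> http and http -> file.
    """
    src = urlsplit(from_url).scheme.lower()
    dst = urlsplit(to_url).scheme.lower()
    if src == dst:
        return True
    return src == "http" and dst == "https"
```

Feed it the first hop of a mixed chain, `https://bit.ly/19Ncaz7` to `http://unglue.it/download_ebook/986/`, and it says no, which is the behavior that produces the "redirection forbidden" exception.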

You might argue that mixed redirect chains should not be allowed. Or at least that https-to-http redirects should be forbidden. There are two main faults with this:

  1. When links span multiple sites, there's no practical way to ensure that your links don't get mixed. Even if they're not mixed now, that could change in the future.
  2. If you forbid https to http redirects, you're preventing sites from migrating to a more secure stance. A secure bit.ly would be impossible.

I tripped over the Ruby issue when implementing a connection to a partner that has built its site with Rails. They couldn't download some of our ebooks. Working together, we figured out what was wrong and implemented a work-around.

That's what us plumbers do for kicks.

Notes:

  1. One thing you CAN'T do is redirect https to http if your certificate expires. To fix broken security, you need to fix the security.


Thursday, September 19, 2013

BookSmash's Lust-O-Meter Shows How Innovation Happens

When HarperCollins decided to sponsor a hacking competition called BookSmash, they probably expected the participants to be a rag-tag collection of smart students, hungry young startups, and underemployed misfit coders. It's very unlikely that they expected Nobel Prize winners or seasoned tech entrepreneurs to show up. But, as I pointed out in June, they had made some interesting and fun resources available as part of the competition: 196 full-text books from some popular authors. I'll let you in on a secret: despite what you may hear elsewhere, it's fun, more than anything else, that drives innovation.

The results of the competition were unveiled yesterday. Some of the teams I was already familiar with: I met the BookCities, Coverlist and LibraryAtlas teams at Publishing Hackathon. ReadUp, from the great folks at ReadSocial, is a neat idea definitely worth checking out. But Text Textures was the submission that popped out at me. The Text Textures team is Mira and Frank Wilczek, a father-daughter team. Frank is a Nobel Prize-winning physicist; Mira is an ethical-coding serial tech entrepreneur (Lyric Semiconductor and Red Panda Security; a new project is BookGobble).

Text Textures starts out by imagining how fun it would be if you could just skip to the "juicy parts" of a book. It turns out that with access to the full text of a book, a pretty simple combination of weighted word counts supplemented with pacing heuristics allows a text analysis engine to measure things like lustiness (hence the "Lust-O-Meter"), affection, violence and occult themes. By graphing each of these attributes versus page number, it's easy to see where the "juicy bits" of a book are. But that's not where the fun ends. You can density-plot one attribute versus another. And so we find out that "the lustiest scenes in For A Few Demons More appear to have almost no affection". You can compare multiple books, and use the measures to decide what sort of book to read next.
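
I haven't seen the Text Textures code, so what follows is only a guess at what the "weighted word counts" half of the idea could look like. The lexicon and its weights are invented for illustration, and the pacing heuristics are left out entirely.

```python
import re

# Hypothetical lexicon: these words and weights are made up for illustration.
LUST_WEIGHTS = {"kiss": 1.0, "desire": 2.0, "passion": 2.5}


def score(text, weights):
    """Weighted word count, normalized by the number of words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    total = sum(weights.get(word, 0.0) for word in words)
    return total / len(words)


def profile(pages, weights):
    """Score each page so the result can be graphed against page number."""
    return [score(page, weights) for page in pages]


def juiciest(pages, weights):
    """Return the 1-based number of the highest-scoring page."""
    scores = profile(pages, weights)
    return scores.index(max(scores)) + 1
```

Swap in a different lexicon (affection, violence, occult) and the same `profile` function gives you another curve to plot against the first.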

I asked Mira about the genesis of Text Textures. She responded:
I've always been neural-net-curious. So when I found myself with a nice nest egg and some free time, I took the opportunity to round out my education. My dad (Frank) has conveniently also been curious about neural nets -- although he was more intrigued by the analogy to human cognition -- so we decided to work through Hinton's Machine Learning lectures on Coursera together. We've been doing fun technical projects together for as long as I can remember. When I was seven, we built a foot-stomping robot using Lego MindStorms. When I was sixteen, we used genetic algorithms to solve N queens.
As we went through the Hinton course, we started to think about real-world problems it might be interesting to tackle using some of those mathematical tools. Eventually we started playing with tracking characters through Sherlock Holmes .... then finding the action scenes where those characters appear ... then looking at other ways to classify scenes ... and thus the underlying idea of Text Textures was born.

The Lust-O-Meter in Text Textures is a fun toy. Which is to say that I would like to be able to play with it myself. I would build a snark-o-meter. I'm not sure if a "Skip to Good Bits" button is something people want in their reader applications, and even if they wanted it they might not admit it. But eBooks don't have inherent page numbers, so new ways to navigate ebooks would be really useful. It's rather a shame that today's prevailing ebook environment of walled-garden DRM-encumbered marketplaces is hostile to innovations such as Text Textures. Even libraries are prohibited from doing textual analysis of most of the ebooks they buy. And because lustiness data, for example, is not protectable by copyright, rightsholders such as HarperCollins typically deploy restrictive terms of use on anyone they allow to access the full text of their works. It's not enough to open up just a crack for a hacking competition.

Everyone should be able to have fun with their books.

Note: you can vote for Text Textures or any of the other BookSmash submissions until September 27 at 5:00pm EDT by going here.
Update: @skyberrys notes that the Illuminate entry also has roots in #pubhack. I note that it's yet another contribution to the book world by a physicist!


Friday, September 6, 2013

I Hired a Book

I have an article up at Library Journal about startups that have been getting hired for reading ecosystem jobs over the past three years. The startups that I profile are GoodReads, Wattpad, Readmill, SIPX and Zolabooks.

This view of hiring companies for jobs comes from Clayton Christensen's concept of Milkshake Marketing, explained here.

Christensen describes the case of a fast food restaurant that wanted to improve sales of its milkshakes, but really didn't understand the job that consumers were hiring it for. Customer observations revealed that almost half of the milkshakes sold were to early morning customers who had long boring commutes; the milkshake was being hired to relieve boredom and postpone hunger.

Libraries too need to think about the jobs their users are hiring them to do. Sometimes it's just to relieve boredom or escape bad weather.

silo schematics
by Jerry Yeti
Over Labor Day weekend, I hired a book for a very specific job. I went to an actual bookstore and bought a book to occupy myself during a cross country flight from LAX to EWR, with a connection I was sure to miss in DFW. I chose Hugh Howey's Wool. Given the book's publishing history, you might think it perverse that I bought the print version. Wool was, for a long while, an ebook-only self-published series; Howey did a precedent-setting deal with Simon & Schuster for the print rights only. But consider the job I was hiring it to do: why would I buy an ebook that I couldn't read during the long waits on the runways? (My son had dibs on the window seat.)

Having spent the outbound trip coding new features for unglue.it, I needed a mental break, and Wool enveloped me in a completely absorbing self-contained world that did the job I hired it for quite nicely. Wool tells the story of the inhabitants of an isolated post-apocalyptic silo, built and supplied with technology from the year 2012. "IT" is the villain.

There was one continual distraction for me. My engineer's brain couldn't stop calculating the dimensions of the silo. The narrative made the silo seem large- after all it's the whole world for its inhabitants. But of necessity, it has to be compact. But how compact? Towards the end of the fifth part, I get some measurements: the bottom 8 floors are flooded, a depth of "70 to 80 feet". So 10 feet per level. That's pretty cramped!

Wool seems so relevant to our current environment, especially today's revelations about the extent of the NSA's decryption effort. Books have a way of doing jobs other than what you hired them for, just like the best employees. Just like libraries. Think about it.

Update 9/9: I purchased the DRM-free ebook package for Shift, the second book in the series. So far it reveals that the silo tech is supposed to be from 2050 and that the top level is 10 meters, not feet.

Tuesday, August 13, 2013

The Inaugural LibraryReads List, With e-Lending Annotations

This morning the inaugural LibraryReads list was announced. However,  a number of the selected books may not be available in digital form in your library.

Fangirl
by Rainbow Rowell
Published: 9/10/2013 by St. Martin’s Griffin
ISBN: 9781250030955
X Macmillan does not do e-lending of the St. Martin's Griffin imprint. However, the Director of Library Marketing at Macmillan says "stay tuned as we continue to roll out new titles for e-lending."

How the Light Gets In: A Chief Inspector Gamache Novel
by Louise Penny
Published: 8/27/2013 by Minotaur Books
ISBN: 9780312655471
X Macmillan has some of the books from its Minotaur imprint in its e-lending pilot. But not this one. Again, "stay tuned". Apparently the audiobook is available for pre-order on Overdrive.



Night Film: A Novel
by Marisha Pessl
Published: 8/20/2013 by Random House
ISBN: 9781400067886
Random House has a strong e-lending program, but the books are expensive! The ebook pre-order is currently available on Overdrive for $84; it's $12.99 on Kindle Store.


Help for the Haunted: A Novel
by John Searles
Published: 9/17/2013 by William Morrow
ISBN: 9780060779634
✓ HarperCollins allows e-lending. The ebooks expire after the 26th lend, but they're priced at a discount from retail print.


The Returned
by Jason Mott
Published: 8/27/2013 by Harlequin MIRA
ISBN: 9780778315339
✓ Harlequin has a good library e-lending presence. The library ebook is available for $21 on Overdrive. It's $9.46 on the Kindle Store.


Burial Rites: A Novel
by Hannah Kent
Published: 9/10/2013 by Little, Brown
ISBN: 9780316243919
✓ Little, Brown is part of Hachette Book Group. Hachette recently announced that its full list would be available for library e-lending. The program is comparable to Random House's.


Margot: A Novel
by Jillian Cantor
Published: 9/3/2013 by Riverhead
ISBN: 9781594486432
? Riverhead is part of Penguin, (now part of Random Penguin House). I'm not sure what the e-lending status of this will be.


Songs of Willow Frost: A Novel
by Jamie Ford
Published: 9/10/2013 by Ballantine Books
ISBN: 9780345522023
✓ Another Random House title, should be available for e-lending.


Five Days at Memorial: Life and Death in a Storm-Ravaged Hospital
by Sheri Fink
Published: 9/10/2013 by Crown
ISBN: 9780307718969
✓ Yet another Random House title, should be available for e-lending.

A House in the Sky: A Memoir
by Amanda Lindhout & Sara Corbett
Published: 9/10/2013 by Scribner
ISBN: 9781451645606
X Simon and Schuster is at this point in time the least e-friendly to libraries of the big 6 publishers. This title should be available as part of a pilot with New York City public libraries, but if you live anywhere else you are screwed.

It seems to me that if the librarians participating in LibraryReads really want to promote reading in libraries, then they should push to have any selected books available for e-lending, and not just in New York City. Just three years ago, fully half this list would have been digitally forbidden to libraries; just because some advances have been made doesn't mean the struggle for library survival is over. Not even close.

The covers are linked to Amazon. So there! Updated with some real pricing/availability info.

Monday, August 12, 2013

A Rational Framework for Library eBook Licensing

Since the ReDigi decision made it clear that there is no right of first sale for digital content in the US, it's been much easier to think up realistic doomsday scenarios for public libraries in the US. Why should a publisher let a public library lend an ebook if Amazon or some other competitor were to offer much better terms? How would our public library system, saddled with difficult-to-use systems and unfavorable contracts, ever hope to compete?

Back when HarperCollins first announced that it would only let libraries lend their ebooks 26 times before they would expire, there was widespread outrage from the library community. Looking back on that, it seems pretty clear that a lack of consultation and poor customer communication fueled the furor. By itself, the lending limit could have terrible long-term consequences for libraries, but as part of a wider, well-thought-out framework, it could be a useful component.

I've been doing a lot of thinking about this over the last 3 years, and I've decided it's time to float a comprehensive proposal for how libraries and publishers might work together on ebook distribution to benefit the entire reading ecosystem. eBook lending as implemented to date has been founded on a combination of irrational fears and outmoded processes. We deserve better.

Behind this framework is a set of assumptions.
  1. Library ebook distribution must sustain and increase the total population of readers; this is a prerequisite for a healthy book publishing industry.
  2. Patron discovery of ebooks in libraries must connect effectively to ebook sales.
  3. Library distribution must become much more efficient, and overhead must become much smaller for ebooks than it is today for print books and ebooks.
  4. Long term preservation of ebook availability must be a joint undertaking of libraries and publishers.
  5. The economic models used for library ebook distribution must provide incentives for libraries and publishers to promote points 1-4.
I don't pretend that people won't disagree with some or all of these 5 assumptions, but if any of them are false, then, I think there will be NO distribution of ebooks through libraries. I also recognize that not all books are alike; even if library distribution works for some ebooks, it's unlikely that it will work for every ebook.

So the fifth assumption is what this post is really about. Given 1-4, what should an economic framework look like? Here are the features of a model that makes sense to me:
  1. Decoupled pricing. An ebook license that allows for lending makes the ebook more valuable, so why shouldn't it cost more than an individual, non-transferable license? I can't say whether Random House's 300% markup for libraries is excessive, but why not let the marketplace decide? For new, super-popular ebooks, maybe 500% markup makes sense. On the other hand, maybe ebooks that need exposure should have an 80% markdown because libraries might turn them into bestsellers.
  2. Rate limits instead of DRM. Patron license embedding.  I've written about this before. This may take the most convincing, but in thinking about the imperatives of effective discovery, low distribution overhead, and long-term preservation, I've concluded that there are no alternatives to major change in library distribution technology.
  3. Circulation charges after an initial period. Most books are bought in the first year of publication. Today, libraries "deaccession" books to match their declining demand. But there's no reason for a library to deaccession an ebook, so for most books the global supply for any given ebook will eventually exceed global demand. If the library can cut its transaction cost from ~$2 per circulation to $0.20 per circulation it seems fair to reward the publisher with part of the difference for developing books with long term value. 
  4. License transferability/InterLibrary Loan. Libraries rely on interlibrary loan to expand the scope of their collections and meet special needs. But ebook loans can be instantaneous, so digital ILL can compete directly with backlist sales. If the transaction costs (currently ~$10) for ILL can be squeezed down to $1 or so, there's plenty of margin to provide a transaction payment to the rights holder for the privilege of doing so. 
  5. Patron-funded purchases. Libraries are tight on funding even as they need to completely transform what they do. Their biggest asset is a huge reservoir of public goodwill. At this pivotal juncture, their ebook offerings are characterized by long hold queues. Why can't a library patron buy an extra copy for the library and jump to the front of the queue? Why don't publishers offer "Buy for your Library" buttons on their catalog pages? The reasons are complex, but it's mostly a case of "we haven't done that before". But if it doesn't happen I just can't fathom how library discovery can effectively plug into publisher commerce.
  6. License durability. If libraries are expected to "buy" ebooks, it should be pretty much for keeps. If the publisher for some reason has to revoke a license without cause, the library should get a refund of the license price.
  7. Archival copies. Libraries need to do a lot of things with books other than lending. Indexing and archiving are good examples. The saddest thing about the most successful library ebook distributors today is that libraries don't get access to unencrypted ebook files. If libraries are to offer effective discovery and archiving of ebooks, they need access to the files. Seems a no-brainer to me.
There are a bunch of parameters to plug into this framework; here's my guess as to what they should be:
  • Rate limits: One authenticated user per two weeks.
  • Circulation fee: $0 for the first year; after the first year, 2% of purchase price or $1, whichever is greater.
  • ILL fee (publisher share): 5% of purchase price or $2, whichever is greater. 
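
To see how these guesses play out, here's a small sketch of the two fee rules. The function names are made up, and starting the one-year clock at the library's purchase date is an assumption; the list above doesn't say when the first year begins.

```python
def circulation_fee(purchase_price, years_elapsed):
    """Fee per circulation: free in the first year, then max(2% of price, $1).

    `years_elapsed` is measured from the library's purchase (an assumption).
    """
    if years_elapsed < 1:
        return 0.0
    return max(0.02 * purchase_price, 1.00)


def ill_fee(purchase_price):
    """Publisher share of an interlibrary loan: max(5% of price, $2)."""
    return max(0.05 * purchase_price, 2.00)
```

For a hypothetical $84 library ebook, the percentages win: about $1.68 per circulation after the first year and about $4.20 per interlibrary loan. For a hypothetical $21 ebook, the floors kick in instead: $1 per circulation and $2 per loan.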

A rational ebook lending framework would mean big changes for both the book publishing industry and the library industry. Even if a HarperCollins decided today that this was an attractive way forward, it would be hard-pressed to find a way to implement it, because libraries just don't work that way. So it seems a bit far-fetched at this point. Based on the iBookstore fiasco, it appears to be illegal for big publishers to even talk to each other, let alone drive business model changes. It's good that a library group is still trying to figure it out.

Maybe some small startup company could try some sort of pilot program.

Saturday, August 3, 2013

Wattpad Usage is in the Ballpark of US Public Libraries

Billion Reasons Why, a novel
by xXdemolitionloverXx
on Wattpad
While working on another article, I came across this bit of data. Wattpad, the reading and writing community that's sort of a YouTube for stories, claims that its users are spending 3.5 billion minutes per month on the site. That's a number so big that I had no context for it.

So I wondered, how many minutes per month do people spend in their public libraries? There's a lot of data available for US public libraries from IMLS. In 2010, the most recent year for which data is available, 1.57 billion visits were made to US public libraries, or about 131 million visits per month. I have no idea how long an average library visit lasts, but if we say it's a half hour, then the total minutes of "user engagement" by US public libraries would be about 3.9 billion minutes per month. Roughly the same as Wattpad.
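
Here's that back-of-the-envelope arithmetic spelled out, so you can plug in your own guess for the length of a library visit. The half hour is still just a guess.

```python
ANNUAL_VISITS = 1.57e9             # IMLS figure for 2010, from the text
MINUTES_PER_VISIT = 30             # a guess, as noted above
WATTPAD_MINUTES_PER_MONTH = 3.5e9  # Wattpad's own claim

visits_per_month = ANNUAL_VISITS / 12
library_minutes_per_month = visits_per_month * MINUTES_PER_VISIT
ratio = library_minutes_per_month / WATTPAD_MINUTES_PER_MONTH

print(f"{visits_per_month / 1e6:.0f} million visits per month")
print(f"{library_minutes_per_month / 1e9:.1f} billion minutes per month")
print(f"{ratio:.2f} Wattpads")
```

With a half-hour visit the ratio comes out a little over 1; doubling the visit length doubles it.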

Maybe we should also count the time that readers spend at home with a library book; 30 minutes might be a serious underestimate. (see update) Also, Wattpad's usage is spread out internationally; they are the top mobile site in the Philippines, for example. So its usage within the US is probably quite a bit less than public libraries. But it's also concentrated in certain demographics: teenage girls, for example. And it continues to grow at a solid pace.

Update: Karen Coyle points out in comments that you could also estimate library user engagement by looking at circulations. By that measure, assuming an average of 4 hours of reading per book, you get that US public libraries are about 8 Wattpads of engagement.

Any way you look at it, that's a lot of reading going on.

Tuesday, July 30, 2013

Proposal: The Dated Creative Commons License

Back on June 15, Peter Suber's book Open Access itself went open access, one year after its initial publication. You can get the ebook for free from MIT Press, but because of the Creative Commons license you can also get it from Internet Archive, and Unglue.it has a page to help you download it. It seems appropriate for this book to be its own publishing experiment, and from what I hear, the book has done well, in addition to doing good.

The "embargoed" or "delayed" model for open-access is tried and true in the scholarly journal business, and arguments about the appropriate length and propriety of embargoes are entrenched. In medical research, funding agencies such as NIH  demand embargoes of no longer than 12 months,  while humanities publishers argue that they need longer embargoes. Recently, the American Historical Association recommended that doctoral students be allowed to embargo their dissertations for up to six years. (Suber's book discusses delayed open access and embargoes in chapter 8, Casualties.)

Delayed open access for books, by contrast, is almost nonexistent. For ebooks, it would seem that an exclusive selling period followed by Creative Commons licensing could unlock a lot of value for society, and not just for scholarly works. Most books do most of their sales in the first year of publication and not much after that. The current duration of copyright, typically more than a hundred years, seems disproportionate in comparison. Used book stores capture some of the residual value of print books without profit to the rights holder, and libraries help to preserve another chunk of value. The lack of first-sale rights for ebooks leaves huge doubts about the viability of these channels for ebooks.

MIT Press accomplished the delayed open access with a promise on the copyright page of the ebook. Readers could rely on the integrity and prestige of MIT Press to make good on that promise. Wouldn't it be nice if doing something similar were easy for any sort of work, just like attaching a copyright date? Suppose I wanted exclusive rights to this blog post for five years. It would be nice if I could just write "(CC BY 2018)" with a url to provide the legal code.
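To make the idea concrete, here is a minimal Python sketch of how software might read such a notice and decide which terms apply on a given day. Everything in it is an assumption for illustration: the "(CC BY 2018)" notation is only the one proposed above, the function names are made up, and treating a bare year as January 1 of that year is my guess at a sensible rule, not an existing Creative Commons convention.

```python
import re
from datetime import date

# Hypothetical notation from the post, e.g. "(CC BY 2018)" or "(CC BY-NC-SA 2020)".
NOTICE_PATTERN = re.compile(r"\(CC ([A-Z]+(?:-[A-Z]+)*) (\d{4})\)")


def parse_dated_notice(notice):
    """Return (license_code, effective_date) for a dated CC notice."""
    match = NOTICE_PATTERN.fullmatch(notice.strip())
    if match is None:
        raise ValueError(f"not a dated CC notice: {notice!r}")
    code = match.group(1)
    year = int(match.group(2))
    # Assumption: a bare year means the license takes effect on January 1.
    return code, date(year, 1, 1)


def license_in_effect(notice, on_date):
    """Say which terms apply to the work on the given date."""
    code, effective = parse_dated_notice(notice)
    if on_date >= effective:
        return f"CC {code}"
    # Before the effective date the creator keeps exclusive rights.
    return "all rights reserved"
```

Under these assumptions, `license_in_effect("(CC BY 2018)", date(2013, 7, 30))` reports exclusive rights, and the same call with any date from January 1, 2018 onward reports the CC BY license. A real notice would also need the url for the legal code, which this sketch leaves out.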

I don't think I can really do that easily today. Without some sort of license language, nothing would prevent me from changing my mind, so my prospective license offer would not be reliable. Today's Creative Commons licenses depend on conveyance of the license and assume immediate effect. In the publishing world, companies go bankrupt or get acquired all the time. If Elsevier had acquired MIT Press in May, a purchaser of the book in April would have had no assurance that the book would really go Open Access in June. This is not such an issue with journals because they're continuing publications.

Although I'm not a lawyer or anything, I've taken a first stab at language for applying a future date to a Creative Commons license. I've used Docracy to make the document public so that anyone can make modifications to improve it. (If you do a lot of contracts and you haven't seen Docracy, I suggest you go check it out!) Or maybe other people have worked on this and can contribute some better language.

The beauty of Creative Commons is that it gives creators more options for distributing their works in partnership with users. A robust way of granting future CC licenses will allow more creators to vote with their works for mitigation of over-long terms-of-copyright.

Update: a quick comment from Timothy Vollmer points to a thread on [cc-licenses] that's very relevant, including some interesting discussion of the mysteriously named "Founder's License".

Update 2: James Grimmelmann, a real law professor, suggests via Twitter that "It is not out of the question that one could unilaterally enter into a binding future license at present" and points to the language he used in a 2005 Yale Law Journal Note:
Copyright © 2005 by The Yale Law Journal Company, Inc. For classroom use information, see http://www.yalelawjournal.org/about.asp. After June 1, 2006, this Note is licensed under the Creative Commons Attribution 2.0 License, http://creativecommons.org/licenses/by/2.0/. Any use under this license must carry the notation “First published in The Yale Law Journal, Vol. 114, pp. 1719-58.”
This suggests that maybe I'm making things too complicated, which wouldn't be the first time. But I wish Creative Commons or someone would just tell us what to do!

Update August 9: I've written more about what we want to do with Dated CC at Unglue.it.