Friday, July 31, 2009

Ignition Timing for Semantic Web Library Automation Engines

Last weekend, I had a chance to learn how to drive a 1915 Model-T Ford. It's not hard, but a Model-T driver needs to know a bit more about his engine and drivetrain than the driver of a modern automobile. There is a clutch pedal that puts the engine into low gear when you press it; high gear is when the pedal is up, and neutral is somewhere in between. The brake is sort of a stop gear, and you need to make sure the clutch is in neutral before you step on the brake. The third pedal is reverse.

There are a lot more engine controls than on a modern car. In addition to the throttle and a choke, there is another lever that controls the ignition timing. A modern Model-T driver doesn't have to worry much about the timing once the engine has started, because modern fuel has much higher octane than fuel had in 1915. I would not have understood this except that I recently got a new car whose manual says you should use only premium fuel, and so I did some Wikipedia research to find out what octane had to do with automobile engines. But I could have lived blissfully in ignorance. Believe it or not, I have opened the hood of my new car only once since I got it in December.

It occurs to me that in many ways, the library automation industry is still in the Model-T era, particularly with regard to the relationship between the technology and its managers. Libraries still need to keep a few code mechanics on staff, and the librarians who use library automation to deliver services still need to know a lot more about their data engines than I know about my automobile engine. The industry as a whole is trying to evaluate changes roughly analogous to the automobile industry switching to diesel engines.

I've been reading Martha Yee's paper entitled "Can Bibliographic Data Be Put Directly Onto the Semantic Web?" and Karen Coyle's commentary on this paper. I greatly admire Martha Yee's courage to say, essentially, "I don't understand this as well as I need to, here are some questions I would really appreciate help with". When I worked at Bell Labs, I noticed that the people who asked questions like that were the people who had won or would later win Nobel prizes. Karen has done a great job with Martha's queries, but also expresses a fair amount of uncertainty.

I was going to launch into a few posts to help fill in some gaps, but I find that I have difficulty knowing which things are important to explain. Somehow I don't think that Model-T drivers really needed to know about the relationship between octane and ignition timing, for example. But I think that people running trucking companies need to know some of the differences between diesel engines and gasoline engines as they build their trucking fleets, just as community leaders like Martha Yee and Karen Coyle probably need to know the important differences between RDF tuple-stores and relational databases. But the more I think about it, the less sure I am about which of the differences are the important ones for people looking to apply them in libraries.

I've also been reading Greg Boutin's article "Linked Data, a Brand with Big Problems and no Brand Management", which suggests that the technical community pushing RDF Linked Data has not done a good job of articulating the benefits of RDF and Linked Data principles in a way that potential customers can understand clearly and consistently.

Engineers tend to have a different sort of knowledge gap. I have a very good friend who designs advanced fuel injectors. He is able to do this because he has specialized so that he knows everything there is to know about fuel injectors. He doesn't need to know anything about radial tires or airbag inflators or headlamps. But to make his business work, he needs to be able to articulate to potential customers the benefits of his injectors in the context of the entire engine and engine application. Whether the technology is Linked Data or fuel injectors, that can be really difficult.

My first guess was that it would be most useful for librarians to understand how indexing and searching are almost the same thing, and that indexing is done quite differently in RDF tuple-stores and in relational databases. But on second thought, that's more like telling the trucking company that diesel engines don't need spark plugs. It's good to know, but the higher-level fact that diesels burn less fuel is a lot more relevant. Isn't it more important to know that an RDF tuple-store trades off performance for flexibility? How do you figure out the right questions to ask when you don't know where to start? We find ourselves working across many disciplines, each of which is more and more specialized, and we need more communications magic to make everything work together.
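
For readers who do want to look under the hood, here is a minimal Python sketch of the flexibility-for-performance trade-off. Everything in it is an assumption made for illustration: `TinyTripleStore`, the `book:1` style identifiers, and the property names are hypothetical and do not come from any real product or vocabulary. It shows one common way such a store might be organized: every statement is filed under three orderings (subject, predicate, object), so any pattern can be searched without a schema, at the cost of writing each statement three times.

```python
from collections import defaultdict


class TinyTripleStore:
    """Hypothetical in-memory triple store, for illustration only."""

    def __init__(self):
        # Three permutation indexes: first key -> second key -> set of thirds.
        self.spo = defaultdict(lambda: defaultdict(set))
        self.pos = defaultdict(lambda: defaultdict(set))
        self.osp = defaultdict(lambda: defaultdict(set))

    def add(self, s, p, o):
        # The performance cost: each statement is written into all three indexes.
        self.spo[s][p].add(o)
        self.pos[p][o].add(s)
        self.osp[o][s].add(p)

    def match(self, s=None, p=None, o=None):
        """Return a sorted list of matching triples; None acts as a wildcard."""
        if s is not None:
            found = [(s, p2, o2)
                     for p2, objs in self.spo.get(s, {}).items() for o2 in objs]
        elif p is not None:
            found = [(s2, p, o2)
                     for o2, subs in self.pos.get(p, {}).items() for s2 in subs]
        elif o is not None:
            found = [(s2, p2, o)
                     for s2, preds in self.osp.get(o, {}).items() for p2 in preds]
        else:
            found = [(s2, p2, o2)
                     for s2, preds in self.spo.items()
                     for p2, objs in preds.items() for o2 in objs]
        # Filter on whichever positions the chosen index did not already fix.
        return sorted(t for t in found
                      if (p is None or t[1] == p) and (o is None or t[2] == o))


store = TinyTripleStore()
store.add("book:1", "title", "Moby-Dick")
store.add("book:1", "creator", "Herman Melville")
store.add("book:2", "creator", "Herman Melville")
# The flexibility: a property nobody planned for needs no schema change.
store.add("book:2", "illustrator", "Rockwell Kent")

by_melville = store.match(p="creator", o="Herman Melville")
about_book_2 = store.match(s="book:2")
```

In a relational database, the `illustrator` line would typically mean altering a table or adding a new one, and someone would decide in advance which columns deserve an index. Here nothing is decided in advance, which is convenient, but every statement pays the triple-indexing cost whether or not anyone ever searches that way.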

I'll try to do some gap-filling next week.

1 comment:

  1. Eric, I do hope you jump into the Yee/Coyle dialog. I would especially like to hear your description of indexing and linked data. My gut feeling is that we move into a very different paradigm when we go from the linear, text-based catalog to data expressed in RDF, but at this point I can't imagine what the "new world order" looks like in practice.

    I'll try to keep current with linking between the two blogs.