JCDL 2007

The ACM and IEEE put on their Joint Conference on Digital Libraries this week in Vancouver, B.C. While I was not able to stay for the full conference, which looked to have a great program, I was fortunate to attend a pre-conference tutorial on Tuesday, “Thesauri and ontologies in digital libraries”, starring Dagobert Soergel from the University of Maryland. This will not be a play-by-play; most of that can be found in the “workbook” (PDF from the same workshop given at ECDL 2006) Dr. Soergel gave us. Instead, I will highlight a few points from the day.

Dr. Soergel’s premise, given at the start of the day, was that “the system should support the user in creating meaning”. Id est, after interacting with a thesaurus, the user should come away with a greater understanding of relevant concepts and their relationships to each other. Do note, however, that the user need not interact with the thesaurus directly. A UI design challenge is to develop an interface that integrates the structure and power of the thesaurus without requiring the user to navigate the vocabulary to find the preferred terms. Dr. Soergel did not have a solution to how this should be done, but did offer a few suggestions in that direction.

Different user groups (and, usually, different individuals) will have different preferred terms for the same concepts, an idea that is becoming increasingly accepted among librarians. FRAD, for example, acknowledges that a concept or entity can have multiple preferred forms, each under a different authority, and this concept is key to the Virtual International Authority File. Dr. Soergel used the example of medical conditions; one term for a condition may be more useful to doctors, another to the layman. The system should be able to support both of these communities and more through the appropriate use of structured relationships among terms. A large, but not necessarily comprehensive, list of these structured relationships can be found in the above PDF (p. 191), along with a draft of how these relationships might be modeled in a relational database (p. 196).
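To make the idea of several preferred terms for one concept a little more concrete, here is a minimal Python sketch of my own. It is not Dr. Soergel's model, nor the relational draft from the workbook; the `Concept` class, the `label_for` and `find_concept` helpers, the community names, and the heart attack example are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    """One concept with a preferred term per user community (hypothetical model)."""
    concept_id: str
    # Maps a community name to that community's preferred term.
    preferred: dict = field(default_factory=dict)

    def label_for(self, community: str, fallback: str = "layman") -> str:
        # Use the community's own term if it has one, else the fallback community's.
        return self.preferred.get(community, self.preferred[fallback])


def find_concept(concepts: list, term: str) -> Optional[Concept]:
    # Any community's preferred term leads to the same concept.
    wanted = term.casefold()
    for concept in concepts:
        if any(label.casefold() == wanted for label in concept.preferred.values()):
            return concept
    return None


# Illustrative data only: one medical condition, two communities.
heart_attack = Concept(
    concept_id="C001",
    preferred={"doctor": "myocardial infarction", "layman": "heart attack"},
)
concepts = [heart_attack]

found = find_concept(concepts, "Heart Attack")
doctor_label = found.label_for("doctor")
```

The point of the sketch is that the user never has to know which term is "the" preferred one: a search on either term reaches the concept, and the display term is chosen per community. A real thesaurus would add the other structured relationships (broader, narrower, related) on top of this.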

We spent some time talking about multilingual thesauri, a topic far more complicated than I had initially realized. Translating a thesaurus into a new language does not make it multilingual unless there is a one-to-one mapping of terms used to express a concept in one language to terms used to express a concept in the other. To make this work, one must often invent terms for one of the languages to express a concept in the other. For example, German has no word for a watch (a timepiece you carry with you), even though it does have words for specific kinds of watches (e.g., Taschenuhr = pocket watch, Armbanduhr = wrist watch), so a term would have to be invented in German to match the English concept of a watch. Sometimes, different languages approach things from such different perspectives that even inventing terms will not suffice.
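The one-to-one requirement can be checked mechanically. The following is a rough sketch, assuming a thesaurus stored as a dictionary from concept to per-language terms; the `missing_labels` function, the concept keys, and the data layout are my own invention for illustration, using the watch example above.

```python
def missing_labels(thesaurus: dict, languages: list) -> dict:
    """Return, for each concept, the languages that have no term for it."""
    gaps = {}
    for concept_id, labels in thesaurus.items():
        absent = [lang for lang in languages if lang not in labels]
        if absent:
            gaps[concept_id] = absent
    return gaps


# Concept -> {language code -> term}. "watch" has no German term yet.
thesaurus = {
    "watch": {"en": "watch"},
    "pocket_watch": {"en": "pocket watch", "de": "Taschenuhr"},
    "wrist_watch": {"en": "wrist watch", "de": "Armbanduhr"},
}

gaps = missing_labels(thesaurus, ["en", "de"])
```

Here `gaps` flags the general concept of a watch as lacking a German term, which is exactly where a term would have to be invented. A check like this cannot catch the harder case, where the two languages carve up the subject so differently that the concepts themselves do not line up.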

Despite the title, there was not much discussion of ontologies or digital libraries; I suppose “Thesauri” by itself lacks marketability. But on the topic of thesauri, the tutorial was informative and well-presented. This was my first visit to Vancouver, a lovely city that I hope to return to someday (hopefully without having to sprint through the terminals at O’Hare next time). Indiana is certainly lacking in oceans and mountains.

