JCDL 2007

The ACM and IEEE put on their Joint Conference on Digital Libraries this week in Vancouver, B.C. While I was not able to stay for the full conference, which looked to have a great program, I was fortunate to attend a pre-conference tutorial on Tuesday, “Thesauri and ontologies in digital libraries”, starring Dagobert Soergel from the University of Maryland. This will not be a play-by-play; most of that can be found in the “workbook” (PDF from the same workshop given at ECDL 2006) Dr. Soergel gave us. Instead, I will highlight a few points from the day.

Dr. Soergel’s premise, given at the start of the day, was that “the system should support the user in creating meaning”. Id est, after interacting with a thesaurus, the user should come away with a greater understanding of relevant concepts and their relationships to each other. Do note, however, that the user need not interact with the thesaurus directly. A UI design challenge is to develop an interface that integrates the structure and power of the thesaurus without requiring the user to navigate the vocabulary to find the preferred terms. Dr. Soergel did not have a solution to how this should be done, but did offer a few suggestions in that direction.

Different user groups (and, usually, different individuals) will have different preferred terms for the same concepts, an idea that is becoming increasingly acceptable among librarians. FRAD, for example, acknowledges that a concept or entity can have multiple preferred forms, each under a different authority, and this concept is key to the Virtual International Authority File. Dr. Soergel used the example of medical conditions; one term for a condition may be more useful to doctors, another to the layman. The system should be able to support both of these communities and more through the appropriate use of structured relationships among terms. A large, but not necessarily comprehensive, list of these structured relationships can be found in the above PDF (p. 191), along with a draft of how these relationships might be modeled in a relational database (p. 196).
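To make the idea of per-community preferred terms concrete, here is a minimal sketch in Python. It is my own illustration, not Dr. Soergel's model or the schema from the workbook: the class names, the community labels, and the sample terms are all assumptions chosen for the example. The point is only that a user can enter any term they know and be shown the term preferred by their own community, without ever browsing the vocabulary.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One thesaurus concept (hypothetical structure, for illustration only)."""
    concept_id: str
    preferred: dict = field(default_factory=dict)  # community -> preferred term
    variants: set = field(default_factory=set)     # other entry terms

    def all_terms(self):
        return set(self.preferred.values()) | self.variants

    def label_for(self, community):
        # Use the community's own preferred term when one exists; otherwise
        # fall back to any preferred term, or to the concept ID as a last resort.
        if community in self.preferred:
            return self.preferred[community]
        return min(self.preferred.values(), default=self.concept_id)


class Thesaurus:
    def __init__(self):
        self._by_term = {}  # lower-cased term -> Concept

    def add(self, concept):
        for term in concept.all_terms():
            self._by_term[term.lower()] = concept

    def lookup(self, term, community):
        """Map any known term to the form preferred by the given community."""
        concept = self._by_term.get(term.lower())
        return concept.label_for(community) if concept else None


# Sample data invented for this sketch.
thesaurus = Thesaurus()
thesaurus.add(Concept(
    concept_id="c1",
    preferred={"clinician": "myocardial infarction", "layperson": "heart attack"},
    variants={"MI", "cardiac infarction"},
))

print(thesaurus.lookup("heart attack", "clinician"))
print(thesaurus.lookup("MI", "layperson"))
```

A real thesaurus would also carry broader, narrower, and related-term links; this sketch leaves those out to keep the preferred-term idea in focus.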

We spent some time talking about multilingual thesauri, a topic far more complicated than I had initially realized. Translating a thesaurus into a new language does not make it multilingual unless there is a one-to-one mapping of terms used to express a concept in one language to terms used to express a concept in the other. To make this work, one must often invent terms for one of the languages to express a concept in the other. For example, German has no word for a watch (a timepiece you carry with you), even though it does have words for specific kinds of watches (e.g., Taschenuhr = pocket watch, Armbanduhr = wrist watch), so a term would have to be invented in German to match the English concept of a watch. Sometimes, different languages approach things from such different perspectives that even inventing terms will not suffice.
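The one-to-one requirement can be checked mechanically. The following is a small sketch of my own, assuming a made-up data layout in which each concept maps language codes to terms; it simply reports which concepts lack a term in some language, which is where a term would have to be invented. The watch data comes from the example above.

```python
def missing_terms(concepts, languages):
    """Return (concept, language) pairs for which no term exists.

    `concepts` maps a concept name to a dict of language code -> term.
    This layout is hypothetical, chosen only for the illustration.
    """
    gaps = []
    for name, terms in concepts.items():
        for lang in languages:
            if not terms.get(lang):
                gaps.append((name, lang))
    return gaps


concepts = {
    "watch": {"en": "watch"},  # no German term at this level
    "pocket watch": {"en": "pocket watch", "de": "Taschenuhr"},
    "wrist watch": {"en": "wrist watch", "de": "Armbanduhr"},
}

gaps = missing_terms(concepts, ["en", "de"])
print(gaps)
```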

Despite the title, there was not much discussion of ontologies or digital libraries; I suppose “Thesauri” by itself lacks marketability. But on the topic of thesauri, the tutorial was informative and well-presented. This was my first visit to Vancouver, a lovely city that I hope to return to someday (hopefully without having to sprint through the terminals at O’Hare next time). Indiana is certainly lacking in oceans and mountains.

OVGTSL 2007 – Part 4 – RDA

The final part of the conference focused on RDA. I think Dr. Tillett is the third member of the JSC I’ve heard speak on RDA. Every time I hear one of them, I’m very encouraged that things are moving in the right direction, albeit haltingly.

RDA is much more principle-based than previous cataloging rules, which should serve us well. Unfortunately, one of the principles seems to be “Don’t scare the library administrators”. It is this that keeps RDA from being the revolutionary change it probably needs to be. By insisting on near-complete backwards compatibility, the JSC seems to be trying to say “Keep cataloging exactly the same way you always have, but here are your new reasons for doing it that way”.

But, as I said, progress is being made. Over time it should become clear which cataloging practices are not based on the principles enshrined in RDA (and, by extension, in FRBR). Perhaps future revisions of RDA will slowly weed these out. The decision to unwed RDA from ISBD and MARC is definitely good news. And there will be fewer instances of disparate pieces of information being combined into one unparsable data element (Hooray, they’re not calling them metadata elements!). This and more should make RDA a very useful content standard for non-MARC cataloging.

OVGTSL 2007 – Part 3 – Virtual International Authority File

Discussion of FRAD leads right into a project that includes the Library of Congress, OCLC, and several other institutions around the world to develop a Virtual International Authority File (VIAF). The first, proof-of-concept stage of the project involves experiments in combining the personal name authority files of the Library of Congress and Die Deutsche Bibliothek. The ultimate goal is to enable authority control on a global scale by matching and linking authority records from all the national libraries.

Dr. Tillett mentioned several other projects with this same goal that have failed or not gone far enough. They all ran into obstacles matching the records reliably and consistently. What VIAF has going for it that these other projects did not is, basically, better matching algorithms and access to OCLC’s bibliographic database. This gives them an error rate of less than 1%.

As a side note, Dr. Tillett mentioned that the Library of Congress will have Unicode capabilities in their authority file by December or January.

OVGTSL 2007 – Part 2 – FRAD

After lunch, Dr. Tillett moved on to the work of the FRANAR (Functional Requirements and Numbering of Authority Records) Working Group. This group recently released a draft (PDF) of FRAD (Functional Requirements for Authority Data) for public review. FRAD covers records for “Group 2 Entities”, as defined in FRBR. These entities are Persons, Corporate Bodies, and the recently added Families (this last at the request of the archival community; many librarians would rather include Families as a subset of Corporate Bodies).

One notable development is the de-emphasis of authorized or preferred access points. Under the FRAD model, it looks like controlled access points can be developed according to different sets of rules, and one can indicate whether those rules designate an access point to be a preferred or variant form of a name. Thus, an authority record could have multiple “preferred” access points, perhaps in different scripts or languages, and it would be up to the system to select which to display to the user.
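As a rough sketch of what “up to the system to select” might look like, here is a small Python example. It is my own reading of the idea, not anything defined in the FRAD draft: the `AccessPoint` fields, the rule-set labels, and the selection logic are all assumptions, and the name strings are illustrative rather than actual authority headings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AccessPoint:
    form: str        # the name as it would be displayed
    rules: str       # rule set it was built under (labels here are made up)
    language: str    # language or script code
    preferred: bool  # True if those rules designate it a preferred form


def select_display(access_points, rules=None, language=None) -> Optional[str]:
    """Pick the preferred form that matches the user's rules and language.

    Falls back to the first preferred form when nothing matches, and
    returns None if the record has no preferred form at all.
    """
    preferred = [ap for ap in access_points if ap.preferred]
    for ap in preferred:
        rules_ok = rules is None or ap.rules == rules
        language_ok = language is None or ap.language == language
        if rules_ok and language_ok:
            return ap.form
    return preferred[0].form if preferred else None


# One authority record with two "preferred" forms and one variant.
record = [
    AccessPoint("Tolstoy, Leo", "rules-A", "en", True),
    AccessPoint("Толстой, Лев Николаевич", "rules-B", "ru", True),
    AccessPoint("Tolstoi, Lyof", "rules-A", "en", False),
]

print(select_display(record, language="ru"))
print(select_display(record, language="en"))
```

The fallback to the first preferred form is an arbitrary choice for the sketch; a real system would presumably have a policy for users whose language or rules match nothing in the record.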

Note that the working group’s name also mentions numbering. One of their tasks was “to study the feasibility of an International Standard Authority Data Number”, basically a unique ID to be assigned by a central international body to every authority record created by any institution in the world, a URI for authority records. The working group recommended against the formation of such a body, citing costs and impracticality as the leading reasons. They did, though, recommend that system control numbers from maintainers of authority files (e.g., the Library of Congress, DDB) be used as identifiers.

OVGTSL 2007 – Part 1 – FRBR

Dr. Tillett’s presentations began with an overview of IFLA and its activities, then moved on to FRBR (Functional Requirements for Bibliographic Records) for the rest of the morning. This was a fairly basic introduction to the FRBR model, intended for the majority of the librarians there who only had, at best, a passing familiarity with it. She mentioned that the FRBR group, while developing a conceptual model rather than an actual implementation, did want to encourage the adoption of FRBR for library systems. For this reason, they focused on laying everything out in E-R diagrams, presuming this would make it more comfortable for systems designers who would inevitably be charged with implementing it. She pointed to the Library of Congress’s MARC and FRBR page for an analysis of using FRBR in a MARC environment.

Dr. Tillett freely admits that FRBR is not incredibly relevant for about 80% of the library catalog, that being the items that have only ever existed in one form and one edition. But using FRBR can greatly improve access to the remaining 20%. And it’s only reasonable to assume that these works that have appeared in multiple forms, multiple editions, etc., are more likely to be used anyway. It’s their popularity that led to these numerous instantiations.