Archive for the 'Libraries' Category

FRBR: the Definitive Guide

Jonathan Rochkind recently posted his paraphrase of FRBR over at Bibliographic Wilderness. It is clear, concise, and accurate. From now on, I will consider it the definitive guide to FRBR Group 1 entities. Thanks, Jonathan.

Measuring the Value of a Book

How do you measure the value of a book? One might ask several questions when determining what a book is worth: How meaningful is the content? Is it enjoyable to read? Can you learn from it? Does it have historical significance? The list can go on indefinitely, and everyone will weigh the various factors differently, depending on their reason for wanting a particular book.

I don’t want to say that any particular metric is necessarily wrong, but I find it inconceivable that people will buy books based on how thick they are, with nary a thought for aught else. But apparently this happens. How else could the Strand Book Store sell books by the foot?

Sure, they mention some legitimate uses, film and theatre sets, for example. But to say “we will custom design a library that is sure to be a perfect match for any home or office space” implies that these books have no value beyond the visual appeal of their spines. Maybe it’s just the penny-pincher in me talking, but can’t you get the same visual effect with wallpaper? Leave the books for snobs like me who think books are for reading.

Fun with Acrobat

In my last post, I noted the need to convert some PDFs from a format suitable for a printer to a format suitable for online reading. The PDFs of the Muncie Times that I receive are laid out as spreads for each printed sheet of paper. So, for example, the first spread of the PDF includes pages 48 and 1, the next spread includes pages 2 and 47, etc. Many of the pages also have various printer’s marks along the edge.

First, I established a general algorithm for tidying this up.

  1. Make a copy of the document in reverse order and append it to the end of the document.
  2. Crop off the unneeded half of each spread (left for the odd-numbered spreads, right for the even-numbered spreads).
  3. Delete the printer’s marks from the margins.
  4. Add top and bottom margins.
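To see why appending a reversed copy puts the pages in reading order, here is a small sketch in plain JavaScript that simulates steps 1 and 2 on page numbers only. It assumes the imposition pattern described above (48 and 1, then 2 and 47, and so on). The function names imposedSpreads and readingOrder are made up for illustration and are not part of Acrobat.

```javascript
// Build the spreads of an imposed issue. Each spread is [leftPage, rightPage].
// Assumed pattern (from the description above): 48|1, 2|47, 46|3, ...
function imposedSpreads(totalPages) {
  var spreads = [];
  for (var k = 0; k < totalPages / 2; k++) {
    var low = k + 1;            // 1, 2, 3, ...
    var high = totalPages - k;  // 48, 47, 46, ...
    // Spreads at even positions carry the low page on the right,
    // spreads at odd positions carry it on the left.
    spreads.push(k % 2 === 0 ? [high, low] : [low, high]);
  }
  return spreads;
}

// Simulate step 1 (append a reversed copy) and step 2 (keep one half).
function readingOrder(totalPages) {
  var spreads = imposedSpreads(totalPages);
  var doubled = spreads.concat(spreads.slice().reverse());
  return doubled.map(function (spread, i) {
    // Even positions keep the right half, odd positions keep the left half.
    return i % 2 === 0 ? spread[1] : spread[0];
  });
}

var order = readingOrder(48);
console.log(order.join(" "));
```

The first pass through the spreads picks up the front half of the issue, and the reversed copy picks up the back half, so every page ends up in exactly one position.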

If you’ve tried to do much automation with Adobe CS applications, you’ve probably encountered the well-documented JavaScript APIs that make the job much easier. Acrobat is special. Its API is very different, much more limited, and boasts horrible, often inaccurate documentation. Even getting Acrobat to recognize and run a script can be such a chore that I’ve taken to copying code into its JavaScript console and running it from there.

That rant aside, it wasn’t too difficult to accomplish the first couple of steps. Step 1 (assuming you’ve already opened the document):

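// Append the pages in reverse order: each pass pulls page i from the saved
// file and inserts it right after the original last page, pushing the
// earlier insertions toward the end.
var nPages = this.numPages;
for (var i = 0; i < nPages; i++) {
	this.insertPages({
		nPage: nPages-1,
		cPath: this.path,
		nStart: i
	});
}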

Step 2:

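// Spreads are 22.5 in. x 13 in.; values are converted to points (x 72).
for (var i = 0; i < this.numPages; i++) {
	if (i % 2 == 0) {
		// 1st, 3rd, 5th... spread: keep the right half
		this.setPageBoxes({
			cBox: "Crop",
			nStart: i,
			rBox: [11.25*72, 0*72, 22.5*72, 13*72]
		});
	} else {
		// 2nd, 4th, 6th... spread: keep the left half
		this.setPageBoxes({
			cBox: "Crop",
			nStart: i,
			rBox: [0*72, 0*72, 11.25*72, 13*72]
		});
	}
}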

Note that all measurements must be in points. Since 1 inch = 72 points, I just multiply all of my values by 72. I probably could have made this more universal by letting the script calculate the width of the page and then dividing that in half.
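A more universal version might look something like the sketch below. It is plain JavaScript that only does the arithmetic, so the spread dimensions are passed in rather than hard-coded. The helper name halfSpreadBox and its trim parameter are hypothetical, and the returned array simply mirrors the order of the rBox values used above.

```javascript
var POINTS_PER_INCH = 72;

// Compute the crop rectangle for one spread, in points.
// pageIndex:    0-based position of the spread in the document
// spreadWidth:  full width of the spread, in points
// spreadHeight: full height of the spread, in points
// trim:         optional amount to shave off the left and right edges
function halfSpreadBox(pageIndex, spreadWidth, spreadHeight, trim) {
  trim = trim || 0;
  var half = spreadWidth / 2;
  if (pageIndex % 2 === 0) {
    // Even index: keep the right half
    return [half + trim, 0, spreadWidth - trim, spreadHeight];
  }
  // Odd index: keep the left half
  return [trim, 0, half - trim, spreadHeight];
}

// The 22.5 in. x 13 in. spreads from this post:
var width = 22.5 * POINTS_PER_INCH;
var height = 13 * POINTS_PER_INCH;
console.log(halfSpreadBox(0, width, height));
console.log(halfSpreadBox(1, width, height));
```

Inside Acrobat, the width and height would presumably come from the page itself (the API has a getPageBox method on the document object), and the result would be passed as rBox to setPageBoxes. That wiring is an untested assumption here, so check the coordinate order it returns before relying on it.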

At this point I discovered an oddity of Acrobat. In other Adobe programs (and any image-editing program I’ve ever used), when you crop something, you define an area and remove everything outside of that area. Acrobat never removes any part of a page; it merely hides it. So while you have this document that looks like it has 48 pages, each 11.25 in. x 13 in., you really have a document that has 48 pages, each 22.5 in. x 13 in., which is to say a document twice the size that it needs to be.

In my search for a fix to this, I eventually came across this handy tip from the Acrobat 7 PDF Bible (p. 388):

If you want to eliminate the excess data retained from the Crop tool, you can open the PDF in either Adobe Photoshop or Adobe Illustrator. Both programs honor the cropped regions of PDF files cropped in Acrobat. When you open a cropped page in either program, resave it as a PDF.

Very helpful information, that, if the cure weren’t worse than the disease.

  1. Neither program can open more than one page of a document at a time. But I could write another script to do this part if that were the only problem.
  2. Photoshop rasterizes all of the text. Needless to say, that’s unacceptable.
  3. Illustrator can’t use embedded fonts if you don’t have them on your system and will replace them with whatever fonts are available. Since the publisher has a Mac and I have a PC, this won’t work.

After mulling this over a bit more, I had an epiphany: print it! “Gasp,” you say, “wasn’t the whole point of this to avoid having to go through a print version?” Yes, but printing doesn’t have to go to a physical medium. In this case, I used the Adobe PDF printer that comes with Acrobat to print my PDF to a PDF. Incredibly, this worked. By setting a paper size to 11.25 in. x 13 in., I could print the document to a new, appropriately-sized document while discarding the excess data (and doing some optimization for online viewing while I was at it). Step 2 complete.

After discovering how to accomplish step 2, I realized that steps 3 and 4 could be accomplished in a similar manner. Crop the margins off and print to a new PDF with the margins I want, clear of any printer’s marks. As a matter of fact, these steps could be rolled into step 2. Simply take off an extra 0.45 in. from each side of the page, then print to an 11.25 in. x 13.75 in. page. So the new combined code for steps 2 and 3:

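// Same as before, but trim an extra 0.45 in. from the left and right edges
// of each half to remove the printer's marks.
for (var i = 0; i < this.numPages; i++) {
	if (i % 2 == 0) {
		// Keep the right half, minus 0.45 in. on each side
		this.setPageBoxes({
			cBox: "Crop",
			nStart: i,
			rBox: [11.70*72, 0*72, 22.05*72, 13*72]
		});
	} else {
		// Keep the left half, minus 0.45 in. on each side
		this.setPageBoxes({
			cBox: "Crop",
			nStart: i,
			rBox: [0.45*72, 0*72, 10.80*72, 13*72]
		});
	}
}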

After that’s run, you create your custom paper size and print to PDF, centering the content on the slightly-larger page.
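As a quick check on the numbers, this plain JavaScript sketch works out the margins you get when the cropped page is centered on the custom paper size. The helper name centeredMargins is made up for illustration, and all values are in inches taken from the figures above.

```javascript
// Margins left over when content is centered on a larger sheet (inches).
function centeredMargins(paperWidth, paperHeight, contentWidth, contentHeight) {
  return {
    side: (paperWidth - contentWidth) / 2,
    topBottom: (paperHeight - contentHeight) / 2
  };
}

// Cropped page: 11.70 in. to 22.05 in. across, 13 in. tall.
// Paper: 11.25 in. x 13.75 in.
var margins = centeredMargins(11.25, 13.75, 22.05 - 11.70, 13);
console.log(margins.side.toFixed(3) + " in. on each side");
console.log(margins.topBottom.toFixed(3) + " in. top and bottom");
```

In other words, the 0.45 in. trimmed from each side comes back as clean white space, and the taller paper adds 0.375 in. above and below the content.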

Digital to Print to Digital, or, Running in Circles

Rule: Don’t add unnecessary, value-subtracting steps. If a process already has these steps in it, take them out.

Application: I’ve come to be responsible for an ongoing newspaper digitization project. Not a large project, by any means, but important for the library’s community relations. We (“we” being the Ball State University Libraries) created a digital archive of the Muncie Times, a local newspaper that is still published regularly.

Dealing with back issues was straightforward: scan and OCR. But, as I mentioned, the newspaper is still published regularly, so we get another issue every other week. Here’s the workflow I inherited:

  1. The publisher creates the issue using QuarkXPress.
  2. The publisher exports a PDF and sends it to the printer.
  3. The printer prints the issue.
  4. The publisher sends a printed copy of the issue to the library.
  5. The library scans and OCRs the issue.
  6. The library puts the issue on the Internet.

If you’re like me, you look at steps 3-5 and groan at the inanity of it. These steps made sense for the back issues that no one had retained a digital version of, but there is absolutely no reason, in this 21st century, to use printed newspapers in creating a digital archive of digital objects.

Here’s the new workflow:

  1. The publisher creates the issue using QuarkXPress.
  2. The publisher exports a PDF and sends it to the printer and the library.
  3. The library puts the issue on the Internet.

It’s a miracle! Faster, easier, cheaper, and (most importantly) higher-quality, just by cutting out half the steps.

Caveat: The new step 3 isn’t quite so easy as it sounds. The first problem is getting the publisher to actually do step 2. The second problem (which I’ll cover in a bit) is converting the PDFs from a format suitable for the printer to a format suitable for online reading.

Did you mean: fluoride?

My dentist told me two noteworthy things yesterday: I need to floss more, and she misses the card catalog. I’ll leave aside my dental hygiene, it being a bit out of the scope of this blog, to focus on the latter.

She complained that the online catalog never works for her for one simple reason: she’s a horrible speller. With the card catalog, she could get to the general area and then thumb through the cards until she found what she was looking for. With an online catalog, a mistyped word gets you, “No results matched your query”, or some such. Then it’s off to the dictionary to figure out how to spell what you’re looking for. Or the user just assumes your library doesn’t have any relevant resources and goes to find the first match on Google.

There are some rather simple solutions for this that I have seen implemented. The catalog can suggest similarly spelled words when the user searches for an unknown term, much in the same way that Google or Amazon asks, “Did you mean: properly spelled word”. Or the user can land in a list of indexed terms that are nearby, alphabetically. (I’ll leave it to others to determine the optimal user interface for dealing with multiple misspelled words.)
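Neither approach takes much code. The sketch below is plain JavaScript and makes several assumptions: the list of index terms is invented, the function names are hypothetical, and edit distance (Levenshtein) is just one common way to find similarly spelled words, not necessarily what Google, Amazon, or any catalog vendor uses.

```javascript
// Levenshtein edit distance: the number of single-character insertions,
// deletions, or substitutions needed to turn a into b.
function editDistance(a, b) {
  var prev = [];
  for (var j = 0; j <= b.length; j++) prev[j] = j;
  for (var i = 1; i <= a.length; i++) {
    var curr = [i];
    for (var k = 1; k <= b.length; k++) {
      var cost = a.charAt(i - 1) === b.charAt(k - 1) ? 0 : 1;
      curr[k] = Math.min(prev[k] + 1, curr[k - 1] + 1, prev[k - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// "Did you mean": the closest indexed term within maxDistance edits, or null.
function didYouMean(query, indexTerms, maxDistance) {
  var best = null;
  var bestDistance = maxDistance + 1;
  indexTerms.forEach(function (term) {
    var d = editDistance(query.toLowerCase(), term.toLowerCase());
    if (d < bestDistance) {
      best = term;
      bestDistance = d;
    }
  });
  return best;
}

// Card-catalog style browsing: drop the user into the sorted index
// near the spot where the query would be filed.
function nearbyTerms(query, sortedTerms, count) {
  var pos = 0;
  while (pos < sortedTerms.length && sortedTerms[pos] < query) pos++;
  var start = Math.max(0, pos - Math.floor(count / 2));
  return sortedTerms.slice(start, start + count);
}

// Invented sample index, already in alphabetical order.
var terms = ["flooring", "fluoride", "flute", "folklore"];
console.log(didYouMean("flouride", terms, 2));
console.log(nearbyTerms("flouride", terms, 3));
```

A real catalog would run this against its own index and would need something faster than a linear scan, but the idea is the same: never answer a misspelling with a dead end.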

The point is that our catalogs are failing our users, in this way among others. Someone would prefer, with good reason, to manually flip through printed cards rather than take advantage of the far greater search capabilities of the computer, because we haven’t replicated the functionality of a stack of paper. Vendors, why don’t we have these tools in place as a standard part of every catalog, of every journal database, of every digital library? It would be nice to finally offer quarter-century-old technology to our users.