Linking Library Data: A Panel Presentation

Last night the New York Technical Services Librarians (NYTSL) held its panel presentation at the New York Society Library. If you weren’t there, you missed a fascinating evening in a gorgeous space (with impeccable catering).

The speakers were Cristina Pattuelli, Associate Professor, School of Information and Library Science, Pratt Institute; Ingrid Richter, Head of Systems & First Ledger Project Coordinator, New York Society Library; and Trevor Thornton, Senior Applications Developer, Archives, NYPL Labs, New York Public Library.

Mark Bartlett, Society Head Librarian, made a few opening remarks about the history of the institution: the founding of the library in 1754 as a private repository which was open to members only. The Society Library’s membership has included names like John Jay, Herman Melville and Willa Cather. (Their website gives a fuller description of the institution’s history.)

Ingrid Richter spoke about the New York Society Library’s First Charging and Early Borrower Ledger project.

The starting point for Ingrid’s project was a wealth of original material dating from 1789 to 1792 that provided some amazing information about the books that luminaries such as Aaron Burr, George Washington, and John Jay checked out while in New York. As the only materials used in the project were the raw data transcribed in the original ledgers, the main goals included creating user-friendly images and promoting knowledge of the charging ledgers.

Step one involved converting TIF images of the pages. Automated batch commands in Photoshop created thumbnails which were then converted to JPGs.

Step two was the creation of spreadsheets, using Excel. A team of four librarians converted the ledgers into the sheets in question to track data locations.

Step three involved creating a database in FileMaker Pro and importing all of the spreadsheets into it as raw data, more than 2,000 entries in all. This database allowed better, more comprehensive reporting than the spreadsheets did.

Step four involved tracking people. What were people doing? Information from the ledgers allowed the tracking of birth and death dates (for example), and the two databases were linked together by web addresses. The result was a count of borrowing records, counts of checkouts, borrowing dates, etc. Finally, a database of book titles was created to define what happened to book information. Each book had its own database.
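For readers who want a feel for what linking the people data to the borrowing data might look like, here is a minimal sketch in Python. It is not the Society Library's method (the project used FileMaker Pro), and every name, field, and row in it is a made-up placeholder for illustration only.

```python
from collections import Counter

# Hypothetical borrower records keyed by an ID. Field names are assumptions.
borrowers = {
    1: {"name": "Borrower One"},
    2: {"name": "Borrower Two"},
}

# Hypothetical ledger entries: (borrower_id, title, checkout date).
# These rows are placeholders, not data from the actual ledgers.
charges = [
    (1, "Title A", "1789-07-01"),
    (1, "Title B", "1789-08-15"),
    (2, "Title A", "1790-01-10"),
]


def checkout_counts(entries):
    """Count ledger entries per borrower ID."""
    return Counter(borrower_id for borrower_id, _title, _date in entries)


def borrowing_dates(entries, borrower_id):
    """Return the sorted checkout dates for one borrower."""
    return sorted(date for bid, _title, date in entries if bid == borrower_id)


counts = checkout_counts(charges)
for borrower_id, count in counts.items():
    # The shared ID is what links a ledger entry back to a person.
    print(borrowers[borrower_id]["name"], count)
```

The shared ID plays the role that the linking web addresses played in the project as described: one key that lets a borrowing record point back to a person.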

Web pages were created to link back to raw data about books. A pages database was visualized as a finding aid for people who might want to read the ledger page by page. HTML sounded simple enough but they finally decided that static web pages were preferable as the metadata needed to be reliably locatable. A bulk reading utility made everything more convenient.

Check out the full exhibit on the library’s website.

Also, check back tomorrow for the rest of the presentation summary.


“Just a Bunch of Library Talk”

So you think we should be archiving all students' CAs* every semester. Hmmm.

Do you have any idea what you mean when you call this idea of yours an archive? Do you? Really? Really? Because I think we’re talking past each other here.

Okay, back to the beginning: I like the idea. There isn’t a week that goes by when a student doesn’t come to the library and ask to see a sample of a CA. They get this forlorn look on their faces when I tell them that no, we don’t have copies of past CAs on hand to show them. And they could probably use the assistance that such a resource would offer; everybody likes to have an idea of what’s expected of them, especially students. Queens College library school, for example, keeps a few examples of similar projects on hand in a box in the library school's main office. It's not much more than a cardboard box with maybe three dozen papers lying in it under the secretary's desk, but they do have it. If that was what I thought you were suggesting, I'd be on board all the way.

But what you seem to be suggesting—that we archive every CA produced by every student every semester—is impossible. I don’t like using that word—impossible—but this time I think I have to, and here’s why.

First let’s define what you want. What you’re describing is not really an archive, it’s more like a depository. Depositories are great for storing vast quantities of material but they’re not exactly an accessible, searchable, medium. It’s not like you’re scanning each paper—yes, I know you said you wanted all this stuff to be scanned, but we’re not there yet. I’m getting to all that.

A depository is like a bank vault. It stores absolutely everything in a designated collection. The only records you generally create indicate who owns the stash, what the stash contains, and what manner of access is required. That’s the simplest form of this exercise. In this case, I'd think the easiest way to go about it is to take the hard copy produced by each student and catalog it by the student's name and by the subject heading appropriate to the paper itself, then index it according to class number and semester. Each student would get their own file devoted to their work. Each new paper each student produces during their time here would be placed in their folder. That's maybe nine hundred papers per semester to be filed.

Every item needs to be indexed according to whatever search medium you want to utilize. Beyond that, everything has to be sufficiently labeled, and only a qualified archivist should have ready access to it. That means closed stacks. If a layperson mishandles an item (say, replaces it in the first convenient shelf hole rather than where it was taken from), then it's gone forever. No one will ever find it again because no one will know where to look for it.

For that matter, where and how will this stuff be stored? If you’re talking about archiving the output of an entire school, even a relatively small one, then you’re talking about a thousand or so papers each semester. An average document size of fifty pages means roughly fifty thousand pages every semester. That’s a lot of volumes. Think of a phone book, then think about how you’d house fifty or sixty of them. Each year, you need to find space for another fifty or sixty. Yes, I’m serious. That's the appropriate magnitude and scope of what you're describing.
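If you'd like to check my math, here is a back-of-the-envelope sketch in Python. The thousand papers and fifty pages come from the paragraph above; the thousand pages per phone book is purely my assumption for illustration, and the function names are made up.

```python
def pages_per_semester(papers, pages_per_paper):
    """Total pages produced in one semester."""
    return papers * pages_per_paper


def phone_books(pages, pages_per_book=1000):
    """Number of phone-book-sized volumes needed, rounded up.

    pages_per_book is an assumed figure, not a measured one.
    """
    return -(-pages // pages_per_book)  # ceiling division


pages = pages_per_semester(1000, 50)
print(pages, "pages, or about", phone_books(pages), "phone books")
```

Change the assumed pages per phone book and the count of volumes moves with it, but the order of magnitude doesn't.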

It gets worse. If you’re going to make the catalog searchable online, each item needs its own MARC record, as does the collection itself. You want to look at the items online? Great, that means scanning the pages. If you’re going to scan it, then that adds a level of complexity to your project. Heck, that's a full-time job in itself. You need a scanner and someone who knows how to run one, then each page has to be converted and joined into a PDF. That electronic file then has to live on a server somewhere or be written onto some form of permanent media like a CD-ROM or DVD, which may not be in use in, say, twenty years. Now that it’s a computer file, you may need some additional metadata to define it so that a library computer can see it, much less deliver the file online. Then there’s the...

Okay, I’m going to overlook that. I know this stuff is unappetizing, but dismissing it as “just a bunch of library talk” isn’t going to help. You want to make a collection available and I want to help. I just need you to meet me halfway.

Like how? Well, you need a full-time archivist. A pro whose job entails sifting through mountains of discarded, disorganized, dissolving manuscripts and converting them into usable data. A librarian, even a Tech Services librarian, really can't do that. I'm not trained for it. You need a pro, and she needs to be on staff all the time. I mean, this is an ongoing project, right? Every year, more CAs to archive. That’s a lot of work. A lot of man-hours and a lot of money changing hands.

Finally, you need to figure out whose documents will go into this thing. Everyone’s? A representative sample from each class? A representative sample from each department, or maybe from the school as a whole? Copyright transfer waivers are a must if you don’t want to get sued at some point for theft of intellectual property. How are you going to document the transfer of material? How about permission forms? Opt-in forms? Opt-out forms? Actually, I take that back. This is the first thing you should be thinking about, not the last.

If this sounds like a gigantic hassle, well, maybe you should have asked what the project entailed before you asked the Dean to fund it. If you want to do this properly, I’ll be happy to discuss it with you. If not, I will wish you luck and wave bye-bye.

You’re welcome.


*CA = Constructive Action, which is the big research paper our students produce at the end of each semester.

Webbys, Google, and The Ultimate Computer

First of all, a major round of congratulations to which has been nominated for a 2007 Webby Award (in two categories)! [APPLAUSE] They’ve made it to the finals, in fact, and the RS wishes them luck in nabbing that sucker. For what my opinion is worth (exactly what you have paid for it), I’ve been in love with this organization since they first appeared. They truly are impartial in the dirt they uncover from both the left and right wings of the political spectrum, and their analysis is consistently thorough and well-researched. For that alone they deserve to win. If you feel compelled to help them out a bit towards said winning, go vote for them here. If not, well, I have another question for you.

Clearly, Google now believes that it can catalog books for the Library of Congress. Well, maybe it can. That’s unfair; of course they can. Should they, though? That’s a different question.
