Lessons From a Batchload Reclamation Project: Part 2

Uploading our catalog’s export to the tech folks at Serials Solutions was uneventful. Their confirmation that they could see the data in the records was welcome. Their note that there were no location holdings in the records was easy to rectify: I just had to make sure that the “Export 999 field” check box in the Export Records menu of the Utility Module in Symphony Workflows was checked on the next go-around.

At some point, however, I checked the boxes above and below it as well. The box above exported the junk tags: harmless for most purposes, useful for a few. I didn’t think those tags were really necessary for either project, but figured better too much information than too little, and checked it anyway.

The box below was marked “Export Symphony catalog key to MARC tag 001”. Now, if you use Workflows and have checked that box yourself while exporting records, you can probably see where this is going. For everyone else, here’s the situation.

We uploaded the exported files to OCLC. They processed the data as we’d arranged in our paperwork and posted the files to their web page: you click the link, download the file, and read it back into your ILS.

Except our files didn’t reload.

I decided that I needed to take a look at the records themselves and wanted to do so within OCLC Connexion Client. I’d been trained to use Client as a primary cataloging tool by my former boss at NYAM and had trained the cataloging staff here to use it as well. It felt familiar, and I figured I could troubleshoot matters more effectively there than anywhere else. But I didn’t want to import a ginormous file into it.

So I explained the situation to the staff at OCLC’s batchload desk, and they were happy to break the files up into smaller chunks, each one with a maximum size of 9,000 records (Connexion Client has a limit of 9,999 records per file, and I wanted a buffer).
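If you ever need to do that chunking yourself, it’s a short script. Here’s a minimal sketch using the pymarc library (5.x API assumed); the file names are placeholders, and this is my reconstruction, not whatever tool OCLC’s batchload desk actually used:

```python
# A sketch: split a large MARC export into files of at most 9,000 records,
# staying safely under Connexion Client's 9,999-records-per-file limit.
from pymarc import MARCReader, MARCWriter

CHUNK_SIZE = 9_000

def split_marc_file(path: str) -> None:
    with open(path, "rb") as infile:
        writer, chunk, written = None, 0, 0
        for record in MARCReader(infile):
            if record is None:                   # skip unparseable records
                continue
            if written % CHUNK_SIZE == 0:        # time to start a new file
                if writer:
                    writer.close()
                chunk += 1
                writer = MARCWriter(open(f"export_part{chunk:02d}.mrc", "wb"))
            writer.write(record)
            written += 1
        if writer:
            writer.close()

split_marc_file("catalog_export.mrc")            # hypothetical export file
```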

I downloaded the new files. I saw everything that we’d sent, and everything that OCLC had done. The OCLC control numbers were indeed in their 035 fields. Our local control numbers were in their own 035 fields. What I didn’t see anywhere were the Sirsi control numbers in 035 fields.

Unfortunately, I didn’t know how important that was at the time. I have since learned that in a properly created export file, Workflows adds its own control number to a new 035 field in each record specifically so that you can match on that number when you re-import the records.
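If you’re troubleshooting a similar failure, one quick sanity check is to tally the control-number prefixes that actually appear in your 035 fields. A sketch with pymarc (5.x API assumed) and a hypothetical file name:

```python
# A sketch: count which control-number prefixes appear in the 035 $a fields.
# A healthy Workflows export should show "(Sirsi)" entries to match on.
from collections import Counter
from pymarc import MARCReader

prefixes = Counter()
with open("oclc_return_part01.mrc", "rb") as f:   # hypothetical file name
    for record in MARCReader(f):
        if record is None:
            continue
        for field in record.get_fields("035"):
            for value in field.get_subfields("a"):
                # 035 $a values conventionally look like "(OCoLC)12345678"
                prefix = value.split(")")[0] + ")" if value.startswith("(") else "(none)"
                prefixes[prefix] += 1

print(prefixes.most_common())
```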

What I did know was that the records refused to load. I tried using the 020 to match on the ISBN . . . that worked, but it also created a load of duplicate records that did no one any good at all and had to be deleted later.

I tried matching on the vendor 001. No go.

I tried matching on the OCLC 035. No joy.

I tried matching on the (non-existent) Sirsi 035 a few times. Nada.

I tried matching on the 245, but that created the same problem as matching on the 020: titles matched, but the load duplicated records rather than replacing and updating them. So that didn’t work.

In between all these attempts to use the data I’d exported, OCLC was insisting that I was doing something wrong and Sirsi was telling me that OCLC had wrecked our data.

Finally, I arranged a phone conference with Sirsi’s senior analyst and had him log in remotely to my PC to see what he could see. The punchline was this: you remember that check box that exported the catalog key to the 001? That should not have been checked. Catalog keys cannot be used as matching points on a Bibload report in Workflows.

In other words, I had 129,000 records of garbage.

Yeah.

Sometimes the best thing to hear is that you’ve screwed up. It frees you to trash what you’ve done and start again from the beginning. So I did.

This time, I made sure that the catalog keys were not exported to the 001. I made sure the export files were no more than 9,000 records long. I made sure that each file was checked in Connexion Client before I uploaded it to OCLC’s server. I made sure the file names had the correct syntax.

On top of this, OCLC very graciously re-processed our data for free, since the project was still current and the first result had been a case of garbage in, garbage out.

This time when the files came back, they uploaded perfectly and updated our original records without problems. That done, I exported a fresh set of records for Serials Solutions; their metadata people will get to work on it shortly.

So. The takeaway:

1. I don’t care what The Cult of Done Manifesto says: pretending you know what you’re doing is not almost as good as knowing. There is no substitute for hands-on experience. Do as much research as you want, but getting your arm stuck in the machinery and pulling out a bloody stump is an effective lesson all its own.

2. Cataloging and Systems Librarianship are like hiking and swimming: both are useful skills, but hardly interchangeable. Catalogers don’t do systems work, and the systems folks can’t catalog. Luckily, there are enough of us accidental systems librarians out there that we can get the requisite work completed, if not always as quickly as we’d planned.

3. Experts are experts in their own systems, not each other’s. Which means that . . .

4. Experts will blame each other’s systems for what goes wrong.

5. Ask for a favor. You might be surprised, as I was when OCLC told me they would re-run the project at no additional cost. And finally . . .

6. Fixing it makes everything that came before seem better.

These are things to remember for future projects.

Lessons From a Batchload Reclamation Project: Part 1

Our Batchload Reclamation Project is over. It was interesting. Truthfully, this was the first project of 2013 to kick my ass. You might call this a failure of knowledge. I have decided to call it a case of professional development.

For those of you who are not Tech Services Librarians: Batchload Reclamation is the name that the systems folks at OCLC give to a project whereby a member library exports its MARC records to OCLC’s servers. OCLC then matches the information in those records to the holdings in its databases, strips out the weird shit, dupes, and incomplete material, then sends everything back to the library in question. The new records get uploaded into that library’s ILS, and the result is a cleaner catalog that can be searched more effectively in WorldCat.

Our catalog, not to put too fine a point on it, was a mess. Fragmentary records, old records, thousands of records that had never been synced with OCLC’s holdings. Our big weeding project from last year helped identify a number of the inconsistencies between our shelves and the online catalog, but OCLC had no records that matched the fixes we implemented. So in the same way that looking at distant stars through a telescope means looking at the light they radiated millions of years ago, any library looking into our holdings via WorldCat or FirstSearch would have seen a catalog that was years out of date.

Additionally, we are in the process of implementing Summon, Serials Solutions’ discovery platform, which shows real promise for keeping our holdings current and demonstrating the value of such things to our students and faculty. Demonstrating these things to our administration will take more work (they think we can do the same thing with Moodle. They are wrong. But, baby steps.)

Anyway, Summon also requires current and accurate holdings and catalog records, so we needed to clean up the catalog. Besides that, we wanted to take advantage of the fact that OCLC will do a one-time batchload reclamation project for any member library that has not done such a project since 2005. It’s a win-win project. Or so it sounded when I described the process to my co-workers.

Getting there in practice was another story entirely.

The problem: Summon uses local control numbers as primary access points for scanning MARC records. Specifically, they use the 001 field as a repository for their own tracking data. The problem for us was that our ILS used that same 001 field as its own primary tracking field.

The Proposed Solution: Move the contents of our 001 fields over to something more easily accessible–namely, the 035 field–and allow Serials Solutions to populate the 001 fields of our uploaded records with their own data. We retain the ability to track our records along the way, and Serials Solutions can track everything with their own data once the scan is complete. Win-win, right? Of course, right.
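The move itself is mechanically simple. Here’s a minimal sketch of that step in pymarc (5.x API, where subfields are Subfield objects); in reality the export happened through Workflows, not a script, and the file names here are placeholders:

```python
# A sketch: copy each record's 001 into a new 035 $a, freeing the 001 for
# Serials Solutions to overwrite with their own tracking data.
from pymarc import MARCReader, MARCWriter, Field, Subfield

with open("catalog_export.mrc", "rb") as infile, \
     open("catalog_for_summon.mrc", "wb") as outfile:
    writer = MARCWriter(outfile)
    for record in MARCReader(infile):
        if record is None:
            continue
        controls = record.get_fields("001")
        if controls:
            record.add_field(
                Field(
                    tag="035",
                    indicators=[" ", " "],
                    subfields=[Subfield(code="a", value=controls[0].data)],
                )
            )
        writer.write(record)
    writer.close()
```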

The first hint of something having gone wrong was that my first export to the Summon server went . . . strangely. First, there was the size of the file: 67MB. I use FileZilla as an FTP client on my PC, but the network in my library isn’t as robust as I’d like. Data transfers stutter along, and something frequently gets caught hanging for long enough that the receiving server decides that the connection has been lost and restarts the transfer from the beginning. With a giant file, this is a problem. Not a huge one in terms of lost sanity, but stressful nonetheless.
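For what it’s worth, this kind of stutter can be scripted around. Here’s a hedged sketch using Python’s ftplib that resumes an interrupted upload from wherever the server left off (assuming the server honors REST on uploads, which not all do); the host, credentials, and file names are placeholders, not our actual Summon server details:

```python
# A sketch: upload a file over FTP, resuming a dropped transfer instead of
# restarting from byte zero each time the connection dies.
import os
from ftplib import FTP, error_perm

def upload_with_resume(host: str, user: str, password: str, path: str,
                       retries: int = 20) -> None:
    name = os.path.basename(path)
    total = os.path.getsize(path)
    for _ in range(retries):
        try:
            with FTP(host) as ftp:
                ftp.login(user, password)
                ftp.voidcmd("TYPE I")              # binary mode, so SIZE works
                try:
                    done = ftp.size(name) or 0     # bytes already on the server
                except error_perm:
                    done = 0                       # no partial file there yet
                if done >= total:
                    return                         # transfer already complete
                with open(path, "rb") as f:
                    f.seek(done)
                    ftp.storbinary(f"STOR {name}", f, rest=done)
                return
        except (ConnectionError, EOFError, OSError):
            continue                               # dropped mid-transfer: retry
    raise RuntimeError(f"gave up on {name} after {retries} attempts")

upload_with_resume("ftp.example.com", "libuser", "secret", "catalog_export.mrc")
```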

Ultimately, however, the transfer got done. The news from Summon’s implementation team: we see your records. But . . . there’s no item location or object type information in them. What happened?

What happened is that I had never set up a proper export in Symphony Workflows before and didn’t know exactly what I was doing. Resolving to do better, I called SirsiDynix’s tech support crowd and got one of their reps to walk me through the procedures. I took notes and everything.

So when I had to export the files for OCLC’s project, I thought I knew what I was doing. And I did. Sort of.

In Part 2 of this story–which I’ll post on Monday–I’ll let you know what happened.


And Now, a Batch Reclamation Project

There are days when being a Tech Services Librarian has less to do with working the Reference or Circulation desks and more to do with being a mechanic. This is one of those days.

We’re in the middle of implementing Serials Solutions’ Summon discovery service, which they describe as “a digital front door to the library’s resources.”

One of the roadblocks to making this happen, however, is the state of our MARC holdings. The records were not in great shape when I got here in 2007. Some of the cataloging was done in brief records by paraprofessionals who were not trained as catalogers, so a certain portion of our collection is composed of fragmentary records. This is not a problem as far as it goes; fragmentary records are expanded or replaced whenever I find them. The real problem with fragmentary data is that it hinders discovery by our patrons, since those records lack some of the vital access points that fully developed records include.

Additionally, our ILS creates new records on demand and populates each one with a randomly generated OCM number in the 001 field. Again, not a problem per se, except that I’d originally intended for the 001 field to be utilized by Summon to identify our records. Having spoken to the implementation professionals at Serials Solutions, I’ve learned that that is not going to happen.

So, new plan: populate each of our 125,000 or so MARC records with an OCLC control number to live in the 035 field. This would require a Batch Reclamation Project where we’d upload our records to the wizards at OCLC. They’d make the changes, ask if there are any records we’d like to remove from their holdings, then send them back to us to swap into our ILS. The good news is that because it’s a one-time deal and we’ve never pursued this type of project before, they’d do it for us for free. (An ongoing project would cost us some money.)
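Before shipping anything off, it’s easy enough to size a job like this yourself. A sketch (pymarc again, 5.x API assumed, with a hypothetical export file) that counts how many records already carry an (OCoLC)-prefixed 035 and how many don’t:

```python
# A sketch: estimate the reclamation workload by counting records with and
# without an OCLC control number in an 035 field.
from pymarc import MARCReader

have_oclc = missing = 0
with open("catalog_export.mrc", "rb") as f:       # hypothetical export file
    for record in MARCReader(f):
        if record is None:
            continue
        values = [v for fld in record.get_fields("035")
                    for v in fld.get_subfields("a")]
        if any(v.startswith("(OCoLC)") for v in values):
            have_oclc += 1
        else:
            missing += 1

print(f"with OCLC number: {have_oclc:,}; needing one: {missing:,}")
```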

The bad news is that I have a day to learn about batch reclamation projects.

That means paging through the OCLC Batch Services User Guide to absorb whatever I can and apply it to this particular project. Besides that, I’ve already taken a hard look at documents describing the ins and outs of OCLC control numbers, and the Order Checklist for Bibliographic Batchloads. It’s both dull and fascinating, but I have to think like a digital mechanic to make it work.
