Lessons From a Batchload Reclamation Project: Part 2

Uploading our catalog’s export to the tech folks at Serials Solutions was uneventful. Their word that they could see the data in the records was welcome. Their note that there were no location holdings in them was easy to act on: I just had to make sure that the “Export 999 field” check box in the Export Records menu of the Utility Module in Symphony Workflows was checked on the next go-around.

At some point, however, I checked the boxes above and below it as well. The one above exported the junk tags: harmless for most purposes, useful for a few. I didn’t think they were really necessary for either project, but I figured it was better to have too much information than too little, so I checked it.

I had also checked a box marked “Export Symphony catalog key to MARC tag 001.” Now, if you use Workflows and have checked that box yourself while exporting records, you can probably see where this is going. For everyone else, here’s the situation.

We uploaded the exported files to OCLC. They processed the data as we’d arranged in our paperwork and posted the files to their web page. You click the link, download the file, and load it back into your ILS.

Except our files didn’t reload.

I decided that I needed to look at the records themselves, and I wanted to do so within OCLC Connexion Client. My former boss at NYAM had trained me to use Client as a primary cataloging tool, and I had trained the cataloging staff here to use it as well. It was familiar territory, and I thought I could troubleshoot there more effectively than anywhere else. But I didn’t want to import a ginormous file into it.

So I explained the situation to the staff at OCLC’s batchload desk, and they were happy to break the files up into smaller chunks of no more than 9,000 records each. (Connexion Client has a maximum of 9,999 records per file, and I wanted a buffer.)
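For anyone who would rather do the chunking themselves, it is mechanical enough to sketch. This is a minimal Python sketch, not what OCLC’s batchload desk actually ran, and it assumes the export is binary MARC 21 (ISO 2709), where every record ends with the record terminator byte 0x1D. The function name `split_marc` and the output file naming are hypothetical, and the naming is not OCLC’s required file-name syntax.

```python
from __future__ import annotations

RECORD_TERMINATOR = b"\x1d"  # ISO 2709 end-of-record byte


def split_marc(data: bytes, max_records: int = 9000) -> list[bytes]:
    """Split a binary MARC file into chunks of at most max_records records."""
    # Whatever follows the last terminator is empty (or stray whitespace); drop it.
    records = [r + RECORD_TERMINATOR for r in data.split(RECORD_TERMINATOR) if r.strip()]
    return [
        b"".join(records[i:i + max_records])
        for i in range(0, len(records), max_records)
    ]


if __name__ == "__main__":
    import sys
    from pathlib import Path

    src = Path(sys.argv[1])
    for n, chunk in enumerate(split_marc(src.read_bytes()), start=1):
        # Hypothetical naming: export.mrc -> export_01.mrc, export_02.mrc, ...
        src.with_name(f"{src.stem}_{n:02d}{src.suffix}").write_bytes(chunk)
```

The 9,000 default mirrors the buffer described above; Connexion Client’s own ceiling is 9,999 records per file.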

I downloaded the new files. I saw everything that we’d sent and everything that OCLC had done. The OCLC control numbers were indeed in their 035 fields. Our local control numbers were in their own 035 fields. What I didn’t see anywhere were the Sirsi control numbers in 035 fields.

Unfortunately, I didn’t know how important that was at the time. I have since learned that in a properly created export file, Workflows adds its own control number to a new 035 field in each record specifically so that you can match on that number when you re-import the records.
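If you want to check your own export for that match point before sending it anywhere, here is a minimal Python sketch that pulls the 035 $a values out of a binary MARC 21 (ISO 2709) record and keeps the ones that look like the Workflows control number. It assumes the Sirsi number is prefixed with “(Sirsi)”; that prefix is an assumption, so check a sample of your own export to confirm what Workflows actually writes. All function names are hypothetical.

```python
from __future__ import annotations

FIELD_TERMINATOR = b"\x1e"
SUBFIELD_DELIMITER = b"\x1f"


def get_fields(record: bytes, tag: str) -> list[bytes]:
    """Return the raw contents of every field with the given tag."""
    base = int(record[12:17])          # leader 12-16: base address of data
    directory = record[24:base - 1]    # directory ends with a field terminator
    found = []
    for i in range(0, len(directory), 12):  # each directory entry is 12 bytes
        entry = directory[i:i + 12]
        if entry[:3].decode("ascii") == tag:
            length = int(entry[3:7])
            start = int(entry[7:12])
            found.append(record[base + start:base + start + length])
    return found


def subfield_values(field: bytes, code: str) -> list[str]:
    """Return the values of one subfield code from a variable data field."""
    body = field.rstrip(FIELD_TERMINATOR)
    parts = body.split(SUBFIELD_DELIMITER)[1:]  # drop the two indicators
    return [p[1:].decode("utf-8") for p in parts if p[:1] == code.encode("ascii")]


def sirsi_match_points(record: bytes, prefix: str = "(Sirsi)") -> list[str]:
    """035 $a values carrying the (assumed) Workflows control-number prefix."""
    return [
        value
        for field in get_fields(record, "035")
        for value in subfield_values(field, "a")
        if value.startswith(prefix)
    ]
```

A record that comes back with an empty list has no match point of that kind, which is the state every record in our first batch was in.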

What I did know was that the records refused to load. I tried using the 020 to match on the ISBN. That worked, but it also created a load of duplicate records that did no one any good at all and which were later deleted.

I tried matching on the vendor 001. No go.

I tried matching on the OCLC 035. No joy.

I tried matching on the (non-existent) Sirsi 035 a few times. Nada.

I tried matching on the 245, but that created the same problem as matching on the 020: titles matched, but the load created duplicate records rather than replacing and updating the originals. So that didn’t work.

In between all these attempts to use the data I’d exported, OCLC was insisting that I was doing something wrong and Sirsi was telling me that OCLC had wrecked our data.

Finally, I arranged a phone conference with Sirsi’s senior analyst and had him log in remotely to my PC to see what he could see. The punchline was this: you remember that check box that exported the catalog key to the 001? That should not have been checked. Catalog keys cannot be used as matching points on a Bibload report in Workflows.

In other words, I had 129,000 records of garbage.

Yeah.

Sometimes the best thing to hear is that you’ve screwed up. It frees you to trash what you’ve done and start again from the beginning. So I did.

This time, I made sure that the catalog keys were not exported to the 001. I made sure the export files were no more than 9,000 records long. I made sure that each file was checked in Connexion Client before I uploaded it to OCLC’s server. I made sure the file names had the correct syntax.

On top of this, OCLC very graciously re-processed our data for free, owing to the fact that the project was still current and that the result had been a case of garbage-in-garbage-out.

This time when the files came back, they uploaded perfectly and updated our original records without problems. That done, I re-exported another set of records for Serials Solutions and their metadata people will work on that shortly.

So. The takeaway:

1. I don’t care what The Cult of Done Manifesto says, pretending you know what you’re doing is not almost as good as knowing. There is no substitute for hands-on experience. Do as much research as you want, but getting your arm stuck in the machinery and pulling a bloody stump out is an effective lesson all its own.

2. Cataloging and Systems Librarianship are like hiking and swimming: both are useful skills, but hardly interchangeable. Catalogers don’t do systems work and the systems folks can’t catalog. Luckily there are enough of us accidental systems librarians out there that we can get the requisite work completed, if not always as quickly as we’d planned.

3. Experts are experts in their own systems, not each other’s. Which means that . . .

4. Experts will blame each other’s systems for what goes wrong.

5. Ask for a favor. You might be surprised, as I was when OCLC told me they would re-run the project at no additional cost. And finally . . .

6. Fixing it makes everything that came before seem better.

These are things to remember for future projects.
