Wednesday, September 23, 2009

MARC Records into Greenstone

This was a test of importing MARC records into the Greenstone digital library software and automatically filling in the Dublin Core metadata elements.

The first step was to get the MARC records into a file. There are three ways to do this:

- save some records from a library OPAC
- find a file of MARC records on the Internet
- use a Z39.50 client

We tested the first two methods. The library OPAC was straightforward: you run a search, tag or select the records you want, and then view them. There are several export options, such as ProCite and MARC format, so we selected MARC and the local disk option and saved the records to a file.

The downloaded file was called export.txt so we renamed it export.marc for importing into Greenstone.
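Before importing, it can be worth checking that the saved file really is binary MARC rather than a text listing. The sketch below is not part of Greenstone; it is a small standalone check that relies only on two features of the MARC (ISO 2709) exchange format: each record starts with a five-digit length and ends with the byte 0x1D. The function name count_marc_records is made up for this example, and a file with extra line breaks between records would be rejected by it.

```python
from pathlib import Path

RECORD_TERMINATOR = b"\x1d"  # ISO 2709 end-of-record byte


def count_marc_records(data: bytes) -> int:
    """Count records in raw MARC (ISO 2709) bytes.

    Each record must begin with a 5-digit ASCII length that covers
    the whole record, including its terminator byte.
    """
    count = 0
    pos = 0
    while pos < len(data):
        length_field = data[pos:pos + 5]
        if len(length_field) < 5 or not length_field.isdigit():
            raise ValueError(f"no record length at byte {pos}")
        length = int(length_field)
        record = data[pos:pos + length]
        if len(record) != length or not record.endswith(RECORD_TERMINATOR):
            raise ValueError(f"record at byte {pos} is truncated or malformed")
        count += 1
        pos += length
    return count


if __name__ == "__main__":
    path = Path("export.marc")  # the renamed OPAC export
    if path.exists():
        print(count_marc_records(path.read_bytes()), "records")
```

If the script raises an error, the OPAC probably saved a display format instead of raw MARC.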

Importing

1. Run the GLI for Greenstone and create a new collection
2. Click the Gather tab and expand the Local Filespace
3. Drag export.marc into the Collection window and add the MARCplugin
4. Right-click on export.marc in the Collection window and select Explode Metadata Database
5. Select Dublin Core from the metadata_set pulldown and click Explode
6. Click the Design tab
7. Select MARCplugin and click Remove Plugin

You can now build and preview the collection, but the display will be all wrong. Remove the default search and browsing indexes, then create indexes based on the Dublin Core metadata set. For example, an Author index:

1. Click Design, Browsing Classifiers
2. Select AZCompactList from the Select Classifier pulldown menu
3. Click Add Classifier
4. Select dc.Creator from the metadata pulldown menu
5. Place a tickmark in the allvalues checkbox
6. Place a tickmark in the buttonname checkbox and type in Authors as the menu label

Let's also add a Subject index:

1. Select AZCompactList from the Select Classifier pulldown
2. Click Add Classifier
3. Select dc.Subject from the metadata pulldown
4. Tick the allvalues checkbox
5. Click OK

Build and preview to see the new indexes, but the display will still be a bit off. The next step is to change the formatting instructions for the web pages:

1. Click Format, Format Features
2. Delete the first and last lines in the HTML Format String box

Once the above two lines are deleted you can refresh the web pages; HTML or CSS changes do not require a rebuild. However, if you click on the icon to view the document, no text is displayed and the document heading is wrong.

1. Click on Format, Format Features
2. Select DocumentHeading in the Choose Feature pulldown
3. Delete the text in the HTML Format String box

This gets rid of the incorrect document header. Now to fix the 'no text' document:

1. Click Format, Format Features
2. Select DocumentText in the Choose Feature pulldown
3. Replace [text] with the following HTML code:



You can add a little inline CSS to pretty up the presentation but you get the idea. The final step is to remove the buttons as there is nothing to detach and highlighting makes no sense in a short bibliographic record:

1. Click Format, Format Features
2. Select DocumentButtons in the Choose Feature pulldown
3. Delete the text in the HTML Format String box

You can now preview the collection. Obviously there is a lot more that could be done, such as having the index nodes indicate how many records they contain using the [numleafdocs] variable, and we could make a much nicer display by tweaking the external style sheet.

Appendix

I downloaded Terry Reese's excellent MarcEdit program to do a little experiment on the MARC records file I had saved from the library OPAC. I wanted to test out converting MARC to Dublin Core. Here's what I did:

1. Renamed the export.marc file to export.mrc (MarcEdit does not recognize the .marc extension)
2. Ran MarcEdit and selected MarcBreaker
3. Selected MARC -> Dublin Core
4. Chose export.mrc as the input file
5. Chose an export file of weldon_dc.txt and clicked Execute

It created a plain text XML file that uses rdf:Description and replaces all the MARC tags with Dublin Core tags. Here is a snippet from the file of records:

Accelerated SQL Server 2008 [electronic resource] /
Walters, Robert E.
Coles, Michael.
Farmer, Donald.
Ferracchiati, Fabio.
Rae, Robert.
SpringerLink (Online service)
text
Berkeley, CA : Robert Walters,
2008.
eng
Data structures (Computer science)
Springer eBooks.
http://dx.doi.org/10.1007/978-1-4302-0606-4


The problem is that I cannot get this file to explode in Greenstone; it imports as one file containing one record using LOMplug. I'll have to do some research on this tomorrow...
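One thing that might be worth trying is splitting the converted file into one small XML file per record before importing. The sketch below is untested against Greenstone and rests on assumptions: that weldon_dc.txt is well-formed XML, that each record sits in its own rdf:Description element in the standard RDF namespace, and that the fields use the standard Dublin Core element namespace. The function name split_records and the dc_records output folder are made up for this example, and whether Greenstone then imports each file as a separate document is something I have not verified.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import List

# Standard namespace URIs; assumed to match what the converted file declares.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Keep the familiar rdf: and dc: prefixes when writing the output.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def split_records(xml_text: str) -> List[str]:
    """Return one small RDF document per rdf:Description element."""
    root = ET.fromstring(xml_text)
    documents = []
    for description in root.iter(f"{{{RDF_NS}}}Description"):
        wrapper = ET.Element(f"{{{RDF_NS}}}RDF")  # new root for this record
        wrapper.append(description)
        documents.append(ET.tostring(wrapper, encoding="unicode"))
    return documents


if __name__ == "__main__":
    source = Path("weldon_dc.txt")  # output file chosen in the steps above
    if source.exists():
        out_dir = Path("dc_records")  # hypothetical output folder
        out_dir.mkdir(exist_ok=True)
        records = split_records(source.read_text(encoding="utf-8"))
        for number, doc in enumerate(records, start=1):
            target = out_dir / f"record_{number:04d}.xml"
            target.write_text(doc, encoding="utf-8")
        print(len(records), "records written to", out_dir)
```

Each output file then holds a single record, which could be dragged into the Collection window in place of the single large file.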


