Wednesday, October 13, 2010

Greenstone and Z39.50

The Windows install of Greenstone contains a Z39.50 client, but the documentation on how to use it is out of date and there seem to be some problems saving the file in shared computer labs. The following was tested on Greenstone 2.83 on Windows XP and describes how to do a title search for books on SQL against the Library of Congress Z39.50 server database.

1. Click on the Download tab and select Z39.50
2. Enter the following parameters:

Host: lx2.loc.gov
Port: 210
Database: LCDB
Find: @attr 1=4 "SQL"

3. Click Download
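
The Find value is a query in prefix (PQF) notation. In `@attr 1=4`, attribute type 1 is the Bib-1 "use" attribute and value 4 selects the title access point. The following is only a minimal Python sketch of how such a string could be assembled. The function name and the lookup table are illustrative and not part of Greenstone, the escaping rule is an assumption, and which access points a given server honours varies.

```python
# Bib-1 "use" attribute values for a few common access points.
BIB1_USE = {"title": 4, "isbn": 7, "subject": 21, "author": 1003, "any": 1016}


def build_pqf(term, access_point="title"):
    """Return a PQF query string such as '@attr 1=4 "SQL"'."""
    use = BIB1_USE[access_point]
    # Assumption: backslash-escape backslashes and double quotes so the
    # term stays a single quoted token.
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'@attr 1={use} "{escaped}"'


print(build_pqf("SQL"))
```

Changing the second argument to "author" or "isbn" would give the equivalent author or ISBN search, provided the server supports that attribute.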

You can set Max Records if you want; at most 500 records are returned for any query. The file of MARC records is saved to whichever folder was specified when Greenstone was installed. If Greenstone is installed on your own computer running Windows XP, this is most likely on the C: drive in Documents and Settings\yourusername\Application Data\Greenstone\GLI\lx2.loc.gov. The filename is based on the search string, with an extension of .marc; in this example the file name was LCDB_@attr 1=4 SQL_500.marc.
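
To locate the file from a script, the path can be rebuilt from the search parameters. This sketch assumes the naming pattern is database, query with the double quotes dropped, and the maximum record count, joined by underscores. That pattern is inferred from the single example above and is not documented Greenstone behaviour; the function name is hypothetical.

```python
from pathlib import PureWindowsPath


def expected_marc_path(gli_dir, host, database, query, max_records):
    """Guess where the Z39.50 download was saved (pattern is an assumption)."""
    # Assumed pattern: <database>_<query without double quotes>_<max>.marc
    plain_query = query.replace('"', "")
    name = f"{database}_{plain_query}_{max_records}.marc"
    return PureWindowsPath(gli_dir) / host / name


gli = r"C:\Documents and Settings\yourusername\Application Data\Greenstone\GLI"
path = expected_marc_path(gli, "lx2.loc.gov", "LCDB", '@attr 1=4 "SQL"', 500)
print(path.name)
```

PureWindowsPath is used so the path is handled with Windows rules even if the script is run on another operating system.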

When we ran this in the GU and GRC computer labs, the file was not written even though the Greenstone log said it was. The problem seems to be that user application data is saved to a network share rather than locally.
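
One quick way to see whether a lab machine redirects application data is to look at the APPDATA environment variable. The sketch below only recognises UNC paths of the form \\server\share; a share mapped to a drive letter would not be detected. The helper name is hypothetical, and this is a diagnostic aid, not a fix for the problem.

```python
import os
from pathlib import PureWindowsPath


def is_unc_path(path):
    """Return True if a Windows path points at a UNC network share."""
    # For a UNC path the "drive" part is \\server\share.
    return PureWindowsPath(path).drive.startswith("\\\\")


appdata = os.environ.get("APPDATA")  # set on Windows, usually absent elsewhere
if appdata:
    kind = "network share" if is_unc_path(appdata) else "local or mapped drive"
    print(appdata, "->", kind)
```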

The file itself is a plain ASCII text file containing the 500 MARC records. It looks like this:

Records: 500
[LCDB]Record type: USmarc

001 15430621
005 20080908175110.0
008 080827s2008 caua 001 0 eng d
906 $a 7 $b cbc $c copycat $d 2 $e ncip $f 20 $g y-gencatlg
925 0 $a acquire $b 2 shelf copies $x policy default
955 $a ps04 2008-08-27 z-processor 2 copies to ASCD $i jx09 2008-09-08 $e jx09 2008-09-08 c. 1-2 to BCCD
010 $a 2008297695
020 $a 9781590599693 (pbk.)
020 $a 1590599691 (pbk.)
035 $a (OCoLC)ocn179801564
040 $a BTCTA $c BTCTA $d BAKER $d YDXCP $d OCO $d CDX $d BWX $d OCLCQ $d DLC
042 $a lccopycat
082 04 $a 005.7565 $2 22
050 00 $a QA76.9.D3 $b A284 2008
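
Because the download is plain text, it can be read with a few lines of Python. This is a rough sketch based only on the excerpt above: it assumes each record starts with a line containing "Record type:", that each field line begins with a three-digit tag, and that subfields are introduced by a dollar sign and a one-character code. The function names are hypothetical and real files may contain cases this does not handle.

```python
import re

SUBFIELD = re.compile(r"\$(\S)\s+([^$]*)")


def parse_field(line):
    """Split one line into (tag, indicators, data)."""
    tag, rest = line[:3], line[3:].strip()
    if tag.startswith("00"):
        # Control fields (001-009) hold a single value and no subfields.
        return tag, "", rest
    indicators, dollar, tail = rest.partition("$")
    subfields = [(code, value.strip())
                 for code, value in SUBFIELD.findall(dollar + tail)]
    return tag, indicators.strip(), subfields


def parse_dump(text):
    """Return a list of records, each a list of parsed fields."""
    records, current = [], None
    for line in text.splitlines():
        line = line.strip()
        if "Record type:" in line:
            current = []           # start a new record
            records.append(current)
        elif current is not None and re.match(r"\d{3}\s", line):
            current.append(parse_field(line))
    return records


SAMPLE = """Records: 500
[LCDB]Record type: USmarc

001 15430621
020 $a 9781590599693 (pbk.)
082 04 $a 005.7565 $2 22
"""

for field in parse_dump(SAMPLE)[0]:
    print(field)
```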

For information on the Library of Congress Z39.50 server see: http://www.loc.gov/z3950/lcserver.html
For information on the syntax of Z39.50 queries using YAZ see: http://www.indexdata.com/zebra/doc/querymodel-rpn.html

The next step is to import the MARC records into Greenstone.

  1. Click the Gather tab
  2. Expand the Local Filespace and drag LCDB_@attr 1=4 SQL_500.marc into the Collection window
  3. When asked, click Add Plugin to add the MARCplugin import plugin

MARCplugin uses a file called marctodc.txt, located in the /gsdl/etc folder, to map MARC field numbers to Dublin Core metadata based on the Library of Congress crosswalk (http://lcweb.loc.gov/marc/dccross.html). It is also possible to use the RFC 1807 bibliographic records metadata set for the following exercise, but we will use Dublin Core as that is the metadata scheme most commonly used for online digital collections. You could also use the qualified Dublin Core metadata set. To use qualified Dublin Core or the RFC 1807 metadata set, click on Enrich, then Manage Metadata Sets, select a set and click Add. With those additional sets added to Greenstone, you can choose them in the next sequence. However, you cannot have both Dublin Core 1.1 and qualified Dublin Core at the same time; you must choose one or the other.
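
To make the idea of a field mapping concrete, here is a small Python illustration. The table is a hand-picked, hypothetical subset loosely modelled on the Library of Congress crosswalk; it is not the content or the file format of marctodc.txt, and the dc.* element names are assumed for the example.

```python
# Hypothetical subset of a MARC-to-Dublin-Core mapping (illustration only).
MARC_TO_DC = {
    "020": "dc.Identifier",
    "100": "dc.Creator",
    "245": "dc.Title",
    "260": "dc.Publisher",
    "650": "dc.Subject",
}


def to_dublin_core(fields):
    """fields: iterable of (tag, value) pairs; returns {element: [values]}."""
    record = {}
    for tag, value in fields:
        element = MARC_TO_DC.get(tag)
        if element:  # tags without a mapping are skipped
            record.setdefault(element, []).append(value)
    return record


sample = [
    ("020", "9781590599693 (pbk.)"),
    ("020", "1590599691 (pbk.)"),
    ("050", "QA76.9.D3 A284 2008"),
]
print(to_dublin_core(sample))
```

Repeated MARC fields, such as the two 020 ISBN fields in the record above, end up as multiple values of the same Dublin Core element.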

The next step is to extract or 'explode' the individual MARC records from the file.

  1. Select LCDB_@attr 1=4 SQL_500.marc in the Collection window and right-click
  2. Select Explode Metadata Database from the menu
  3. Place a tick mark in the metadata_set option
  4. Select Dublin Core from the metadata_set pulldown menu and click Explode
  5. Click the Enrich tab



Because Greenstone assigns metadata to files, each MARC record has been assigned to a file with a .nul extension (to indicate the files are really null). Select 00000006.nul to view the metadata.

The next step is to process these .nul files using the NULplugin. First we have to remove the MARCplugin so that it does not try to process the records.

  1. Click the Design tab
  2. Select MARCplugin
  3. Click Remove Plugin
  4. Click Create
  5. Click Build Collection
  6. Click Preview Collection

The record index will look something like this:


View the document Beginning Microsoft SQL server 2008 administration / Chris Leiter ... [et al.].
(00000010.nul)
View the document Best damn Exchange, SQL and IIS book period / Conrad H. Agramont, Jr. ...[et al.]
(00000011.nul)
View the document Data transformation with dts: sql server 7 and 2000 / James Samuelson ... [et al.] ; [edited by] Gina Brown, Karen Wachs, Laura Loveall.
(00000012.nul)
View the document Database benchmarking : practical methods for Oracle & SQL server / Bert Scalzo ... [et al.].
(00000013.nul)


If you select a title and click on the icon to get the text of the record, nothing is shown. This is because there are no files with text; we only have metadata. The Document icon should either be removed, or the DocumentText instructions changed to display meaningful metadata. A cover image would also be nice.




You can now proceed to create some useful indexes using Dublin Core elements, to format the display of your indexes, and to remove the full-text search function, giving you a useful bibliographic collection.

1 comment:

gnickers said...

Note that the 'view the document' hyperlink does not work, as it refers to a file served by the local Greenstone web server, which does not support access from the internet.