Tuesday, January 24, 2012

Problems with PDFbox Extension in Greenstone 2.85

The new features document for 2.85 says the PDFbox extension is significantly improved. The extension is an optional add-on built around the Apache PDFBox Java library, so it has to be downloaded and installed separately.

When you run Greenstone, it pops up a message box noting that the PDFbox extension has not been installed and listing a URL on the Greenstone developers' TRAC site.

Problem #1

The files referenced by the URL in the box and on the developers' blog are not valid archive files. Both the .zip file and the .gz file are 8 KB in size instead of the expected 9 MB and cannot be opened in an archive manager, so you cannot get the latest version. Searching the TRAC site is a lesson in frustration.
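
A quick way to tell a real archive from a small error page saved with a .zip or .gz name is to try opening it in code. This is only a minimal sketch using Python's standard library; the function name and the file name in the usage note are made up for illustration and are not part of Greenstone.

```python
import gzip
import zipfile
import zlib


def looks_like_valid_archive(path):
    """Return True if the file opens cleanly as a zip or gzip archive."""
    if zipfile.is_zipfile(path):
        # testzip() returns the name of the first corrupt member, or None.
        with zipfile.ZipFile(path) as archive:
            return archive.testzip() is None
    try:
        # Reading to the end makes gzip check the header and the checksum.
        with gzip.open(path, "rb") as stream:
            while stream.read(1024 * 1024):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # Not gzip data, truncated, or corrupted.
        return False
```

Calling looks_like_valid_archive("pdf-box.zip") (a hypothetical file name) on a download like the 8 KB ones described above should return False, which at least confirms the problem is the file and not the archive manager.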

I tried packaging the source files from the TRUNK and adding them manually to the greenstone\ext folder, but the GLI then fails to load until they are removed. Next I looked at the PDFBox project page, but their files don't seem compatible. So I decided to use an older version.

Problem #2

Finding an old version was a bit of work, but I managed to find a link on the TRAC site, download the .zip (9 MB) and extract the files to the greenstone\ext folder. I ran the GLI and configured the PDFplugin to use the PDFbox extension, then created a new collection and imported a bunch of small PDF files. Only 2 were imported; the others were rejected with a 'writable error'. I checked the pdf-box folder in greenstone\gli and it was marked as read-only, so I reset the permissions for the folder and its contents and re-ran the GLI and the import. This time it looked like the files were imported, but PDFbox failed with a Java error. The workaround was to untick the pdfbox checkbox in the PDFplugin. Now all the PDFs were imported OK, but none of the enhanced features offered by PDFbox were available.
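
I reset the read-only flag by hand through the folder properties, but the same step can be scripted. The sketch below is an assumption-laden illustration, not a Greenstone tool: the function name is made up, and it relies on the documented behaviour that on Windows os.chmod with stat.S_IWRITE clears the read-only flag.

```python
import os
import stat
from pathlib import Path


def make_tree_writable(root):
    """Add the owner-write bit to root and everything beneath it.

    Returns the number of files and folders that had to be changed.
    """
    root = Path(root)
    changed = 0
    for path in [root, *root.rglob("*")]:
        mode = path.stat().st_mode
        if not mode & stat.S_IWRITE:
            # On Windows this clears the read-only attribute.
            os.chmod(path, mode | stat.S_IWRITE)
            changed += 1
    return changed
```

You would point it at the extension folder, for example make_tree_writable(r"C:\Users\me\greenstone\gli\pdf-box") (a hypothetical path; use wherever your own install lives).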

After wasting a morning chasing this problem down, I had to advise the students NOT to use PDFbox but to go back to the old workaround of converting their PDF files to the older PDF 1.4 format. A big disappointment!
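
To find out which files actually need converting, you can read the version from the start of each PDF, since a PDF file begins with a header such as %PDF-1.6. This is a rough sketch with made-up function names. It only looks at the header, so it will miss files whose version is overridden later inside the document.

```python
import re

# A PDF file starts with a header like b"%PDF-1.6".
HEADER = re.compile(rb"%PDF-(\d)\.(\d)")


def pdf_version(path):
    """Return (major, minor) from the PDF header, or None if there is none."""
    with open(path, "rb") as handle:
        match = HEADER.search(handle.read(1024))
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def needs_older_version(path, limit=(1, 4)):
    """True if the header reports a PDF version newer than the limit."""
    version = pdf_version(path)
    return version is not None and version > limit
```

Running needs_older_version over a folder of PDFs gives a list of the ones worth converting before import, instead of converting everything.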

Wednesday, January 11, 2012

Installing Greenstone 2.85 on Windows 7

Do Not Install Greenstone 2.85 as Administrator in Windows 7 Ultimate 32-bit

I removed Greenstone 2.84 from my machine in prep for the new version. The old version had worked fine and had been installed as admin to c:\program files\greenstone. I also got another laptop over Xmas for testing Windows stuff, so I installed 2.85 on that. It runs Windows 7 Home Premium 64-bit (argh, why so many versions?) and the install was straightforward and ended up in c:\program files\greenstone, so I didn't anticipate any problems.



I ran Greenstone-2.85-windows.exe as Administrator and it installed OK to c:\program files\greenstone, but when I ran the GLI the following error message was returned:

Cannot initialize Network (reason WSASYSNOTREADY)


So we shut down Greenstone, removed the software and tried the install again, with the same result. We shut it down and then checked to see if anything was conflicting with its web server. We opened a terminal window and ran netstat:


C:\Users\gnickers>netstat -an |find /i "Listening"
  TCP    0.0.0.0:135            0.0.0.0:0
  TCP    0.0.0.0:445            0.0.0.0:0
  TCP    0.0.0.0:3390           0.0.0.0:0
  TCP    0.0.0.0:5357           0.0.0.0:0
  TCP    0.0.0.0:17500          0.0.0.0:0
  TCP    0.0.0.0:34378          0.0.0.0:0
  TCP    0.0.0.0:49152          0.0.0.0:0
  TCP    0.0.0.0:49153          0.0.0.0:0
  TCP    0.0.0.0:49154          0.0.0.0:0
  TCP    0.0.0.0:49155          0.0.0.0:0
  TCP    0.0.0.0:49156          0.0.0.0:0
  TCP    127.0.0.1:80           0.0.0.0:0




Oddly enough, there were 2 instances of a web server running on port 80. We ran a ports check using cports and found two processes listed, but when we went to kill them in the Windows Task Manager they were not there! Looks like Greenstone did not clean up after itself!


Only a hard reset got rid of them. We next installed Greenstone again, but not as the administrator. This put the program in users\gnickers\greenstone and it ran fine. We shut it down and checked the active ports with netstat, and the httpd instance had shut down correctly.


So the change is that, unlike 2.84, the new version cannot be installed using admin privileges (which makes sense). It would have been nice to put this in the release notes.


Those who are installing Greenstone for the first time may also run into the problem of some other program using port 80, which is the standard port for web traffic. Older versions of Skype were bad for doing this.


To check, type cmd in the Start text box and press ENTER. Then run netstat using the above syntax. The cports program (search on Google) provides more info than netstat, but you must run it as admin. If you have Skype using port 80, set your Skype options in the Connection tab to use a different port, such as 34378. A list of ports is available on Wikipedia and other sources.
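
If you would rather not read through netstat output, the same question ("is anything already answering on port 80?") can be asked with a few lines of Python. This is just a sketch with a made-up function name. It only tells you that something is listening, not which program it is, so cports is still the tool for naming the culprit.

```python
import socket


def is_port_listening(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused or timed out: nothing is answering there.
        return False
```

With Greenstone shut down, is_port_listening(80) returning True means some other program (or a leftover server process) still holds the port.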

The other solution is to install Greenstone and then select Start > Programs > Greenstone 2.85 > Greenstone Server and select File, Settings from the menu. Now change the Port Number from 80 to something like 8080 or some other port not in use.
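
To pick a port number for that Settings box, you can test a few candidates by trying to bind to each one. Again, this is a minimal sketch: the function name and the candidate list are my own choices, and a port that is free now can still be taken by another program before Greenstone starts.

```python
import socket


def first_free_port(candidates=(8080, 8081, 8888), host="127.0.0.1"):
    """Return the first candidate port that can be bound, or None."""
    for port in candidates:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, port))
            except OSError:
                # Already in use (or not permitted); try the next one.
                continue
            return port
    return None
```

Whatever number first_free_port() returns is a reasonable value to type into the Port Number field.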