Enzyme Loader

Updated: $Date: 2006/4/14 11:31 $

(C) 2006 SRI International. All Rights Reserved.  See BioWarehouse Overview for license details.

This document describes how to build and run the Enzyme loader. The Enzyme loader is located in the enzyme-loader/ subdirectory of the warehouse distribution.

For more information about the Enzyme loader, see enzyme-manual.html.

Building the Loader
Running the Enzyme loader
Troubleshooting
Developer Documentation

Building the Loader

Before building the loader, make sure Java and Ant are properly set up according to the Environment Setup document, and make sure the environment tests for Java and Ant pass. Also make sure the schema is loaded into the database as specified in the Schema document.

To build the loader, open a shell, navigate to the enzyme-loader/ directory, and execute:

ant build

This will build the loader and place the distribution in the directory enzyme-loader/dist.

The build may report warnings when generating parser source. These warnings are expected and can be ignored. The following document contains expected build output: expectedBuildOutput.txt.

Running the Enzyme loader

The Enzyme loader can be run remotely or on the same machine where the database is installed. If the loader is run remotely, make sure the database is configured to accept remote connections.

Obtaining the Enzyme database

The latest supported data version for the Enzyme loader is listed in the loader summary table. The Enzyme database can be downloaded from the following website:

http://www.expasy.org/enzyme/

The only required file is enzyme.dat. The examples in this document assume this file is located in the following directory:

 /space/bio/databases/enzyme/27.0/enzyme.dat
However, enzyme.dat can be located anywhere. Simply substitute the new location where appropriate.
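
Before loading, it can be useful to confirm that the downloaded file looks like an Enzyme flat file. The sketch below is not part of the distribution; it assumes the usual flat-file layout in which every entry begins with a line starting with "ID", and the function name count_enzyme_entries is hypothetical.

```shell
# Sketch only: count the entries in an Enzyme flat file.
# Assumption: each entry starts with a line beginning with "ID ".
count_enzyme_entries() {
    # grep -c prints the number of matching lines; "|| true" keeps a
    # zero-match result from being treated as a failure under "set -e".
    grep -c '^ID ' "$1" || true
}

# Example call (path taken from the examples in this document):
# count_enzyme_entries /space/bio/databases/enzyme/27.0/enzyme.dat
```

A count of zero suggests that the download is incomplete or that the wrong file was saved.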

Loading the Enzyme database

The script runEnzymeLoader.sh in the dist directory is used to parse and load the Enzyme database. This script has the following usage:

./runEnzymeLoader.sh -f dataFile -a databaseServer -c databasePort -b dbmsType -u databaseUsername -p databasePassword -n databaseName/SID -v dataVersion -r dataReleaseDate

required parameters:
-f the data file name
-a the hostname of the database server
-c the port of the database server
-b the DBMS type (oracle or mysql)
-u username for database server
-p password for database server
-n name (for mysql) or SID (for oracle) of the database
-v version number of the input data
-r release date of the input data


For example:
    ./runEnzymeLoader.sh -f /space/bio/databases/enzyme/27.0/enzyme.dat -a localhost -c 3306 -b mysql -u myusername -p mypass -n biospice -v "39.0" -r "March 2006"
It is normal to see some parse errors.  An example of the expected output can be viewed here.
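
Because a mistyped data file path is only discovered once the loader starts, a small wrapper can check the file first. The following is a minimal sketch, not part of the distribution: the function name run_enzyme_loader is hypothetical, and it assumes it is called from the dist directory, where runEnzymeLoader.sh lives.

```shell
# Sketch only: refuse to start the loader if the data file is not readable.
run_enzyme_loader() {
    data_file=$1
    shift
    if [ ! -r "$data_file" ]; then
        echo "Cannot read data file: $data_file" >&2
        return 1
    fi
    # The remaining arguments (-a, -c, -b, -u, -p, -n, -v, -r) are passed
    # through to the loader script unchanged.
    ./runEnzymeLoader.sh -f "$data_file" "$@"
}

# Example call, using the same values as the example above:
# run_enzyme_loader /space/bio/databases/enzyme/27.0/enzyme.dat \
#     -a localhost -c 3306 -b mysql -u myusername -p mypass \
#     -n biospice -v "39.0" -r "March 2006"
```

Quoting "$@" keeps arguments that contain spaces, such as the release date, intact.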

Next, query the database data sets to confirm that the Enzyme data set has been loaded. See the document on Running the Perl Utility scripts to check this.

Note: If you choose to generate an Oracle loader file and then read it into an Oracle database separately, use this command:

sqlldr user/passwd@connection-string filename

where filename is the name of the control file to be read in by the Oracle loader.

Troubleshooting

The loader requires a very large amount of memory. If the loader reports "java.lang.OutOfMemoryError" before the parser report appears, the load has failed. Alternatively, the loader may fail to start at all, reporting an error such as "unable to allocate object heap". In either case, you can adjust the minimum and maximum heap sizes used by the loader. To do this, edit the file runEnzymeLoader.sh and look for the line that executes Java and sets the -ms and -mx parameters:

    ${JAVA_HOME}/bin/java -ms100M -mx1500M ... (rest omitted)
The defaults are a minimum heap size of 100M and a maximum heap size of 1500M. If the loader runs out of memory, try increasing the maximum heap size, although the machine itself may simply not have enough memory. If the loader does not run because it cannot allocate the object heap, decrease the maximum heap size in small steps until the error goes away.
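
Instead of editing the file by hand, the maximum heap size can be changed with sed. This is a sketch under the assumption that the script contains a single -mx option written as shown above; the function name set_max_heap is hypothetical.

```shell
# Sketch only: print the contents of a launch script with the -mx value replaced.
# $1 is the script to read, $2 is the new maximum heap size (for example 1024M).
set_max_heap() {
    script=$1
    new_max=$2
    # Replace the first "-mx<digits><optional unit>" on each line;
    # the -ms option and the rest of the line are left untouched.
    sed "s/-mx[0-9][0-9]*[KkMmGg]*/-mx${new_max}/" "$script"
}

# Example call: write the result to a new file and review it before
# replacing the original script.
# set_max_heap runEnzymeLoader.sh 1024M > runEnzymeLoader.new.sh
```

Writing to a separate file first keeps the original script available in case the new setting does not help.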

Developer Documentation

The following documents are located in the docs directory: