(C) 2006 SRI International.
All Rights Reserved. See BioWarehouse
Overview for license details.
The Enzyme loader is located in the enzyme-loader/ subdirectory of the warehouse distribution.
For more information about the Enzyme loader, see enzyme-manual.html.
Before building the loader, make sure Java and Ant are properly set up according to the Environment Setup document, and make sure the environment tests for Java and Ant pass. Also make sure the schema is loaded into the database as specified in the Schema document.
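As a quick sanity check before building, the sketch below confirms that the required tools are on the PATH. It is not a substitute for the environment tests described in the Environment Setup document; the helper name require_tool is hypothetical and not part of the warehouse distribution.

```shell
# Hypothetical helper (not part of the distribution): succeed only if the
# named command can be found on the PATH.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "$1 not found on PATH" >&2
    return 1
}

# Example use, assuming the tools are invoked as "java" and "ant":
#   require_tool java && require_tool ant
```

If either check fails, revisit the Environment Setup document before attempting the build.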
To build the loader, open a shell, navigate to the enzyme-loader/ directory, and execute
ant build
This builds the loader and places the distribution in the directory enzyme-loader/dist.
The build may report warnings when generating parser source; these are expected. The expected build output is recorded in expectedBuildOutput.txt.
The Enzyme loader can be run remotely or on the same machine where the database is installed. If the loader is run remotely, make sure the database is configured to accept remote connections.
Obtaining the Enzyme database
The latest supported data version for the Enzyme loader is listed in the loader summary table. The enzyme database can be downloaded from the following website:
http://www.expasy.org/enzyme/
The only required file is enzyme.dat. The examples in this document assume this file is located at the following location:
/space/bio/databases/enzyme/27.0/enzyme.dat
However, enzyme.dat can be located anywhere. Simply substitute the new location where appropriate.
Loading the Enzyme database
The script runEnzymeLoader.sh in the dist directory is used to parse and load the Enzyme database. This script has the following usage:
./runEnzymeLoader.sh -f dataFile -a databaseServer -c databasePort -b dbmsType -u databaseUsername -p databasePassword -n databaseName/SID -v dataVersion -r dataReleaseDate
required parameters:
-f the data file name
-a the hostname of the database server
-c the port of the database server
-b the DBMS type (oracle or mysql)
-u username for database server
-p password for database server
-n name (for mysql) or SID (for oracle) of the database
-v version number of the input data
-r release date of the input data
For example:
./runEnzymeLoader.sh -f /space/bio/databases/enzyme/27.0/enzyme.dat -a localhost -c 3306 -b mysql -u myusername -p mypass -n biospice -v "39.0" -r "March 2006"
It is normal to see some parse errors. An example of the expected output can be viewed here.
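Because a load can run for a long time, it may be worth confirming that the data file is readable and non-empty before starting. The following is a minimal sketch; the helper name check_data_file is hypothetical and not part of the loader distribution.

```shell
# Hypothetical pre-flight check (not part of the distribution): succeed
# only if the data file exists, is readable, and is not empty.
check_data_file() {
    if [ ! -r "$1" ]; then
        echo "cannot read data file: $1" >&2
        return 1
    fi
    if [ ! -s "$1" ]; then
        echo "data file is empty: $1" >&2
        return 1
    fi
    return 0
}

# Example use, with the data file path assumed throughout this document:
#   check_data_file /space/bio/databases/enzyme/27.0/enzyme.dat
```

The check only inspects the file on disk; it does not validate the file contents or the database connection parameters.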
Next, the database data sets should be queried to ensure the Enzyme data set is loaded. See the document on Running the Perl Utility scripts to check this.
Note: If you choose to generate an Oracle loader file and then read it into an Oracle database separately, use this command:
sqlldr user/passwd@connection-string filename
where filename is the name of the control file to be read in by the Oracle loader.
The loader requires a very large amount of memory. If the loader reports "java.lang.OutOfMemoryError" before the parser report, the loader has failed. Alternatively, the loader may fail to run at all, reporting an error such as "unable to allocate object heap". If either of these cases arises, it is possible to adjust the minimum and maximum heap sizes used by the loader. To do this, edit the file runEnzymeLoader.sh and look for the line that executes Java and sets the -ms and -mx parameters:
${JAVA_HOME}/bin/java -ms100M -mx1500M ... (rest omitted)
The default is a minimum heap size of 100M and a maximum heap size of 1500M. If the loader runs out of memory, the machine may not have enough memory; increasing the maximum heap size may help. If the loader does not run because it cannot allocate the object heap, gradually decrease the maximum heap size until the error goes away.
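The heap adjustment can also be scripted rather than done by hand. The sketch below assumes the script sets the maximum heap with a single -mx option followed by a number and an optional unit, as in the default line quoted in this document; the helper name set_max_heap is hypothetical.

```shell
# Hypothetical helper (not part of the distribution): replace the value of
# the -mx option in the given script, keeping a backup as <script>.bak.
#   $1  path to runEnzymeLoader.sh
#   $2  new maximum heap size, for example 1024M
set_max_heap() {
    sed -i.bak -E "s/-mx[0-9]+[KkMmGg]?/-mx$2/" "$1"
}

# Show the current setting first, then lower the maximum heap:
#   grep -n -e '-mx' runEnzymeLoader.sh
#   set_max_heap runEnzymeLoader.sh 1024M
```

Only the -mx value is rewritten; the -ms setting and the rest of the line are left untouched, and the original script remains available in the .bak file.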
The following documents are located in the
docs
directory:
- enzyme-manual.html Enzyme Loader Manual
- timing-tests.html Old initial timing tests showing the comparisons of non-indexed versus indexed data in the warehouse.
- expectedBuildOutput.txt Contains expected output from building the loader using Ant.
- antlr_2.7.1_ref_man.pdf ANTLR reference manual. This documentation is meant for developers who intend to modify source code.
- quickref.html Notes on Java classes and their usage. This documentation is meant for developers who intend to modify source code.
- object-addition.html Explains how to add a new object to the Java code. This documentation is meant for developers who intend to modify source code.