(C) 2006 SRI International.
All Rights Reserved. See BioWarehouse
Overview for license details.
This document describes how to build and run the MetaCyc Ontology Loader.
The MetaCyc Ontology loader is located in the metacyc-ontology-loader/
subdirectory of the warehouse distribution.
For more information regarding the MetaCyc Ontology loader, see the
MetaCyc Ontology Manual.
Before building the loader, make sure the environment is configured according to the Environment Setup. Also make sure the schema is loaded into the database as specified in the Schema document.
To build the loader, bring up a shell and navigate to the
metacyc-ontology-loader/src/
directory. Then:
MySQL:
osprompt: make clean
osprompt: make db=mysql
Creates the file mysql-metacyc-ontology-loader
Oracle:
osprompt: make clean
osprompt: make db=oracle
If the file wh_oracle_util.c
is reported missing, re-run the above make command:
osprompt: make db=oracle
Creates the file oracle-metacyc-ontology-loader
Also, a symbolic link named
"metacyc-ontology-loader"
is created, which points to the newly created executable. This can be used as a synonym for the most recently created DBMS-specific loader if desired.
If the build fails with errors about header files that cannot be found, read the section on configuring the appropriate client in Environment Setup. Possible causes are an improper installation of Pro*C (Oracle) or library/header files installed in an incorrect place.
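The build steps above can be wrapped in a small shell function. This is only a sketch: the function name build_loader and the MAKE override variable are hypothetical and are not part of the warehouse distribution. It assumes it is run from metacyc-ontology-loader/src/, and it re-runs make once for Oracle on any failure, which is broader than the specific missing wh_oracle_util.c case described above.

```shell
#!/bin/sh
# Sketch only: build_loader and the MAKE override are hypothetical names,
# not part of the warehouse distribution.
# Intended to be run from metacyc-ontology-loader/src/.
build_loader() {
    db="$1"
    make_cmd="${MAKE:-make}"   # allows substituting another command for a dry run

    case "$db" in
        mysql|oracle) ;;
        *) echo "usage: build_loader mysql|oracle" >&2
           return 2 ;;
    esac

    "$make_cmd" clean || return 1

    if "$make_cmd" db="$db"; then
        return 0
    fi

    # Only the Oracle build is documented as sometimes needing a second make
    # (when wh_oracle_util.c is reported missing on the first pass).
    # This sketch retries on any failure, not just that one.
    if [ "$db" = oracle ]; then
        echo "make db=oracle failed; re-running once" >&2
        "$make_cmd" db="$db" || return 1
        return 0
    fi
    return 1
}
```

After sourcing the function, "build_loader mysql" or "build_loader oracle" runs the same commands listed above, and a non-zero return status indicates that the build should be inspected by hand.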
Obtaining the MetaCyc Ontology databases
The MetaCyc Ontology Manual contains information regarding the MetaCyc database. The latest supported data version for the MetaCyc loader is listed in the loader summary table.
See http://biocyc.org/ for information on how to obtain a license and download the data. Also see Pathway/Genome Database (PGDB) for a description of the database file format.
Running the MetaCyc Ontology loader
The
metacyc-ontology-loader/src/
directory contains scripts to run the MySQL and Oracle loaders.
MySQL:
./run-mysql host database user password datadir version releasedate [-m]
host - The machine address where the MySQL server/database resides.
database - Name of the MySQL database to be loaded.
user - MySQL userid.
password - MySQL password for userid.
datadir - Directory which contains the PGDB database files to be loaded.
version - Version of the MetaCyc database to be loaded. Ex: "10.0"
releasedate - Release date of the database to be loaded. Ex: "2008-04-01"
-m - If a dataset named BioCyc exists, merges all loads into it. If multiple BioCycs exist, the one with the largest DataSetWID is used. If no BioCyc exists, operates normally.
For example:
./run-mysql 123.45.67.8 warehouse me mypwd /space/bio/databases/biocyc/metacyc/10.0 "10.0" 2008-04-01
This command loads the MetaCyc ontologies into the MySQL database named warehouse. The data files for MetaCyc are located in the directory /space/bio/databases/biocyc/metacyc/10.0 and the user name and password used to access MySQL are me and mypwd.
Oracle:
./run-oracle "user/passwd" datadir version releasedate [-m]
user/passwd - User name and password. Ex: "dan/mypwd", "dan/mypwd@mydb"
datadir - Directory which contains the PGDB database files.
version - Version of the MetaCyc database to be loaded. Ex: "10.0"
releasedate - Release date of the database to be loaded. Ex: "2008-04-01"
-m - If a dataset named BioCyc exists, merges all loads into it. If multiple BioCycs exist, the one with the largest DataSetWID is used. If no BioCyc exists, operates normally.
For example:
./run-oracle "me/mypwd@mydb" /space/bio/databases/biocyc/metacyc/10.0/data "10.0" 2008-04-01
This command loads the MetaCyc ontologies into the Oracle database mydb. The data files for MetaCyc are located in the directory /space/bio/databases/biocyc/metacyc/10.0 and the user name and password used to access Oracle are given as "me/mypwd".
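Because both run scripts take datadir, version and releasedate as positional arguments, a quick pre-flight check may catch a misplaced argument before a load starts. The sketch below is hypothetical: is_release_date and check_load_args are illustrative names that do not ship with the loader, and the YYYY-MM-DD shape is an assumption inferred from the "2008-04-01" example rather than a documented requirement.

```shell
#!/bin/sh
# Hypothetical pre-flight checks; neither function ships with the loader.

# Succeeds only for strings shaped like YYYY-MM-DD.
# ASSUMPTION: this shape is inferred from the "2008-04-01" example.
# It is a shape check only, not a calendar check.
is_release_date() {
    case "$1" in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: check_load_args datadir version releasedate
check_load_args() {
    if [ "$#" -ne 3 ]; then
        echo "usage: check_load_args datadir version releasedate" >&2
        return 2
    fi
    if [ ! -d "$1" ]; then
        echo "datadir is not a directory: $1" >&2
        return 1
    fi
    if [ -z "$2" ]; then
        echo "version must not be empty" >&2
        return 1
    fi
    if ! is_release_date "$3"; then
        echo "releasedate does not look like YYYY-MM-DD: $3" >&2
        return 1
    fi
    return 0
}
```

A run script would then be invoked only when check_load_args succeeds for the same datadir, version and releasedate values that are about to be passed to it.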
Sample output can be found here. In general, the loader may report parse errors.
The database data sets should be queried to ensure the MetaCyc data set is loaded. See the document on Running the Perl Utility scripts to check this. Unlike most other loaders, this loader creates three datasets, so all three datasets should be checked to confirm they have populated the Term table. The number of rows in this table for each dataset should match the number of terms loaded as indicated in the loader log.
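As a rough alternative to the Perl utility scripts, the per-dataset row counts can also be spot-checked with a direct query. The sketch below is not part of the distribution: term_count_query is a hypothetical helper, and it assumes the Term table has a DataSetWID column identifying the owning dataset, which should be confirmed against the Schema document before use.

```shell
#!/bin/sh
# Hypothetical helper: prints a query that counts Term rows per dataset.
# ASSUMPTION: the Term table has a DataSetWID column; check the Schema document.
term_count_query() {
    printf '%s\n' 'SELECT DataSetWID, COUNT(*) AS TermRows FROM Term GROUP BY DataSetWID;'
}

# Possible use with the mysql command-line client, reusing the host, user and
# database from the MySQL example above (-p prompts for the password):
#   mysql -h 123.45.67.8 -u me -p warehouse -e "$(term_count_query)"
```

Each of the three datasets created by the loader should then appear as one row, and its count can be compared with the number of terms reported in the loader log.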