This document describes how to configure your computing environment to use the Warehouse. It assumes the reader is a developer who is comfortable setting up environment variables, running makefiles, and working with the MySQL and/or Oracle database management systems. It is also assumed that the reader has sufficient system administration privileges and skills to install needed system software, or has access to such a system administrator.
The required configuration steps depend on the underlying database management system being used, and on whether C-based and/or Java-based loaders are used (see the Databases and loaders table for a summary of available loaders). Follow the steps in each relevant section to configure your environment. A section is included which lists tests to determine if the environment is set up properly. The final section provides documentation on installing the schema.
Red Hat Enterprise Linux ES release 3 was used to build and test the Oracle warehouse loaders. Debian 3.1 was used to build and test the MySQL loaders. The only supported build and runtime environment for the C loaders is Linux. Although it is possible to build and run the Java-based loaders remotely from another platform (e.g., Windows), it is recommended that all loaders be built and run from Linux.
The space requirements for the warehouse vary greatly depending on which databases are loaded. A fully-loaded, fully-indexed Warehouse requires about 10-15 GB of disk space. Most of that is required by the largest databases, CMR and GenBank.
Loaders were tested on systems with 1 GB of physical memory. Loaders typically require 40-50 MB of virtual memory to run.
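The disk space requirement above can be checked before loading. The following is a minimal sketch, assuming a POSIX df; the function name has_free_gb and the example directory are illustrative and not part of the Warehouse distribution.

```shell
# Sketch: succeed if the filesystem holding a directory has at least the
# given number of gigabytes free. The function name is hypothetical.
has_free_gb() {
    dir=$1
    need_gb=$2
    # df -P prints one line per filesystem; with -k the 4th column is free KB
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    need_kb=$((need_gb * 1024 * 1024))
    [ "$avail_kb" -ge "$need_kb" ]
}

# Example: warn if the MySQL data directory has less than 15 GB free
# has_free_gb /var/lib/mysql 15 || echo "less than 15 GB free" >&2
```

The 15 GB figure in the example is the upper end of the range quoted above; loading only a few small databases needs less.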
The version of MySQL supported in this release is 4.1.11.
First, download the appropriate files to install MySQL from mysql.com.
Next, edit the configuration file, /etc/my.cnf. An example file from a working configuration follows. The Warehouse uses the InnoDB option for storing tables. Among other things, this enables transaction support. Pay attention to all the lines with "innodb" in them, under the [mysqld] section. All of these options are documented online, at http://www.mysql.com/doc/en/InnoDB_start.html.
Sample /etc/my.cnf:
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
innodb_data_file_path=ibdata1:30M:autoextend
set-variable = innodb_buffer_pool_size=500M
set-variable = innodb_additional_mem_pool_size=100M
set-variable = innodb_log_file_size=75M
set-variable = innodb_log_buffer_size=10M
set-variable = max_allowed_packet=5M
innodb_flush_log_at_trx_commit=1
[mysql.server]
user=mysql
basedir=/var/lib
[safe_mysqld]
err-log=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
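To review the InnoDB lines of an existing configuration against the sample above, something like the following sketch may help. It assumes section headers are written exactly as [name] at the start of a line; the function name innodb_settings is illustrative.

```shell
# Sketch: print the InnoDB-related settings found in the [mysqld] section of
# a my.cnf-style file. The function name is hypothetical.
innodb_settings() {
    awk '
        # A line starting with "[" opens a new section; note whether it is [mysqld]
        /^\[/ { in_mysqld = ($0 == "[mysqld]"); next }
        # Inside [mysqld], keep only the lines that mention innodb
        in_mysqld && /innodb/ { print }
    ' "$1"
}

# Example:
# innodb_settings /etc/my.cnf
```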
The next step is to start the MySQL server. If it's already running, stop it. Then start it with:
% /etc/rc.d/init.d/mysqld start
Now set root passwords:
% mysqladmin -u root -p password 'new-password'
% mysqladmin -h `hostname` -u root -p password 'new-password'
Note that the `hostname` expression can be replaced by the Fully Qualified Domain Name of the server you're running on. new-password is, of course, the new MySQL root password. Both of these commands must be run to set the MySQL password.
For more information regarding installation, see the MySQL web site.
Set up MySQL users and databases. You need to sign on as a MySQL user with sufficient privilege to administer accounts and create databases (e.g., the MySQL root). For example, here we create a user, called biodemo, and two databases, biodemo1 and biodemo2.
osprompt: mysql -p -h host -u root
Enter password: ********
mysql> create database biodemo1;
mysql> create database biodemo2;
mysql> grant all privileges on biodemo1.* TO biodemo@'%' identified by 'password';
mysql> grant all privileges on biodemo2.* TO biodemo@'%' identified by 'password';
mysql> exit
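When several databases are needed, the statements above can be generated rather than typed. This is a sketch only: the function name warehouse_setup_sql is illustrative, and the user, password and database names are the placeholders from the example above.

```shell
# Sketch: print the SQL that creates each database and grants one user all
# privileges on it, mirroring the statements above. Names are placeholders.
warehouse_setup_sql() {
    user=$1
    password=$2
    shift 2
    for db in "$@"; do
        printf 'create database %s;\n' "$db"
        printf "grant all privileges on %s.* TO %s@'%%' identified by '%s';\n" \
            "$db" "$user" "$password"
    done
}

# Example: review the generated statements before running them in mysql
# warehouse_setup_sql biodemo 'password' biodemo1 biodemo2
```

Printing the SQL instead of executing it leaves the decision of how and when to run it to the administrator.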
After running the environment tests listed later in this document, the warehouse schema must be loaded. See the Schema document for instructions on how to load the schema into the database. Be sure to finish setting up the environment by reading the rest of this document before loading the schema.
Environment variables
Ensure that mysql is on your PATH.
MySQL Client libraries
Before building the C-based loaders, the MySQL client programming libraries and include files must be installed. This is discussed at MySQL C API.
Check that the MySQL header files and client libraries are installed. You should see something similar to the following:
$ ls /usr/include/mysql
dbug.h m_string.h my_global.h my_no_pthread.h mysql.h mysqld_error.h sslopt-longopts.h
errmsg.h my_config.h my_list.h my_pthread.h mysql_com.h raid.h sslopt-usage.h
m_ctype.h my_dir.h my_net.h my_sys.h mysql_version.h sslopt-case.h sslopt-vars.h
$ ls /usr/lib/*mysql*
libmysqlclient_r.so.10 libmysqlclient.so.10
libmysqlclient_r.so.10.0.0 libmysqlclient.so.10.0.0
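The two listings above can also be checked from a script. The sketch below assumes the standard locations shown in the listings as defaults; the function name check_mysql_dev is illustrative.

```shell
# Sketch: verify that mysql.h and a libmysqlclient shared library are present.
# Function name and default locations are assumptions taken from the listings.
check_mysql_dev() {
    inc_dir=${1:-/usr/include/mysql}
    lib_dir=${2:-/usr/lib}
    status=0
    if [ ! -f "$inc_dir/mysql.h" ]; then
        echo "missing header: $inc_dir/mysql.h" >&2
        status=1
    fi
    # ls fails when the pattern matches no file
    if ! ls "$lib_dir"/libmysqlclient*.so* >/dev/null 2>&1; then
        echo "missing library: $lib_dir/libmysqlclient*.so*" >&2
        status=1
    fi
    return $status
}

# Example:
# check_mysql_dev && echo "MySQL client development files found"
```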
The version of Oracle supported in this release is 10.1.0.2.0. Previous versions of the warehouse supported Oracle 8.1.7.0.1. Various intervening versions of Oracle may work with the warehouse, but they have not been tested.
Oracle was installed on Red Hat Enterprise Linux ES release 3 for testing. Some database loaders must be run on the same server where Oracle is installed. Since the loaders can only be run on Linux, it is assumed Oracle is also installed on Linux. We assume the reader is familiar with installing and configuring Oracle, or has the services of an admin with such knowledge. For more information regarding Oracle installation, see the following web sites:
The tablespace where the warehouse data is to be installed should be at least 10 gigabytes in size, preferably 20 gigabytes if space allows.
It is necessary to create a tablespace for the index information. The name of the tablespace is "INDEXES" and the tablespace should be at least 10 gigabytes in size, preferably 20 gigabytes if space allows.
After running the environment tests listed later in this document, the warehouse schema must be loaded. See the Schema document for instructions on how to load the schema into the database. Be sure to finish setting up the environment by reading the rest of this document before loading the schema.
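As an illustration of the INDEXES tablespace requirement, the sketch below prints a CREATE TABLESPACE statement with a single datafile. The datafile path is a hypothetical placeholder, the function name is illustrative, and storage options beyond the size are left to the DBA.

```shell
# Sketch: print a CREATE TABLESPACE statement for the INDEXES tablespace.
# The datafile path is a placeholder; the size is given in megabytes.
indexes_tablespace_sql() {
    datafile=$1
    size_mb=${2:-10240}   # default to the 10 GB minimum named in the text
    printf "CREATE TABLESPACE INDEXES DATAFILE '%s' SIZE %sM;\n" "$datafile" "$size_mb"
}

# Example: write the statement to a file, then run it from a DBA sqlplus session
# indexes_tablespace_sql /path/to/indexes01.dbf 20480 > create_indexes.sql
```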
Environment variables
It is assumed the Oracle environment is set appropriately according to the Oracle documentation. The following is an example script to place in a .bashrc which will set the appropriate environment variables for a bash shell (Note: change /usr/local/oracle/linux81701 to the Oracle installation directory, and change the value of ORACLE_SID to the database name):
export ORACLE_HOME=/usr/local/oracle/linux81701
export ORACLE_SID=mydb
export LD_LIBRARY_PATH=/usr/local/oracle/linux81701/lib
# Path adjusted to pick up pro-c compiler
export PATH=$ORACLE_HOME/bin:$PATH
Note that the Oracle bin directory is included in the path. This is necessary in order to execute ProC when building the loaders (see below).
ProC
Before building the C loaders, the Oracle client programming package must be installed. The makefiles for the C loaders require that ProC be installed. ProC is installed as part of the Oracle Client Programmer installation. To check if ProC is installed, check that the executable ${ORACLE_HOME}/bin/proc exists. If it does not exist, run the Oracle Universal Installer and install the client programmer package.
ProC must be configured properly to run. To determine if ProC is running properly, there is a test makefile which is installed with the client programmer package. Navigate to the directory ${ORACLE_HOME}/precomp/demo/proc and execute "make sample1". If the make fails, most likely the problem is with the ProC configuration file. The ProC configuration file is located at:
${ORACLE_HOME}/precomp/admin/pcscfg.cfg
A common problem is that this file does not contain an include for the gcc egcs header files. A line like the following may have to be added to the pcscfg.cfg file in order to build the C loaders:
include=/usr/lib/gcc-lib/i386-glibc21-linux/egcs-2.91.66/include
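The ProC checks described above can be collected into one function. This is a sketch under the assumption that the paths are laid out as named in the text; the function name check_proc is illustrative, and the include= test only looks for the presence of such a line, not for a correct path.

```shell
# Sketch: check a ProC installation under a given Oracle home. The function
# name is hypothetical; the paths are the ones named in the text above.
check_proc() {
    home=${1:-$ORACLE_HOME}
    cfg=$home/precomp/admin/pcscfg.cfg
    if [ ! -x "$home/bin/proc" ]; then
        echo "proc not found or not executable: $home/bin/proc" >&2
        return 1
    fi
    if [ ! -f "$cfg" ]; then
        echo "missing ProC configuration file: $cfg" >&2
        return 1
    fi
    # Warn, but do not fail, when the file has no include= line at all
    if ! grep -q '^[[:space:]]*include=' "$cfg"; then
        echo "warning: no include= line in $cfg" >&2
    fi
    return 0
}

# Example:
# check_proc "$ORACLE_HOME" && echo "ProC looks installed"
```

A passing check does not replace the "make sample1" test, which exercises the precompiler itself.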
The Java-based loaders require the Sun Java Virtual Machine SDK, version 1.5.0. This can be downloaded from:
http://java.sun.com/j2se/1.5.0/download.html
Earlier versions may work as well, but we recommend using 1.5+.
The Java-based loaders are built using ant version 1.6.2. Ant is a tool for executing XML makefiles. It can be downloaded from the apache jakarta project at:
http://jakarta.apache.org/ant/
To build ant from source, download the ant source, untar it and set the environment variable ANT_HOME to the directory where ant is installed. Next execute "./build.sh install". Note that the install may fail with a chmod error. This is ok, but the files "bin/ant" and "bin/antRun" need to have execute permissions set.
Also make sure the following two environment variables are set:
- JAVA_HOME : Set to the java installation directory
- ANT_HOME : Set to the ant installation directory
Finally, add ${JAVA_HOME}/bin and ${ANT_HOME}/bin to the path.
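For a bash shell, the Java and Ant settings above might look like the following .bashrc sketch. Both installation directories are hypothetical and must be replaced with the real ones; the helper prepend_path is illustrative and only avoids adding the same directory twice.

```shell
# Sketch of .bashrc lines for the Java-based loaders. Both installation
# directories are hypothetical; substitute the real ones.
export JAVA_HOME=/usr/local/jdk1.5.0
export ANT_HOME=/usr/local/ant

# Prepend a directory to PATH unless it is already there
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;            # already present, nothing to do
        *) PATH=$1:$PATH ;;
    esac
}

prepend_path "$ANT_HOME/bin"
prepend_path "$JAVA_HOME/bin"
export PATH
```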
The C-based loaders require the following software:
- C compiler from the GNU Compiler Collection (2.96+)
- flex lexical analyzer (2.5.4+)
- bison parser generator (1.875+)
Linux systems may have these programs pre-installed. If not, they may either be installed from the developer package that comes with linux, or downloaded from the sites above.
The PATH environment variable should include these programs.
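A quick way to confirm that the PATH includes the required programs is sketched below. The function name missing_tools is illustrative; it relies only on the standard shell builtin command -v.

```shell
# Sketch: print the name of each listed program that is not on the PATH.
# The function name is hypothetical.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example: the three programs required by the C-based loaders
# missing_tools gcc flex bison
```

No output means every listed program was found.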
See Perl Utilities for configuring the environment to run Perl scripts for accessing MySQL and Oracle databases.
When the environment is set up as specified, execute the following tests:
Java-based loaders:
- Run "java -version"
You should see output similar to the following:
java version "1.5.0"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.0-b92)
Java HotSpot(TM) Client VM (build 1.5.0_04-b05, mixed mode, sharing)
- Run "ant -version"
You should see output similar to the following:
Apache Ant version 1.6.2 compiled on September 28 2004
C-based loaders:
- Run "gcc -v"
You should see output similar to the following:
Reading specs from /some/path/name
gcc version 2.96 20000731 (Red Hat Linux 7.3.2.96-122)
Gcc version 2.96 was used for testing, but any version installed when Linux was installed should work.
- Run "flex --version"
You should see output similar to the following:
flex version 2.5.4
Flex version 2.5.4 was used for testing, but any version of flex installed when Linux was installed should work.
- Run "bison --version"
You should see output similar to the following:
bison (GNU Bison) 1.875
Copyright 2004
Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Bison version 1.875 was used for testing.
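The version numbers reported by the tests above can be compared against the tested minimums with a small helper. This is a sketch that assumes purely numeric, dot-separated versions (it does not handle suffixes such as 1.5.0_04); the function name version_ge is illustrative.

```shell
# Sketch: succeed if dotted version $1 is greater than or equal to $2.
# Only numeric, dot-separated fields are handled. The name is hypothetical.
version_ge() {
    a=$1
    b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        fa=${a%%.*}                                   # leading field of each version
        fb=${b%%.*}
        [ "$a" = "$fa" ] && a= || a=${a#*.}           # drop the field just read
        [ "$b" = "$fb" ] && b= || b=${b#*.}
        fa=${fa:-0}                                   # a missing field counts as 0
        fb=${fb:-0}
        [ "$fa" -gt "$fb" ] && return 0
        [ "$fa" -lt "$fb" ] && return 1
    done
    return 0
}

# Example: take the last word of the version line as the version number
# version_ge "$(flex --version | awk 'NR == 1 { print $NF }')" 2.5.4 || echo "flex is older than 2.5.4" >&2
```

Fields are compared as integers, so 1.875 is treated as newer than 1.35, which matches how bison numbers its releases.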
MySQL:
- (C-based loaders) Confirm that the MySQL client libraries are installed in the standard location:
osprompt: ls /usr/lib/*mysql*
You should see output similar to the following:
libmysqlclient_r.so.10 libmysqlclient.so.10
libmysqlclient_r.so.10.0.0 libmysqlclient.so.10.0.0
If not, the make of the MySQL C-based loaders will fail.
- (C-based loaders) Confirm that the MySQL header files are installed in the standard location:
osprompt: ls /usr/include/mysql
You should see output similar to the following:
dbug.h m_string.h my_global.h my_no_pthread.h mysql.h mysqld_error.h sslopt-longopts.h
errmsg.h my_config.h my_list.h my_pthread.h mysql_com.h raid.h sslopt-usage.h
m_ctype.h my_dir.h my_net.h my_sys.h mysql_version.h sslopt-case.h sslopt-vars.h
If not, the make of the MySQL C-based loaders will fail.
- (All loaders) Confirm that the mysql command line program is available and a database has been created and is accessible:
osprompt: mysql -p -h host -u userid database
where database is the name of a database that has been created to contain the Warehouse. You will be prompted to enter the password for userid; if it has no password, omit the -p option. You should see output similar to the following:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 308 to server version: 4.1.11-Debian_4sarge2-log
Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
If not, then either mysql has not been installed, or it is not available in your path.
Oracle:
- (C-based loaders) Run "proc". You should see output similar to the following:
Pro*C/C++: Release 8.1.7.0.0 - Production on Fri Oct 4 11:34:34 2002
(c) Copyright 2000 Oracle Corporation. All rights reserved.
System default option values taken from: /usr/local/oracle/linux81701/precomp/admin/pcscfg.cfg
... (rest of output omitted)
- (All loaders) Run "sqlplus". Log in using your Oracle user name and password. This should give you an sqlplus command prompt. This test ensures you have access to Oracle. The syntax is:
sqlplus username/password@SID
For example:
sqlplus tomlee/myPass@biospice