Environment Setup

This document describes how to configure your computing environment to use the Warehouse. It assumes the reader is a developer who is comfortable setting environment variables, running makefiles, and working with the MySQL and/or Oracle database management systems. It also assumes that the reader has sufficient system administration privileges and skills to install the required system software, or has access to a system administrator who does.

The required configuration steps depend on the underlying database management system being used, and on whether C-based and/or Java-based loaders are used (see the Databases and loaders table for a summary of available loaders). Follow the steps in each relevant section to configure your environment. A section is included that lists tests to determine whether the environment is set up properly. The final section provides documentation on installing the schema.

Warehouse Platform Requirements
MySQL Database Setup
MySQL Client Setup
Oracle Database Setup
Oracle Client Setup
Java Setup
C Setup
Perl Setup
Environment Tests

Warehouse Platform Requirements

Red Hat Enterprise Linux ES release 3 was used to build and test the Oracle warehouse loaders. Debian 3.1 was used to build and test the MySQL loaders. The only supported build and runtime environment for the C loaders is Linux. Although it is possible to build and run the Java-based loaders remotely from another platform (e.g., Windows), it is recommended that all loaders be built and run on Linux.

The space requirements for the warehouse vary greatly depending on which databases are loaded. A fully-loaded, fully-indexed Warehouse requires about 10-15 GB of disk space. Most of that is required by the largest databases, CMR and GenBank.

Loaders were tested on systems with 1 GB of physical memory. Loaders typically require 40-50 MB of virtual memory to run.
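
As a quick check against these figures, the free disk space and physical memory can be read from the shell. The following is a minimal sketch and not part of the Warehouse distribution. The function names and the default directory /var/lib/mysql are assumptions, so substitute the directory that will actually hold the database files.

```shell
# Print the free space, in whole GB, on the filesystem holding a directory.
# $1: directory to check (the default is an assumed MySQL data directory)
free_disk_gb() {
    df -Pk "${1:-/var/lib/mysql}" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Print the physical memory in MB, as reported by the Linux kernel.
total_memory_mb() {
    awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo
}

echo "Free disk space on /: $(free_disk_gb /) GB"
if [ -r /proc/meminfo ]; then
    echo "Physical memory: $(total_memory_mb) MB"
fi
```

A fully loaded Warehouse would need the first figure to be comfortably above the 10-15 GB quoted above.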

MySQL Database Setup

The version of MySQL supported in this release is 4.1.11.

First, download the appropriate files to install MySQL from mysql.com.

Next, edit the configuration file, /etc/my.cnf. An example file from a working configuration follows. The Warehouse stores its tables with the InnoDB storage engine which, among other things, enables transaction support. Pay attention to all the lines with "innodb" in them, under the [mysqld] section. All of these options are documented online, at http://www.mysql.com/doc/en/InnoDB_start.html.

Sample /etc/my.cnf:
set-variable = innodb_buffer_pool_size=500M
set-variable = innodb_additional_mem_pool_size=100M
set-variable = innodb_log_file_size=75M
set-variable = innodb_log_buffer_size=10M
set-variable = max_allowed_packet=5M
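
Before restarting the server, it can help to list the InnoDB settings that a configuration file actually contains. The sketch below assumes the "set-variable = name=value" syntax used in the sample above; the function name innodb_settings is hypothetical and not part of MySQL or the Warehouse.

```shell
# List the InnoDB settings found in a my.cnf-style file as name=value pairs.
# $1: path to the configuration file (default: /etc/my.cnf)
innodb_settings() {
    sed -n 's/^[[:space:]]*set-variable[[:space:]]*=[[:space:]]*\(innodb_[^[:space:]]*\).*$/\1/p' "${1:-/etc/my.cnf}"
}

# Only inspect the system file if it exists and is readable.
if [ -r /etc/my.cnf ]; then
    innodb_settings /etc/my.cnf
fi
```

Run against the sample file, this would print the four innodb_ lines and skip max_allowed_packet.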



The next step is to start the MySQL server. If it is already running, stop it first. Then start it with:
% /etc/rc.d/init.d/mysqld start
Now set root passwords:
% mysqladmin -u root -p password 'new-password'
% mysqladmin -h `hostname` -u root -p password 'new-password'
Note that the `hostname` expression can be replaced by the fully qualified domain name of the server you are running on, and that new-password stands for the new MySQL root password. Both commands must be run to set the MySQL root password.

For more information regarding installation, see the MySQL web site.

Set up MySQL users and databases. You need to sign on as a MySQL user with sufficient privilege to administer accounts and create databases (e.g., the MySQL root).  For example, here we create a user, called biodemo, and two databases, biodemo1 and biodemo2.

osprompt: mysql -p -h host -u root
Enter password: ********
mysql> create database biodemo1;
mysql> create database biodemo2;
mysql> grant all privileges on biodemo1.* TO biodemo@'%' identified by 'password';
mysql> grant all privileges on biodemo2.* TO biodemo@'%' identified by 'password';
mysql> exit
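
The interactive session above can also be scripted. The sketch below only generates the same SQL statements for a user and a list of databases; the function name warehouse_db_sql is hypothetical, and the user, password and database names are the example values from this section.

```shell
# Print the SQL that creates each database and grants the user full access to it.
# $1: user name, $2: password, remaining arguments: database names
warehouse_db_sql() {
    user=$1
    password=$2
    shift 2
    for db in "$@"; do
        printf 'create database %s;\n' "$db"
        printf "grant all privileges on %s.* TO %s@'%%' identified by '%s';\n" "$db" "$user" "$password"
    done
}

# Show the statements for the example user and databases.
warehouse_db_sql biodemo password biodemo1 biodemo2
```

After reviewing the output, it could be piped into the client, for example "warehouse_db_sql biodemo password biodemo1 biodemo2 | mysql -p -h host -u root". Passing a password on the command line exposes it in the shell history, so this is best treated as a convenience for test systems.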

After running the environment tests listed later in this document, the warehouse schema must be loaded. See the Schema document for instructions on how to load the schema into the database. Be sure to finish setting up the environment by reading the rest of this document before loading the schema.

MySQL Client Setup

Environment variables

Ensure that mysql is on your PATH.
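
One way to check this from the shell is sketched below. The function name check_mysql_client is hypothetical, and /usr/local/mysql/bin is only an assumed example location for the client.

```shell
# Report whether the mysql client can be found through PATH.
check_mysql_client() {
    if command -v mysql >/dev/null 2>&1; then
        echo "mysql found at $(command -v mysql)"
        mysql --version
    else
        echo "mysql is not on PATH; add its bin directory, for example:"
        echo '  export PATH=/usr/local/mysql/bin:$PATH'
    fi
}

check_mysql_client
```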

MySQL Client libraries

Before building the C-based loaders, the MySQL client programming libraries and include files must be installed. This is discussed at MySQL C API.

Check that the MySQL header files are installed.  You should see something similar to the following:

$ ls /usr/include/mysql
dbug.h     m_string.h   my_global.h  my_no_pthread.h  mysql.h          mysqld_error.h  sslopt-longopts.h
errmsg.h   my_config.h  my_list.h    my_pthread.h     mysql_com.h      raid.h          sslopt-usage.h
m_ctype.h  my_dir.h     my_net.h     my_sys.h         mysql_version.h  sslopt-case.h   sslopt-vars.h
The C-loaders' Makefiles assume that the header files are located at /usr/include/mysql.  If your MySQL header files are located in a different place, it may be necessary to alter the Makefiles to point to this location.

Check that the MySQL programming libraries are installed.  You should see something similar to the following:

$ ls /usr/lib/*mysql*
libmysqlclient_r.so.10      libmysqlclient.so.10
libmysqlclient_r.so.10.0.0  libmysqlclient.so.10.0.0
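
The two checks above can be combined into one small script. This is a sketch rather than part of the loaders' build: the function name check_mysql_dev_files is hypothetical, and the default locations are the ones shown in the listings above.

```shell
# Check for the MySQL C API header and client library the C loaders build against.
# $1: include directory (default /usr/include/mysql)
# $2: library directory (default /usr/lib)
check_mysql_dev_files() {
    include_dir=${1:-/usr/include/mysql}
    lib_dir=${2:-/usr/lib}
    status=0
    if [ ! -f "${include_dir}/mysql.h" ]; then
        echo "missing header: ${include_dir}/mysql.h"
        status=1
    fi
    # ls fails when the pattern matches nothing, i.e. no client library is present
    if ! ls "${lib_dir}"/libmysqlclient* >/dev/null 2>&1; then
        echo "missing library: ${lib_dir}/libmysqlclient*"
        status=1
    fi
    return $status
}

if check_mysql_dev_files; then
    echo "MySQL headers and client libraries found"
fi
```

If the files live elsewhere, pass both directories as arguments; the same locations would then need to be reflected in the C loaders' Makefiles.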

Oracle Database Setup

The version of Oracle supported in this release is ... Previous versions of the warehouse supported Oracle ... Various intervening versions of Oracle may work with the warehouse, but they have not been tested.

Oracle was installed on Red Hat Enterprise Linux ES release 3 for testing. Some database loaders must be run on the same server where Oracle is installed. Since the loaders can only be run on Linux, it is assumed Oracle is also installed on Linux. We assume the reader is familiar with installing and configuring Oracle, or has the services of an admin with such knowledge. For more information regarding Oracle installation, see the following web sites:

The tablespace where the warehouse data is to be installed should be at least 10 gigabytes in size, preferably 20 gigabytes if space allows.

It is necessary to create a tablespace for the index information. The name of the tablespace is "INDEXES" and the tablespace should be at least 10 gigabytes in size, preferably 20 gigabytes if space allows.

After running the environment tests listed later in this document, the warehouse schema must be loaded. See the Schema document for instructions on how to load the schema into the database. Be sure to finish setting up the environment by reading the rest of this document before loading the schema.

Oracle Client Setup

Environment variables

It is assumed that the Oracle environment is set up according to the Oracle documentation. The following is an example script to place in a .bashrc, which sets the appropriate environment variables for a bash shell (note: change /usr/local/oracle/linux81701 to the Oracle installation directory, and change the value of ORACLE_SID to the database name):

export ORACLE_HOME=/usr/local/oracle/linux81701
export ORACLE_SID=mydb
export LD_LIBRARY_PATH=/usr/local/oracle/linux81701/lib

# Path adjusted to pick up pro-c compiler
Note that the Oracle bin directory is included in the path. This is necessary in order to execute ProC when building the loaders (see below).
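
The script above ends with a comment about adjusting the path, but no PATH line follows it. A minimal sketch of what such a line could look like is given here, assuming ProC lives in ${ORACLE_HOME}/bin as described in this section. The exact line used in the original script is not shown, so treat this as an assumption.

```shell
# Assumed PATH adjustment: put the Oracle bin directory (which holds proc) first.
export ORACLE_HOME=/usr/local/oracle/linux81701
export PATH="${ORACLE_HOME}/bin:${PATH}"
```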


Before building the C loaders, the Oracle client programming package must be installed. The makefiles for the C loaders require ProC, which is installed as part of the Oracle Client Programmer installation. To check whether ProC is installed, verify that the executable ${ORACLE_HOME}/bin/proc exists. If it does not exist, run the Oracle Universal Installer and install the client programmer package.
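
The check for the ProC executable can be written as a short shell test. This is a sketch: the function name have_proc is hypothetical, and the fallback directory is the example Oracle home used earlier in this section.

```shell
# Succeed if the ProC precompiler is installed under the given Oracle home.
# $1: Oracle installation directory (default: $ORACLE_HOME)
have_proc() {
    [ -x "${1:-$ORACLE_HOME}/bin/proc" ]
}

if have_proc "${ORACLE_HOME:-/usr/local/oracle/linux81701}"; then
    echo "ProC found"
else
    echo "ProC not found; install the Oracle client programmer package"
fi
```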

ProC must be configured properly to run. To determine whether it is, use the test makefile that is installed with the client programmer package: navigate to the directory ${ORACLE_HOME}/precomp/demo/proc and execute "make sample1". If the make fails, the most likely cause is the ProC configuration file. The ProC configuration file is located at:

A common problem is that this file does not contain an include for the gcc egcs header files. A line like the following may have to be added to the pcscfg.cfg file in order to build the C loaders:

Java Setup

The Java-based loaders require the Sun Java Virtual Machine SDK, version 1.5.0. This can be downloaded from:

Earlier versions may work as well, but we recommend using 1.5+.

The Java-based loaders are built using Ant version 1.6.2. Ant is a tool for executing XML-based build files, much as make executes makefiles. It can be downloaded from the Apache Jakarta project at:

To build Ant from source, download the Ant source, untar it, and set the environment variable ANT_HOME to the directory where Ant is to be installed. Next, execute "./build.sh install". Note that the install may fail with a chmod error. This is OK, but the files "bin/ant" and "bin/antRun" then need to have their execute permissions set by hand.
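
If the install stops with the chmod error, the execute permissions can be set afterwards. The following is a sketch; the function name fix_ant_permissions is hypothetical and simply applies chmod to the two files named above.

```shell
# Make the Ant launcher scripts executable after an install that hit the chmod error.
# $1: Ant installation directory (default: $ANT_HOME)
fix_ant_permissions() {
    ant_dir=${1:-$ANT_HOME}
    chmod +x "${ant_dir}/bin/ant" "${ant_dir}/bin/antRun"
}
```

With ANT_HOME already set, calling fix_ant_permissions without arguments should be enough.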

Also make sure the following two environment variables are set:

Finally, add ${JAVA_HOME}/bin and ${ANT_HOME}/bin to the path.
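
Judging from the path entries named above, the two variables are presumably JAVA_HOME and ANT_HOME. A minimal sketch for a bash startup file follows; both directories are hypothetical placeholders and must be replaced with the actual installation locations.

```shell
# Hypothetical installation directories; replace with the real ones.
export JAVA_HOME=/usr/local/java/jdk1.5.0
export ANT_HOME=/usr/local/ant

# Put the Java and Ant executables ahead of anything else on the path.
export PATH="${JAVA_HOME}/bin:${ANT_HOME}/bin:${PATH}"
```

Running "java -version" and "ant -version" in a new shell afterwards shows which versions are actually picked up.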

C Setup

The C-based loaders require the following software:

C compiler from the GNU Compiler Collection (2.96+)
flex lexical analyzer (2.5.4+)
bison parser generator (1.875+)
Linux systems may have these programs pre-installed. If not, they may either be installed from the developer packages that come with the Linux distribution, or downloaded from the sites above.

The PATH environment variable should include these programs.
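
A short script can confirm that all three programs are reachable. This is a sketch; the function name missing_tools is hypothetical.

```shell
# Print the name of every listed program that cannot be found through PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools gcc flex bison)
if [ -z "$missing" ]; then
    echo "gcc, flex and bison are all on PATH"
else
    # Left unquoted on purpose so the names print on one line.
    echo "not found on PATH:" $missing
fi
```

To compare against the minimum versions listed above, run "gcc --version", "flex --version" and "bison --version".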

Perl Setup

See Perl Utilities for configuring the environment to run Perl scripts for accessing MySQL and Oracle databases.

Environment Tests

When the environment is set up as specified, execute the following tests: