BioWarehouse
Quick Start Guide

This document outlines the configuration and installation procedure for setting up a MySQL version of the Bio-SPICE Warehouse for use as a component of the Bio-SPICE Dashboard and/or through conventional command-line programs and APIs. It includes instructions for loading the microarray data required by the Bio-SPICE demonstration into the Warehouse.

  1. While you work through the following steps, you may wish to start downloading the input files for the databases you plan to load, since some of them are quite large. The Loader Summary contains links to the download locations for the various databases.

  2. Perform the Environment Setup for the MySQL server and client. If you are installing large datasets such as CMR (or the full Warehouse), the server machine should be fast and have plenty of memory.

  3. Set up the MySQL users and databases. By convention, the user for the demonstration is biodemo, and the databases are biodemo1 and biodemo2. You need to sign on as a MySQL user with sufficient privileges to administer accounts and create databases (e.g., the MySQL root user):
    osprompt: mysql -p -h host -u root
    Enter password: ********
    mysql> create database biodemo1;
    mysql> create database biodemo2;
    mysql> grant all privileges on biodemo1.* TO biodemo@'%' identified by 'password';
    mysql> grant all privileges on biodemo2.* TO biodemo@'%' identified by 'password';
    mysql> exit
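
The interactive session above can also be scripted. The following is a minimal sketch, not part of the BioWarehouse distribution: the function name emit_setup_sql is hypothetical, and it assumes a MySQL 5.7 or later server, where the account is created with CREATE USER before privileges are granted (MySQL 8.0 no longer accepts IDENTIFIED BY inside GRANT). The function only prints SQL, so the statements can be reviewed before they are sent to the server.

```shell
#!/bin/sh
# Hypothetical helper: print the SQL that creates the Warehouse databases
# and grants one user all privileges on them. Nothing is executed here.
# Usage: emit_setup_sql USER PASSWORD DATABASE...
emit_setup_sql() {
    user=$1
    password=$2
    shift 2
    # Assumes MySQL 5.7+; older servers use the GRANT ... IDENTIFIED BY form.
    # No quoting is applied, so names and password must not contain quotes.
    printf "CREATE USER IF NOT EXISTS '%s'@'%%' IDENTIFIED BY '%s';\n" "$user" "$password"
    for db in "$@"; do
        printf 'CREATE DATABASE %s;\n' "$db"
        printf "GRANT ALL PRIVILEGES ON %s.* TO '%s'@'%%';\n" "$db" "$user"
    done
}

# Print the statements for the demonstration setup so they can be reviewed.
emit_setup_sql biodemo password biodemo1 biodemo2
```

Once the output looks right, it could be piped to the client in place of the interactive session, for example: emit_setup_sql biodemo password biodemo1 biodemo2 | mysql -p -h host -u root
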
  4. Perform the Environment Setup for Perl and for the MySQL Java and C loaders, and perform the indicated tests.

  5. Load the MySQL Schema into the Warehouse databases:
    osprompt: cd PARENTDIR/warehouse/schema
    osprompt: mysql -p -h host -u user
    Enter password: ********
    mysql> use biodemo1;
    mysql> source warehouse-mysql-create.sql;

    [output omitted]
    57 rows in set (0.0 sec)

    mysql> use biodemo2;
    mysql> source warehouse-mysql-create.sql;

    [output omitted]
    57 rows in set (0.0 sec)

    mysql> exit
    If necessary, see the MySQL Schema documentation for instructions on reloading the schema.

    The schema is loaded with only a minimal set of indexes so that the loaders run faster. Once the loads are complete, run warehouse-mysql-index.sql to build the full set of indexes and improve query times. See step 7 for details.
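
Because the same schema file is sourced into each database, this step lends itself to a loop. The following is a minimal sketch under two assumptions: the mysql client is on the PATH, and feeding the file on standard input behaves the same as the source command for this script. The helper name load_sql_file is hypothetical, and host and user are the same placeholders used in the transcript above.

```shell
#!/bin/sh
# Hypothetical helper: run one SQL file against each named database.
# Usage: load_sql_file HOST USER FILE DATABASE...
load_sql_file() {
    host=$1
    user=$2
    file=$3
    shift 3
    # Stop early if the SQL file is missing or unreadable.
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
    for db in "$@"; do
        echo "Loading $file into $db" >&2
        # -p makes the client prompt for the password on each pass.
        mysql -p -h "$host" -u "$user" "$db" < "$file" || return 1
    done
}
```

Run from PARENTDIR/warehouse/schema, a call such as load_sql_file host user warehouse-mysql-create.sql biodemo1 biodemo2 would cover both demonstration databases. The loop stops at the first failure so that a bad password or a missing database is not silently skipped.
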

  6. Run any or all of the available loaders. They are listed here by increasing load time, and we suggest that you run them in this order:
    1. Follow the instructions for building, running, and testing the (C-based) loader for the NCBI Taxonomy database.
    2. Follow the instructions for building, running, and testing the (Java-based) loader for the Enzyme database.
    3. Follow the instructions for building, running, and testing the (Java-based) loader for the Gene Ontology database. If you desire links to be built to Gene Ontology by other loaders, load this into a single dataset (i.e., don't use the -c option).
    4. Follow the instructions for building, running, and testing the (C-based) loader for the MetaCyc Ontology database.
    5. Follow the instructions for building, running, and testing the (C-based) loader for the BioCyc database. BioCyc consists of a number of databases, one per organism.
    6. Follow the instructions for building, running, and testing the (Java-based) loader for the UniProt (SwissProt, TrEMBL, and PIR) database. This loader requires that the NCBI Taxonomy and Enzyme loaders be run first.
    7. Follow the instructions for building, running, and testing the (C-based) loader for the KEGG database.
    8. Follow the instructions for building, running, and testing the (C-based) loader for the CMR database.
    9. Follow the instructions for building, running, and testing the (Java-based) loader for the MAGE database.
    10. Follow the instructions for building, running, and testing the (Java-based) loader for the GenBank database.
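
The suggested order and the one stated prerequisite (UniProt needs NCBI Taxonomy and Enzyme loaded first) can be captured in a small planning script. This is only a sketch: the short loader labels and the function name plan_loaders are hypothetical and are not commands from the distribution, and the check assumes that all loads happen in the same session. Each loader is still built, run, and tested by following its own instructions.

```shell
#!/bin/sh
# Suggested order from the list above, by increasing load time.
# The labels are hypothetical shorthand, not real command names.
LOADER_ORDER="taxonomy enzyme go metacyc biocyc uniprot kegg cmr mage genbank"

# Print the selected loaders in the suggested order, one per line.
# Fails if uniprot is selected without taxonomy and enzyme.
# Usage: plan_loaders LABEL...
plan_loaders() {
    selected=" $* "
    case $selected in
        *" uniprot "*)
            for dep in taxonomy enzyme; do
                case $selected in
                    *" $dep "*) ;;
                    *) echo "uniprot requires $dep to be loaded first" >&2
                       return 1 ;;
                esac
            done ;;
    esac
    # Walk the suggested order and keep only the labels that were selected.
    for name in $LOADER_ORDER; do
        case $selected in
            *" $name "*) echo "$name" ;;
        esac
    done
}

# Example: the selection is given in arbitrary order.
plan_loaders uniprot enzyme taxonomy
```

The example call prints taxonomy, enzyme, and uniprot, in that order. Labels that are not in LOADER_ORDER are ignored rather than reported, which keeps the sketch short.
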


  7. Load the MySQL indexes into the Warehouse:
    osprompt: cd PARENTDIR/warehouse/schema
    osprompt: mysql -p -h host -u user
    Enter password: ********

    mysql> use biodemo1;

    mysql> source warehouse-mysql-index.sql;