BioWarehouse 4.6

June 17, 2009

The contents of this program are subject to the Mozilla Public License Version 1.1 (the "License"); you may not use this program except in compliance with the License. You may obtain a copy of the License at Software distributed under the License is distributed on an "AS IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License for the specific language governing rights and limitations under the License. The Original Code is the BioWarehouse. The Initial Developer of the Original Code is SRI International. Portions created by SRI International are Copyright (C) 2004. All Rights Reserved.

Full License Text

The BioWarehouse is a toolkit for constructing a warehouse of bioinformatics databases. It consists of a relational schema definition for bioinformatics datatypes, loaders for each component database, and Perl/SQL code to query the warehouse for testing and demonstrations. Both Oracle and MySQL are supported.

The BioWarehouse contains a set of loader programs. Each loader parses one or more input database files, translates the data into the warehouse schema, and inserts the data into the warehouse database. The BioWarehouse also contains sample SQL and Perl code to query the database.
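The parse, translate, and insert cycle described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not one of the BioWarehouse loaders:

- The input is a toy tab-separated format.
- Python's built-in sqlite3 stands in for MySQL or Oracle.
- The `protein` table, its columns, and the `dataset_id` value are hypothetical, not the actual warehouse schema.

```python
import sqlite3

# Hypothetical flat-file input: one "ACCESSION<TAB>NAME" record per line.
# The real loaders parse formats such as GenBank or UniProt flat files.
SAMPLE_INPUT = """P12345\tAspartate aminotransferase
P69905\tHemoglobin subunit alpha
"""

def parse(text):
    """Parse step: split the input file into (accession, name) records."""
    for line in text.splitlines():
        if line.strip():
            accession, name = line.split("\t", 1)
            yield accession, name

def translate(record, dataset_id):
    """Translate step: map a parsed record onto a row of a hypothetical table."""
    accession, name = record
    return {"dataset_id": dataset_id, "accession": accession, "name": name.strip()}

def load(conn, rows):
    """Insert step: write the translated rows into the warehouse table."""
    conn.executemany(
        "INSERT INTO protein (dataset_id, accession, name) "
        "VALUES (:dataset_id, :accession, :name)",
        rows,
    )
    conn.commit()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for a MySQL/Oracle warehouse
    conn.execute("CREATE TABLE protein (dataset_id INTEGER, accession TEXT, name TEXT)")
    load(conn, (translate(r, dataset_id=1) for r in parse(SAMPLE_INPUT)))

    # Query step: the kind of SQL the sample query code runs against the warehouse.
    for accession, name in conn.execute(
        "SELECT accession, name FROM protein WHERE dataset_id = 1 ORDER BY accession"
    ):
        print(accession, name)
```

Keeping the three steps as separate functions mirrors the description above. Only the parse step depends on the input format, and only the insert step depends on the database.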

Release notes

Description of the major changes in each release of the BioWarehouse.

Quick Start

A quick reference for how to configure your system and install the BioWarehouse so that it can be used by the Bio-SPICE Dashboard and/or by individual users.

Databases and loaders

The BioWarehouse contains loader programs for the following databases:

Loader summary. Oracle load times were measured on a 2.66 GHz Pentium server with 2 GB of memory. The C loaders ran locally on the server, and the Java loaders ran remotely from a 1.5 GHz Pentium 4 client with 1 GB of memory. MySQL load times were measured on a 1.5 GHz Pentium 4 client with 1 GB of memory running Debian Linux 3.1, networked with a similar server.
Database | Earliest version | Latest version | Input size | # Objects | Loader | Oracle load time | MySQL load time | Notes
BioCyc | 7.6 | 13.0 | ?? MB | 56,922 | C | unknown | unknown | Statistics are for the EcoCyc database.
BioPax | 1.0 | 1.0 | 90 MB | | Java | 2.5 hrs | 2.5 hrs | Statistics are an example only; performance depends on the input files.
ChIP-Chip | | | | | Java and C | | |
CMR | 2004-05-28 | 23 | 59.7 GB | 8,438,855 | C | unknown | 75 hrs (est.) | Statistics are for the "-o original" option of the loader.
eco2dbase | July-2000 | July-2000 | 4 MB | 55,800 | SQL | 1 min | 1 min |
Enzyme | 22.0 | 45.0 | 5.3 MB | 19,761 | Java | unknown | 12 min |
GenBank | 139.0 | | 15.8 GB | | | 68 hrs | 27.5 hrs | Statistics are for the BCT division only.
Gene Ontology | 2005-03 | | 13.5 MB | | | 6 min | 6 min |
KEGG | 34 | 50 | 10.5 GB | 5,647,744 | C | unknown | 30 hrs (est.) |
MAGE | 1.0 | 1.1 | | | | | |
MetaCyc Ontology | 9.5 | 13.0 | 422 KB | 1,706 | C | unknown | 20 sec | Creates three datasets; totals are aggregate.
NCBI Taxonomy | 2003-12-12 | 2009-03-27 | 75 MB | 495,817 | C | unknown | 27 min |
UniProt SwissProt | 15.2 | | 3.7 GB | 7,874,458 | Java | unknown | 7 hrs |
UniProt TrEMBL | 7.1 | | 34.3 GB | 53,333,319 | Java | unknown | 62 hrs |

Each database name links to the documentation for its loader. Be sure to read the Environment Setup section before attempting to build or run any of the loaders.
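The load times in the table above can be turned into rough throughput figures (objects loaded per minute). This is back-of-the-envelope arithmetic, and it rests on three assumptions:

- It uses the second load-time column, which, following the order of the caption, is presumably the MySQL figure.
- It treats each time as a wall-clock total for the listed number of objects.
- The helper names are purely illustrative.

```python
# Rough loader throughput from the loader-summary table (second load-time column).
# Assumption: each time is the wall-clock total for the listed number of objects.
ROWS = {
    # database: (objects, minutes)
    "NCBI Taxonomy": (495_817, 27),
    "UniProt SwissProt": (7_874_458, 7 * 60),
    "UniProt TrEMBL": (53_333_319, 62 * 60),
}

def objects_per_minute(objects, minutes):
    """Average number of objects loaded per minute."""
    return objects / minutes

def estimate_minutes(objects, rate):
    """Linear extrapolation of load time; only a rough guide."""
    return objects / rate

if __name__ == "__main__":
    for name, (objects, minutes) in ROWS.items():
        print(f"{name}: {objects_per_minute(objects, minutes):,.0f} objects/min")
    # NCBI Taxonomy: 18,364 objects/min
    # UniProt SwissProt: 18,749 objects/min
    # UniProt TrEMBL: 14,337 objects/min
```

The two UniProt rows use the same Java loader, yet TrEMBL loads about 24% fewer objects per minute than SwissProt. Extrapolating SwissProt's rate linearly to TrEMBL's object count (`estimate_minutes`) predicts roughly 47 hours, compared with the 62 hours listed. Linear scaling from a small input is therefore likely to underestimate the load time for a much larger one.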

Environment Setup

Explains how to configure the database (MySQL or Oracle), environment variables, and required libraries. Read this document before attempting to build or run any of the loaders.

Schema

Contains schema documentation and instructions for loading the BioWarehouse schema.

Integration with the Bio-SPICE Dashboard

Describes how the BioWarehouse is used in the Bio-SPICE Dashboard and in the February 2004 Bio-SPICE demonstration.

Utilities

Describes the utility programs that can be used with the BioWarehouse.

For support and inquiries please contact