BioWarehouse 4.6
June 17, 2009
The BioWarehouse is a toolkit for constructing a warehouse of bioinformatics databases. It consists of a relational schema definition for bioinformatics datatypes, loaders for each component database, and Perl/SQL code to query the warehouse for testing and demonstrations. Both Oracle and MySQL are supported.
The contents of this program are subject to the Mozilla Public License Version 1.1 (the "License"); you may not use this program except in compliance with the License. You may obtain a copy of the License at http://www.mozilla.org/MPL/. Software distributed under the License is distributed on an "AS IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License for the specific language governing rights and limitations under the License. The Original Code is the BioWarehouse. The Initial Developer of the Original Code is SRI International. Portions created by SRI International are Copyright (C) 2004. All Rights Reserved.
Full License Text
The BioWarehouse contains a set of loader programs. Each loader parses one or more input database files, translates the data into the warehouse schema, and inserts the data into the warehouse database. The BioWarehouse also contains sample SQL and Perl code to query the database.
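The parse, translate, and insert stages that each loader performs can be illustrated with a minimal sketch. This is not BioWarehouse code: the actual loaders are written in C and Java and target Oracle or MySQL. The input format, the `protein` table, its columns, and the function names below are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical flat-file input: one record per line, "accession<TAB>name".
SAMPLE_INPUT = "X0001\tExample protein A \nX0002\tExample protein B\n"


def parse(text):
    """Parse step: split the input file into raw (accession, name) records."""
    for line in text.splitlines():
        if line.strip():
            accession, name = line.split("\t")
            yield accession, name


def translate(record, dataset_id):
    """Translate step: map a raw record onto a (hypothetical) warehouse row."""
    accession, name = record
    return {"dataset_id": dataset_id, "accession": accession, "name": name.strip()}


def load(conn, text, dataset_id=1):
    """Insert step: write all translated rows to the database in one transaction."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS protein ("
        "dataset_id INTEGER, accession TEXT, name TEXT)"
    )
    rows = [translate(record, dataset_id) for record in parse(text)]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO protein VALUES (:dataset_id, :accession, :name)", rows
        )
    return len(rows)


# SQLite in memory is a stand-in for the Oracle or MySQL warehouse.
conn = sqlite3.connect(":memory:")
loaded = load(conn, SAMPLE_INPUT)
count = conn.execute("SELECT COUNT(*) FROM protein").fetchone()[0]
print(loaded, count)
```

Keeping the three stages as separate functions mirrors the description above: only the parse step depends on the source database's file format, while the insert step depends only on the warehouse schema.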
Release notes
- Description of the major changes in each release of the BioWarehouse.
Quick Start
- A quick reference for how to configure your system and install the BioWarehouse so that it can be used by the Bio-SPICE Dashboard and/or by individual users.
Databases and loaders
- The BioWarehouse contains loader programs for the following databases:
Loader summary. Oracle load times are for a 2.66 GHz Pentium with 2GB memory, with C loaders running locally on the server and Java loaders running remotely from a 1.5 GHz Pentium 4 client with 1GB memory. 
MySQL load times are for a 1.5 GHz Pentium 4 client with 1GB memory running Debian Linux version 3.1, networked to a similar server.
| Database | Earliest Supported Version | Latest Supported Version | Input size | #Objects loaded | Loader Language | MySQL Load time | Oracle Load time | Notes |
|---|---|---|---|---|---|---|---|---|
| BioCyc | 7.6 | 13.0 | ?? MB | 56,922 | C | unknown | unknown | Statistics are for the EcoCyc database |
| BioPax | 1.0 | 1.0 | 90 MB | 102,376 | Java | 2.5 hrs | 2.5 hrs | Statistics are an example only. Performance is dependent on input files. |
| ChIP-Chip | | | | | Java and C | | | |
| CMR | 2004-05-28 | 23 | 59.7 GB | 8,438,855 | C | unknown | 75 hrs (est.) | Statistics are for "-o original" option of loader |
| eco2dbase | July-2000 | July-2000 | 4 MB | 55,800 | SQL | 1 min | 1 min | |
| Enzyme | 22.0 | 45.0 | 5.3 MB | 19,761 | Java | unknown | 12 min | |
| GenBank | 139.0 | 152.0 | 15.8 GB | 4,506,591 | Java | 68 hrs | 27.5 hrs | Statistics are for the BCT division only |
| Gene Ontology | 2005-03 | 2006-03 | 13.5 MB | 26,476 | Java | 6 min | 6 min | |
| KEGG | 34 | 50 | 10.5 GB | 5,647,744 | C | unknown | 30 hrs (est.) | |
| MAGE | 1.0 | 1.1 | | | Java | | | |
| MetaCyc Ontology | 9.5 | 13.0 | 422 KB | 1,706 | C | unknown | 20 sec. | Creates three datasets; totals are aggregate. |
| NCBI Taxonomy | 2003-12-12 | 2009-03-27 | 75 MB | 495,817 | C | unknown | 27 min | |
| UniProt SwissProt | 15.2 | 15.2 | 3.7 GB | 7,874,458 | Java | unknown | 7 hrs | |
| UniProt TrEMBL | 7.1 | 15.2 | 34.3 GB | 53,333,319 | Java | unknown | 62 hrs | |
Clicking on the database link displays the documentation for the database loader. Be sure to read the section on Environment Setup before attempting to build or run any of the loaders.
Environment Setup
- Gives details on configuring the database (both MySQL and Oracle are supported), environment variables, and required libraries. Read this document before attempting to build or run any of the loaders.
Schema
- Contains schema documentation and instructions for loading the BioWarehouse schema.
Integration with the Bio-SPICE Dashboard
- Describes how the BioWarehouse is used in the Bio-SPICE Dashboard and in the February 2004 Bio-SPICE demonstration.
Utilities
- Describes the various utility programs that may be used with the BioWarehouse.