BioWarehouse 4.6
June 17, 2009
The BioWarehouse is a toolkit for constructing a warehouse of bioinformatics databases. It consists of a relational schema definition for bioinformatics datatypes, loaders for each component database, and Perl/SQL code to query the warehouse for testing and demonstrations. Both Oracle and MySQL are supported.
The contents of this program are subject to the Mozilla Public License Version 1.1 (the "License"); you may not use this program except in compliance with the License. You may obtain a copy of the License at http://www.mozilla.org/MPL/. Software distributed under the License is distributed on an "AS IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License for the specific language governing rights and limitations under the License. The Original Code is the BioWarehouse. The Initial Developer of the Original Code is SRI International. Portions created by SRI International are Copyright (C) 2004. All Rights Reserved.
Full License Text
The BioWarehouse contains a set of loader programs. Each loader parses one or more input database files, translates the data into the warehouse schema, and inserts the data into the warehouse database. The BioWarehouse also contains sample SQL and Perl code to query the database.
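The parse, translate, and insert stages that each loader performs can be illustrated with a minimal sketch. This is not BioWarehouse code: the actual loaders are written in C and Java and target Oracle or MySQL. The tab-separated input format, the `entry` table, and all function names below are hypothetical, and Python's built-in `sqlite3` module stands in for the warehouse database only so that the example runs on its own.

```python
import sqlite3

# Hypothetical input: one "ID<TAB>NAME" record per line (not a real BioWarehouse format).
SAMPLE_INPUT = "E1\talcohol dehydrogenase\nE2\thexokinase\n"


def parse(text):
    """Stage 1: parse the raw input file into (source_id, name) pairs."""
    for line in text.splitlines():
        if line.strip():  # skip blank lines
            source_id, name = line.split("\t", 1)
            yield source_id, name


def translate(records, dataset):
    """Stage 2: map parsed records onto rows of a hypothetical warehouse table."""
    for source_id, name in records:
        yield dataset, source_id, name.strip()


def insert(conn, rows):
    """Stage 3: insert the translated rows into the database."""
    conn.executemany(
        "INSERT INTO entry (dataset, source_id, name) VALUES (?, ?, ?)", rows
    )
    conn.commit()


def load(conn, text, dataset):
    """Run all three stages and return the number of rows loaded for the dataset."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entry (dataset TEXT, source_id TEXT, name TEXT)"
    )
    insert(conn, translate(parse(text), dataset))
    query = "SELECT COUNT(*) FROM entry WHERE dataset = ?"
    return conn.execute(query, (dataset,)).fetchone()[0]


# sqlite3 in-memory database: an assumption made for a self-contained example.
conn = sqlite3.connect(":memory:")
loaded = load(conn, SAMPLE_INPUT, "demo")
print(f"Loaded {loaded} rows")
```

Keeping the three stages as separate functions mirrors the description above: only `parse` depends on the source file format, and only `insert` depends on the target database.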
Release notes
- Description of the major changes in each release of the BioWarehouse.
Quick Start
- A quick reference for configuring your system and installing the BioWarehouse so that it can be used by the Bio-SPICE Dashboard and/or by individual users.
Databases and loaders
- The BioWarehouse contains loader programs for the following databases:
Loader summary. Oracle load times are for a 2.66 GHz Pentium with 2GB memory, with C loaders running locally on the server and Java loaders running remotely from a 1.5 GHz Pentium 4 client with 1GB memory.
MySQL load times are for a 1.5 GHz Pentium 4 client with 1GB memory running Debian Linux version 3.1, networked with a similar server.
| Database | Earliest Supported Version | Latest Supported Version | Input size | #Objects loaded | Loader Language | MySQL Load time | Oracle Load time | Notes |
|---|---|---|---|---|---|---|---|---|
| BioCyc | 7.6 | 13.0 | ?? MB | 56,922 | C | unknown | unknown | Statistics are for the EcoCyc database |
| BioPax | 1.0 | 1.0 | 90 MB | 102,376 | Java | 2.5 hrs | 2.5 hrs | Statistics are an example only. Performance dependent on input files. |
| ChIP-Chip | | | | | Java and C | | | |
| CMR | 2004-05-28 | 23 | 59.7 GB | 8,438,855 | C | unknown | 75 hrs (est.) | Statistics are for "-o original" option of loader |
| eco2dbase | July-2000 | July-2000 | 4 MB | 55,800 | SQL | 1 min | 1 min | |
| Enzyme | 22.0 | 45.0 | 5.3 MB | 19,761 | Java | unknown | 12 min | |
| GenBank | 139.0 | 152.0 | 15.8 GB | 4,506,591 | Java | 68 hrs | 27.5 hrs | Statistics are for the BCT division only |
| Gene Ontology | 2005-03 | 2006-03 | 13.5 MB | 26,476 | Java | 6 min | 6 min | |
| KEGG | 34 | 50 | 10.5 GB | 5,647,744 | C | unknown | 30 hrs (est.) | |
| MAGE | 1.0 | 1.1 | | | Java | | | |
| MetaCyc Ontology | 9.5 | 13.0 | 422 KB | 1,706 | C | unknown | 20 sec. | Creates three datasets; totals are aggregate. |
| NCBI Taxonomy | 2003-12-12 | 2009-03-27 | 75 MB | 495,817 | C | unknown | 27 min | |
| UniProt SwissProt | 15.2 | 15.2 | 3.7 GB | 7,874,458 | Java | unknown | 7 hrs | |
| UniProt TrEMBL | 7.1 | 15.2 | 34.3 GB | 53,333,319 | Java | unknown | 62 hrs | |
Clicking on the database link displays the documentation for the database loader. Be sure to read the section on Environment Setup before attempting to build or run any of the loaders.
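The load statistics in the loader summary can be turned into a rough throughput figure, which may help when estimating how long a load of a different size would take. The sketch below uses the GenBank figures quoted in the summary (4,506,591 objects; 68 hrs on MySQL, 27.5 hrs on Oracle). The function name `objects_per_hour` is introduced here for illustration only. Because the Oracle and MySQL timings were taken on different hardware, the two rates are not a like-for-like comparison of the database engines.

```python
def objects_per_hour(objects_loaded, load_hours):
    """Return the average number of objects loaded per hour."""
    if load_hours <= 0:
        raise ValueError("load_hours must be positive")
    return objects_loaded / load_hours


# GenBank (BCT division only) figures as quoted in the loader summary.
genbank_objects = 4_506_591
oracle_rate = objects_per_hour(genbank_objects, 27.5)
mysql_rate = objects_per_hour(genbank_objects, 68)

print(f"Oracle: {oracle_rate:,.0f} objects/hr")
print(f"MySQL:  {mysql_rate:,.0f} objects/hr")
```

These are averages over one example load; actual rates depend on the input files, the hardware, and the database configuration.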
Environment Setup
- Gives details on configuring the database (both MySQL and Oracle are supported), environment variables, and required libraries. Read this document before attempting to build or run any of the loaders.
Schema
- Contains schema documentation and instructions for loading the BioWarehouse schema.
Integration with the Bio-SPICE Dashboard
- Describes how the BioWarehouse is used in the Bio-SPICE Dashboard and in the February 2004 Bio-SPICE demonstration.
Utilities
- Describes the various utility programs that may be used with the BioWarehouse.