ECO2DBASE Loader for BioWarehouse

Version 4.6

(C) 2006 SRI
International. All Rights Reserved.  See BioWarehouse
Overview for license details.

Introduction
Limitations
Creation of eco2dbase dump files
Importing eco2dbase Data
Loader Dependencies and Prerequisites
Dataset Specification
Translation Semantics for eco2dbase Tables
References

Introduction

This document describes version 4.6 of the eco2dbase Loader. It is one of several database loaders comprising the BioWarehouse. The eco2dbase Loader (referred to simply as the loader), loads the eco2dbase data into the BioWarehouse - an Oracle relational database that provides a common representation for diverse bioinformatics databases. Please see the references section for more information about eco2dbase.

The eco2dbase loader is different from other loaders in the BioWarehouse. SRI will create MySQL and Oracle database dump files containing the eco2dbase data. The reason for this is that the source eco2dbase data given to SRI in the format of an Oracle database dump file, and not all BioWarehouse users use Oracle users.

Therefore, BioWarehouse do not have to run the full data loader, and can simply just very quickly import the Oracle or MySQL dump file into their BioWarehouse Schema.

Overview of BioWarehouse Schema

The Bio-SPICE warehouse schema contains the data definition statements for the BioWarehouse. These include four different types of tables - constant tables, object tables, linking tables, and special tables.

Constant tables specify scientific data such as information from the Periodic Table of Elements, as well as constants used as column values in various warehouse tables.

Object tables describe a type of entity in a source database, such as compounds and proteins. Each column of an object table specifies a parameter that characterizes the object. In addition to the parameters defined by the source database, the loader assigns a unique warehouse ID (WID) to each object, which is used by other tables to reference the object.

A special type of warehouse object is the dataset. A dataset object is created for each dataset loaded into the warehouse, i.e., the SWISS-PROT loader adds one row to this table when it is run. Its WID is referred to as the dataset WID and is a column in each object table, specifying the source database of the object.

A linking table describes relationships among objects. They contain WIDs of the associated objects, and any additional columns needed to characterize the relationship. In general, many-to-many relationships are supported. Special tables exist to capture reference and crossreference information and to facilitate lookup of objects.

Full schema information, including source files and browseable documentation, is available with this distribution.

Limitations

The eco2dbase loader should not be run concurrently with other loaders. The loader is likely to fail in this case. This limitation was imposed for performance reasons.

Creation of eco2dbase dump files*

*BioWarehouse users do not have to do this step.

This step will be run by SRI to update the MySQL and Oracle database dump files. SRI will run this step only when it is necessary to create new eco2dbase dump files. This is only necesary when there is a schema change, or there is new eco2dbase data. Information on how to run the eco2dbase loader to do this is contained in the document describing how to create the eco2dbase dump files.

Importing eco2dbase Data

Before importing the data, make sure the Oracle or MySQL environment is configured according to the Environment Setup (MySQL). Also make sure the schema is loaded into the database as specified in the Schema doc.

In addition, Oracle users must have the Oracle binary imp included in their PATH environment variable, and MySQL users must have the mysql binary as part of their PATH variable.

Also, Oracle users must set the TNS_ADMIN environmental variable to the directory containing your tnsnames.ora file. The tnsnames.ora file is the conventional file used to specify the connection parameters to the database.

The shell scripts to import the data is run from the eco2dbase-loader/bin directory. The database dump files are in the eco2dbase-loader/data directory.

Importing data into MySQL database

osprompt: sh ./mysql-load-eco2dbase.sh user password database host port ../data/eco2dbase_mysql.dmp

You will see the command about to be executed, and then you will be prompted for your password. Enter your password in, and the eco2dbase data will be loaded into the database. When the data is finished being loaded into the database, it will say "done loading..."

Importing data into Oracle database

osprompt: sh ./oracle-load-eco2dbase.sh user password database ../data/eco2dbase_orcl.dmp

You will see the command about to be executed, and as the data as imported you will see which tables is having data loaded into it. Upon completion, look at the file orcl_import.log to identify any errors during the import.

Loader Dependencies and Prerequisites

Other that the standard warehouse creation procedure, the loader does not require that any other Warehouse tools be run prior to its execution.

Dataset Specification

The loader adds one row to the Dataset table as follows:

Column values for Dataset row

Column Value assigned by eco2dbase loader

WID The next available reserved WID in the warehouse. Uniquely specifies this dataset in the warehouse. WIDs have been reserved for the eco2dbase dataset, so that thare no WID conflicts with other datasets.

Name 'eco2dbase'

ReleaseDate July-2000

Version NULL

LoadDate The time/date the loader was run to generate the dump files.

ChangeDate The date and time the loader completed to generate the dump files.

LoadedBy The value of the system environment variable USER for the account running the loader.

Application 'eco2dbase Loader'

ApplicationVersion NULL

HomeURL NULL

QueryURL NULL

**Column values for `Dataset` row**
Column	Value assigned by eco2dbase loader
`WID`	The next available reserved WID in the warehouse. Uniquely specifies this dataset in the warehouse. WIDs have been reserved for the eco2dbase dataset, so that thare no WID conflicts with other datasets.
`Name`	'eco2dbase'
`ReleaseDate`	July-2000
`Version`	NULL
`LoadDate`	The time/date the loader was run to generate the dump files.
`ChangeDate`	The date and time the loader completed to generate the dump files.
`LoadedBy`	The value of the system environment variable USER for the account running the loader.
`Application`	'eco2dbase Loader'
`ApplicationVersion`	NULL
`HomeURL`	NULL
`QueryURL`	NULL

Translation Semantics for eco2dbase Tables

Please see the document describing the translation semantics for eco2dbase.

References

eco2dbase at NCBI ftp site

Chapter 115 in the second edition of the book "Escherichia coli and Salmonella" by Neidhardt.