Bio-SPICE BioWarehouse and Dashboard

This document describes how the Bio-SPICE BioWarehouse is integrated into the Dashboard, and how version 3.5 of the Warehouse was used in the 2004 Bio-SPICE demonstration.

Public Warehouse
Workflow components
Microarray DataFetch Workflow
Workflow for Arbitrary Queries
Warehouse Representation of Experimental Data

Public Warehouse

A publicly available version of the Warehouse has been created for the Bio-SPICE community. It serves both as a community resource and as a component of the early 2004 demonstration. Please visit our PublicHouse page for more information.

The public Warehouse uses the MySQL DBMS.

Connection parameters

The default connection parameters for the Dashboard are:

Workflow components

The Dashboard accesses the Warehouse through a workflow built from analyzers in the Analyzer Table; the workflow takes Warehouse data as its input.

The Data Warehouse Query analyzer takes an SQL query as an input parameter and outputs the query result in TimeSeries format. The Table View analyzer displays that result.

Connection parameters are specified as input parameters to the analyzer and default to the values above. If the default userid is used, a working default password is supplied internally.
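
The defaulting rule above can be sketched as follows. Everything in this sketch is hypothetical: the field names, the placeholder default values, and the resolve_params helper are illustrations only, not the Dashboard's actual parameter names or settings.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Hypothetical placeholder defaults; these are NOT the Dashboard's real values.
DEFAULT_HOST = "warehouse.example.org"
DEFAULT_PORT = 3306  # the standard MySQL port, assumed here
DEFAULT_DATABASE = "biowarehouse"
DEFAULT_USERID = "dashboard_user"
_BUILTIN_PASSWORD = "builtin-password"  # stands in for the internal default password


@dataclass(frozen=True)
class ConnectionParams:
    """Connection input parameters; any field left unset takes its default."""
    host: str = DEFAULT_HOST
    port: int = DEFAULT_PORT
    database: str = DEFAULT_DATABASE
    userid: str = DEFAULT_USERID
    password: Optional[str] = None


def resolve_params(params: ConnectionParams) -> ConnectionParams:
    """Supply the built-in password only when the default userid is used without one."""
    if params.userid == DEFAULT_USERID and params.password is None:
        return replace(params, password=_BUILTIN_PASSWORD)
    return params


# Usage: an all-default request gets the built-in password; another user does not.
print(resolve_params(ConnectionParams()).password)                # builtin-password
print(resolve_params(ConnectionParams(userid="alice")).password)  # None
```

Keeping the parameters immutable and returning a resolved copy means the user's original input is never silently modified.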

(In development) A .sql file can also serve as the data source for the Data Warehouse Query analyzer. The file should contain a Warehouse query written in SQL that is valid for MySQL.
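
A minimal sketch of using a .sql file as the query source, assuming the file holds exactly one query. Python's built-in sqlite3 module stands in for the MySQL connection, and load_query and run_query are hypothetical helpers, not part of the Dashboard.

```python
import sqlite3
from pathlib import Path


def load_query(sql_path: Path) -> str:
    """Read the single query stored in a .sql file."""
    return sql_path.read_text(encoding="utf-8").strip()


def run_query(conn: sqlite3.Connection, query: str) -> tuple[list[str], list[tuple]]:
    """Execute one query and return its column names and result rows."""
    cursor = conn.execute(query)
    columns = [desc[0] for desc in cursor.description]
    return columns, cursor.fetchall()


# Usage: write a query file, then run it against an in-memory stand-in database.
conn = sqlite3.connect(":memory:")
conn.execute("create table Archive (toolname text, contents blob)")
conn.execute("insert into Archive values ('MIAMExpress', x'00ff')")  # 2-byte dummy blob
query_file = Path("archive_size.sql")
query_file.write_text(
    "select length(contents) from Archive where toolname = 'MIAMExpress';\n"
)
print(run_query(conn, load_query(query_file)))
```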

Microarray DataFetch Workflow

The Data Warehouse Query analyzer is used together with the Matlab and NCA analyzers to start the Bio-SPICE demonstration. It supplies the MIAMExpress archive data to the Matlab analyzer as a query result, and the data from this query drive the rest of the demonstration. The analyzer sequence for the initial portion of the demonstration is:

 datawarehouse -> timeseries to zip file converter -> nca -> geneways -> pathwaybuilder -> vatech package
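
The analyzer sequence behaves like a chain in which each analyzer's output becomes the next one's input. The sketch below models only that ordering; each stage is a hypothetical placeholder that records its name, not the real analyzer.

```python
from typing import Callable, List

Stage = Callable[[List[str]], List[str]]


def placeholder(name: str) -> Stage:
    """Hypothetical stand-in for an analyzer: appends its name to a trace."""
    return lambda trace: trace + [name]


# Analyzer order for the initial portion of the demonstration.
DEMO_ORDER = [
    "datawarehouse",
    "timeseries to zip file converter",
    "nca",
    "geneways",
    "pathwaybuilder",
    "vatech package",
]


def run_workflow(stages: List[Stage], payload: List[str]) -> List[str]:
    """Feed each stage's output into the next stage, in order."""
    for stage in stages:
        payload = stage(payload)
    return payload


trace = run_workflow([placeholder(n) for n in DEMO_ORDER], [])
print(" -> ".join(trace))
```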

Since there is only one MIAMExpress archive loaded, the following query suffices to retrieve the archive data:

 select contents from Archive where toolname = 'MIAMExpress'

Because this query returns a large result (about 4 MB), displaying it directly is not advisable. To check whether the archive is loaded, use the following query instead:

 select length(contents) from Archive where toolname = 'MIAMExpress'

This query should return 4342029, the size of the archive in bytes.
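
Both queries can be tried against a small stand-in table. This sketch uses Python's built-in sqlite3 module in place of MySQL and a dummy 1000-byte payload in place of the real archive. The queries run unchanged because length() returns a blob's size in bytes in SQLite as it does in MySQL.

```python
import sqlite3

# Stand-in Archive table with only the two columns the queries use.
conn = sqlite3.connect(":memory:")
conn.execute("create table Archive (toolname text, contents blob)")
conn.execute(
    "insert into Archive values (?, ?)",
    ("MIAMExpress", b"MIAME" * 200),  # dummy 1000-byte payload, not real archive data
)

# Cheap check first: fetch only the size, never the contents.
row = conn.execute(
    "select length(contents) from Archive where toolname = 'MIAMExpress'"
).fetchone()
size = row[0] if row else 0
print("archive loaded" if size else "archive missing", size)

# Retrieve the full archive only after confirming it is present.
contents = b""
if size:
    (contents,) = conn.execute(
        "select contents from Archive where toolname = 'MIAMExpress'"
    ).fetchone()
```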

Workflow for Arbitrary Queries

The Data Warehouse Query analyzer is used together with the Table View analyzer to display the results of arbitrary queries of the Warehouse.
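
The Table View analyzer's role can be approximated by rendering a result set as aligned text columns. This is a hedged sketch only: format_table is a hypothetical helper, not the Table View implementation, and sqlite3 stands in for the MySQL Warehouse.

```python
import sqlite3


def format_table(columns, rows):
    """Render a result set as aligned plain-text columns, showing NULL as 'null'."""
    cells = [[str(c) for c in columns]]
    cells += [["null" if v is None else str(v) for v in row] for row in rows]
    widths = [max(len(r[i]) for r in cells) for i in range(len(columns))]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(r, widths)).rstrip() for r in cells
    )


# Usage: an arbitrary query against a small stand-in Experiment table.
conn = sqlite3.connect(":memory:")
conn.execute("create table Experiment (WID integer, GroupWID integer)")
conn.executemany("insert into Experiment values (?, ?)", [(1000, None), (1100, 1000)])
cur = conn.execute("select WID, GroupWID from Experiment order by WID")
table = format_table([d[0] for d in cur.description], cur.fetchall())
print(table)
```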

Warehouse Representation of Experimental Data

The Experiment table represents experiments and experimental observations. The table is recursive: an experiment can be a member of a group experiment (recorded in its GroupWID column), so experiments and observations of arbitrary structure can be represented. These include simple 'flat' experiments, tree-structured experiments consisting of heterogeneous subexperiments, subexperiments corresponding to time-series observations, and repeated trials of identical experiments.
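
Because each row can point to its group, a whole experiment tree can be retrieved with a recursive query. A sketch under stated assumptions: sqlite3 stands in for MySQL, only the WID and GroupWID columns are modeled, and the WIDs mimic the Example Experiment Table. WITH RECURSIVE requires MySQL 8.0 or later, so on older MySQL servers the same walk would have to be done in application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified Experiment table: GroupWID points at the parent experiment (null at a root).
conn.execute("create table Experiment (WID integer primary key, GroupWID integer)")
conn.executemany(
    "insert into Experiment values (?, ?)",
    [
        (2000, None),                               # a flat experiment, no subexperiments
        (1000, None), (1100, 1000), (1110, 1100),   # a tree-structured experiment ...
        (1111, 1110), (1112, 1110),                 # ... down to two duplicated observations
    ],
)

# Follow GroupWID links downward from a root experiment.
DESCENDANTS = """
with recursive tree(WID) as (
    select WID from Experiment where GroupWID = ?
    union all
    select e.WID from Experiment e join tree t on e.GroupWID = t.WID
)
select WID from tree order by WID
"""


def descendants(root_wid):
    """Return every WID below root_wid, at any depth."""
    return [wid for (wid,) in conn.execute(DESCENDANTS, (root_wid,))]


print(descendants(1000))  # [1100, 1110, 1111, 1112]
print(descendants(2000))  # []
```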

For a hierarchical experiment, data should be associated with the Experiment at the appropriate level. For example, if data reflects results from averaging numerous identically conducted trials, that data should be associated with the Experiment representing the group of these trials.
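
For example, averaged trial results belong to the group experiment rather than to any single trial. A minimal sketch, assuming simplified tables and hypothetical role names ('measurement', 'mean') and values; sqlite3 stands in for MySQL.

```python
import sqlite3
from statistics import mean

conn = sqlite3.connect(":memory:")
conn.execute("create table Experiment (WID integer primary key, GroupWID integer)")
conn.execute("create table ExperimentData (ExperimentWID integer, Role text, Data text)")
# Group 1110 holds two identically conducted trials, 1111 and 1112 (simplified: 1110 is a root).
conn.executemany(
    "insert into Experiment values (?, ?)", [(1110, None), (1111, 1110), (1112, 1110)]
)
# Each trial carries one hypothetical measurement, stored in character form.
conn.executemany(
    "insert into ExperimentData values (?, ?, ?)",
    [(1111, "measurement", "2.0"), (1112, "measurement", "3.0")],
)

# Average the trials' values ...
values = [
    float(data)
    for (data,) in conn.execute(
        "select d.Data from ExperimentData d join Experiment e on d.ExperimentWID = e.WID"
        " where e.GroupWID = ? and d.Role = 'measurement'",
        (1110,),
    )
]
# ... and attach the result to the group experiment, not to either trial.
conn.execute(
    "insert into ExperimentData values (?, ?, ?)", (1110, "mean", str(mean(values)))
)
print(conn.execute("select Data from ExperimentData where ExperimentWID = 1110").fetchone())
```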

An experiment may have an archive specified for it, either as data in the Warehouse itself, or as a URL. The archive contains a representation of the experiment and its data in a well-defined format (e.g., MIAMExpress) or supporting data (e.g., images).
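
The two storage options can be modeled as a small data structure. This is a hypothetical model, not the Warehouse schema: the class and field names are illustrative, and the rule that exactly one location is given is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExperimentArchive:
    """Hypothetical model of an experiment's archive: stored inline or referenced by URL."""
    data_format: str                  # e.g. "MIAMExpress", or an image format for supporting data
    contents: Optional[bytes] = None  # archive held in the Warehouse itself
    url: Optional[str] = None         # archive held elsewhere

    def __post_init__(self):
        # Assumption: exactly one storage location is given.
        if (self.contents is None) == (self.url is None):
            raise ValueError("give exactly one of contents or url")

    def is_inline(self) -> bool:
        return self.contents is not None


inline = ExperimentArchive("MIAMExpress", contents=b"...")
remote = ExperimentArchive("image", url="https://example.org/scan.png")
print(inline.is_inline(), remote.is_inline())  # True False
```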

The ExperimentData table represents data associated with an Experiment entity. Besides observations, it can hold a parameter, a computed value, or metadata describing other data for the experiment. The data itself is stored in character form, and each row also records the experiment-dependent role the data plays in the experiment.
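
Because Data is always character data, the reader decides how to interpret it based on the row's role. A sketch assuming the columns of the Example ExperimentData Table; the rows, the 'temperature' role, and the space-separated vector encoding are hypothetical (real expression data uses MIAMExpress format), and sqlite3 stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Columns follow the Example ExperimentData Table; every value in Data is text.
conn.execute(
    "create table ExperimentData ("
    " WID integer primary key, ExperimentWID integer, Kind text,"
    " Role text, DateProduced text, Data text)"
)
# Hypothetical rows: one observation vector and one parameter for experiment 1111.
conn.executemany(
    "insert into ExperimentData (WID, ExperimentWID, Role, Data) values (?, ?, ?, ?)",
    [(3111, 1111, "green", "0.12 0.40 1.75"), (3199, 1111, "temperature", "37")],
)


def data_for(experiment_wid, role):
    """Return the character-form Data for one experiment and role (None if absent)."""
    row = conn.execute(
        "select Data from ExperimentData where ExperimentWID = ? and Role = ?",
        (experiment_wid, role),
    ).fetchone()
    return row[0] if row else None


# The caller converts the text according to the role.
green = [float(x) for x in data_for(1111, "green").split()]
temperature = int(data_for(1111, "temperature"))
```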

Microarray experiment representation

The lcDNA microarray experiment used in the Bio-SPICE demonstration is organized in the Experiment table into four levels. The first level is the root experiment, which is a group of replicated experiments. The second level represents the three replicates of the microarray study, each of which is a time series. The third level represents, for each time point, a group of two hybridization replicates (duplicated observations at that time point). The fourth level represents the individual observations. Each observation has three associated ExperimentData rows containing its gene expression data: the green channel, the red channel, and the normalized result. (Note: the ExperimentData rows are currently not loaded for the demonstration, because their data are already contained in the MIAMExpress archive and the demonstration does not need access to individual ExperimentData rows.)
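
The four levels follow directly from the GroupWID links in the Example Experiment Table: a root has no group, and every other experiment sits one level below its group. A minimal sketch, assuming the WID-to-GroupWID mapping has already been read from the Experiment table; levels is a hypothetical helper.

```python
def levels(parent_of):
    """Map each WID to its level: 1 for a root, 2 for its members, and so on."""
    memo = {}

    def level(wid):
        if wid not in memo:
            parent = parent_of[wid]
            memo[wid] = 1 if parent is None else level(parent) + 1
        return memo[wid]

    return {wid: level(wid) for wid in parent_of}


# WID -> GroupWID for one branch of the example experiment.
branch = {1000: None, 1100: 1000, 1110: 1100, 1111: 1110, 1112: 1110}
print(levels(branch))  # {1000: 1, 1100: 2, 1110: 3, 1111: 4, 1112: 4}
```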

Example Experiment Table

This is an example of the Experiment table entries for an lcDNA microarray experiment replicated three times, with observations taken at four time points and each observation replicated twice:

WID   GroupWID  Type          GroupType      GroupSize  GroupIndex  TimePoint  Note
1000  null      'microarray'  'replicate'    3          null        null       Group of 3 identical time-coursed experiments

1100  1000      'microarray'  'time-series'  4          1           null       A time-coursed experiment with 4 hybridizations
1200  1000      'microarray'  'time-series'  4          2           null       A time-coursed experiment with 4 hybridizations
1300  1000      'microarray'  'time-series'  4          3           null       A time-coursed experiment with 4 hybridizations

1110  1100      'microarray'  'replicate'    2          1           null       Group of 2 duplicated hybridizations
1120  1100      'microarray'  'replicate'    2          2           null       Group of 2 duplicated hybridizations
1130  1100      'microarray'  'replicate'    2          3           null       Group of 2 duplicated hybridizations
1140  1100      'microarray'  'replicate'    2          4           null       Group of 2 duplicated hybridizations
1210  1200      'microarray'  'replicate'    2          1           null       Group of 2 duplicated hybridizations
1220  1200      'microarray'  'replicate'    2          2           null       Group of 2 duplicated hybridizations
1230  1200      'microarray'  'replicate'    2          3           null       Group of 2 duplicated hybridizations
1240  1200      'microarray'  'replicate'    2          4           null       Group of 2 duplicated hybridizations
1310  1300      'microarray'  'replicate'    2          1           null       Group of 2 duplicated hybridizations
1320  1300      'microarray'  'replicate'    2          2           null       Group of 2 duplicated hybridizations
1330  1300      'microarray'  'replicate'    2          3           null       Group of 2 duplicated hybridizations
1340  1300      'microarray'  'replicate'    2          4           null       Group of 2 duplicated hybridizations

1111  1110      'microarray'  null           0          1           30         Hybridization, with associated ExperimentData row
1112  1110      'microarray'  null           0          2           30         Hybridization, with associated ExperimentData row
1121  1120      'microarray'  null           0          1           60         Hybridization, with associated ExperimentData row
1122  1120      'microarray'  null           0          2           60         Hybridization, with associated ExperimentData row
1131  1130      'microarray'  null           0          1           90         Hybridization, with associated ExperimentData row
1132  1130      'microarray'  null           0          2           90         Hybridization, with associated ExperimentData row
1141  1140      'microarray'  null           0          1           120        Hybridization, with associated ExperimentData row
1142  1140      'microarray'  null           0          2           120        Hybridization, with associated ExperimentData row
(repeat the above 8 rows for 12xx and 13xx)
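
The elided 12xx and 13xx rows follow the same numbering pattern as the 11xx rows, so the full table can be generated. A sketch under the assumption that this WID pattern (hundreds digit for the study, tens digit for the time point, units digit for the hybridization) is only the example's convention; example_rows is hypothetical and works only while each group has fewer than ten members. Type and GroupType values are shown without their surrounding quotes.

```python
def example_rows(n_studies=3, time_points=(30, 60, 90, 120), n_hybridizations=2):
    """Generate all Example Experiment Table rows, including the elided 12xx and 13xx ones.

    Each row is (WID, GroupWID, Type, GroupType, GroupSize, GroupIndex, TimePoint).
    """
    rows = [(1000, None, "microarray", "replicate", n_studies, None, None)]
    for s in range(1, n_studies + 1):
        study = 1000 + 100 * s  # second level: one time-coursed experiment per replicate
        rows.append((study, 1000, "microarray", "time-series", len(time_points), s, None))
        for t, point in enumerate(time_points, start=1):
            group = study + 10 * t  # third level: duplicated hybridizations at one time point
            rows.append((group, study, "microarray", "replicate", n_hybridizations, t, None))
            for h in range(1, n_hybridizations + 1):
                # fourth level: an individual hybridization at this time point
                rows.append((group + h, group, "microarray", None, 0, h, point))
    return rows


rows = example_rows()
print(len(rows))  # 40
```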

Example ExperimentData Table

This is an example of an entry in the ExperimentData Table for this experiment:

WID   ExperimentWID  Kind  Role     DateProduced  Data
3111  1111           'O'   'green'                [string containing vector of gene expression data, MIAMExpress format]
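
If the ExperimentData rows were loaded, observations could be selected through their Experiment rows, for example all green-channel data at one time point. A sketch with sqlite3 standing in for MySQL; row 3112 and the placeholder Data strings are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table Experiment (WID integer primary key, GroupWID integer, TimePoint integer)"
)
conn.execute(
    "create table ExperimentData (WID integer primary key, ExperimentWID integer,"
    " Kind text, Role text, DateProduced text, Data text)"
)
conn.executemany(
    "insert into Experiment values (?, ?, ?)",
    [(1110, 1100, None), (1111, 1110, 30), (1112, 1110, 30)],
)
# Placeholder Data strings; real rows hold MIAMExpress-format expression vectors.
conn.executemany(
    "insert into ExperimentData (WID, ExperimentWID, Kind, Role, Data) values (?, ?, ?, ?, ?)",
    [(3111, 1111, "O", "green", "<vector 1111>"), (3112, 1112, "O", "green", "<vector 1112>")],
)

# Green-channel data for every hybridization taken at time point 30.
rows = conn.execute(
    "select e.WID, d.Data from ExperimentData d"
    " join Experiment e on d.ExperimentWID = e.WID"
    " where e.TimePoint = ? and d.Role = 'green' order by e.WID",
    (30,),
).fetchall()
print(rows)
```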