(C) 2005 SRI International. All Rights Reserved. See BioWarehouse Overview for license details.
Introduction
The MetaCyc Ontology Loader (referred to simply as the loader) loads a portion of the MetaCyc Pathway/Genome Database (PGDB) into the BioWarehouse. PGDBs are stored in a frame-based representation system implemented in Common Lisp. The loader reads a textual flat-file representation of a PGDB, converts it to the representation defined by the BioWarehouse schema, and loads the result directly into an instance of the warehouse.
The loader loads all terms from three ontologies contained in MetaCyc: the chemical compound ontology, the MultiFun gene ontology, and the pathway ontology.
Constant tables specify scientific data such as information from the Periodic Table of Elements, as well as constants used as column values in various warehouse tables.
Each object table describes a type of entity in a source database, such as compounds or proteins. Each column of an object table specifies a parameter that characterizes the object. In addition to the parameters defined by the source database, the loader assigns a unique warehouse ID (WID) to each object, which other tables use to reference the object.
A special type of warehouse object is the dataset. A dataset object is created for each dataset loaded into the warehouse; e.g., the SWISS-PROT loader adds one row to this table each time it is run. Its WID is referred to as the dataset WID and appears as a column in each object table, identifying the source database of the object.
Linking tables describe relationships among objects. They contain the WIDs of the associated objects and any additional columns needed to characterize the relationship. In general, many-to-many relationships are supported. Special tables exist to capture reference and cross-reference information and to facilitate lookup of objects.
Schema documentation is available.
The latest supported data version for the MetaCyc Ontology loader is listed in the loader summary table. Attributes added to the MetaCyc Ontology schema after this version are not supported.
Only the three ontologies noted above are loaded by this loader; no other ontological terms and no other data types present in MetaCyc are loaded.
The textual representation of a PGDB consists of several ASCII files. The loader loads only one of them: classes.dat.
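As a rough illustration of what reading classes.dat involves, the sketch below splits an attribute-value flat file into one dictionary per entry. It assumes the usual BioCyc flat-file conventions (lines of the form `ATTRIBUTE - value`, `//` ending an entry, `#` starting a comment line, and a leading `/` marking a continuation line). The function name `parse_entries` is hypothetical, and this is not the loader's actual code.

```python
from collections import defaultdict


def parse_entries(lines):
    """Yield (start_line_number, attributes) for each entry in a flat file.

    Assumed format (not taken from the loader source):
      ATTRIBUTE - value     one occurrence of an attribute
      /continuation text    continues the previous value
      # comment             ignored
      //                    ends the current entry
    """
    attrs = defaultdict(list)
    start = None  # line number on which the current entry began
    last = None   # name of the most recently read attribute
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if line.startswith("#") or not line.strip():
            continue
        if line.strip() == "//":
            if attrs:
                yield start, dict(attrs)
            attrs, start, last = defaultdict(list), None, None
            continue
        if line.startswith("/") and last is not None:
            # Append continuation text to the previous value.
            attrs[last][-1] += "\n" + line[1:]
            continue
        name, sep, value = line.partition(" - ")
        if not sep:
            continue  # a real loader would record a parse error here
        if start is None:
            start = number
        attrs[name].append(value)
        last = name
```

Every attribute maps to a list of values, so attributes that occur several times in an entry keep their order.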
DataSet.WID is used; typically this is the dataset that was most recently loaded. If no dataset of this name exists, a warning is issued and one is created.
Column | Value assigned by BioCyc loader
---|---
WID | A small integer that uniquely identifies this dataset in the warehouse.
Name | 'MetaCyc Chemical Compound Ontology', 'MultiFun Gene Ontology', or 'MetaCyc Pathway Ontology'
Version | Major version of MetaCyc that was loaded.
ReleaseDate | The date that this version of MetaCyc was released.
LoadDate | The date and time the loader was run.
ChangeDate | The date and time the loader completed; NULL if the loader did not complete successfully.
LoadedBy | The value of the system environment variable USER for the account running the loader.
Application | 'MetaCyc Ontology Loader'
ApplicationVersion | 4.6
HomeURL | http://www.biocyc.org
QueryURL | http://www.biocyc.org:1555
This section describes the semantic mapping from the objects comprising the MetaCyc Ontology knowledge base, as represented in its flat file, to their representation in the BioWarehouse.
Semantics are expressed in tabular form, showing the mapping of each source attribute to the warehouse Table.Column values computed from it. In the most typical case the attribute is simply copied into a warehouse column; where the translation is more complex, an explanation is given. Any attributes not listed are ignored.
Some attributes can occur multiple times for a source object. The notation ATTRIBUTE[*] is used to indicate that the semantics apply to all occurrences; typically a row is added to a warehouse table for each. The notation ATTRIBUTE[1], ATTRIBUTE[2], etc., is used where the attribute order is significant. If an attribute is missing from a source file but required by the warehouse schema (i.e., its column is qualified with NOT NULL), a warning is issued. If the missing attribute is not required, NULL is stored.
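The notation and the missing-attribute rule above can be sketched as two small helpers. This is a minimal illustration, assuming each parsed entry is a dictionary mapping attribute names to lists of values. The names `column_value` and `rows_for_all` are hypothetical, and storing `None` for a missing required attribute is an assumption, since the text only states that a warning is issued.

```python
import warnings


def column_value(attrs, attribute, required=False, index=0):
    """Return the occurrence ATTRIBUTE[index + 1], or None (SQL NULL) if absent.

    A warning is issued when the attribute is missing and the target
    column is NOT NULL (required=True).
    """
    values = attrs.get(attribute, [])
    if index < len(values):
        return values[index]
    if required:
        warnings.warn(f"missing required attribute {attribute}")
    return None


def rows_for_all(attrs, attribute, make_row):
    """ATTRIBUTE[*]: build one warehouse row per occurrence of the attribute."""
    return [make_row(value) for value in attrs.get(attribute, [])]
```

For example, `rows_for_all(attrs, "SYNONYMS", lambda s: {"Syn": s})` would produce one row per synonym, while `column_value(attrs, "COMMENT")` picks out COMMENT[1].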
A row is added to the Term table for each such entry in classes.dat. Since each term is part of a hierarchy, Term.Hierarchy is 'T' for all entries. In addition, a row is added to the TermRelationship table for every entry except the three roots ('Compounds-and-Elements', 'All-Genes', and 'Pathways') to describe the superclass of the entry. Each entry has exactly one superclass except the roots, which have none.
In the input file, classes are sorted in hierarchical order. The loader exploits this to avoid multiple passes over the data.
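The benefit of hierarchical ordering can be shown with a short sketch: because a superclass always precedes its subclasses, its WID is already known when a subclass is read, so the relationship row can be written immediately. The function name, the sequential WID numbering, and the tuple layout are assumptions made for illustration only.

```python
ROOTS = {"Compounds-and-Elements", "All-Genes", "Pathways"}


def link_superclasses(entries):
    """Resolve superclass WIDs in a single pass.

    entries: iterable of (unique_id, superclass_id) pairs in hierarchical
    order; superclass_id is ignored for the three roots.
    Returns (wid_of, relationships), where each relationship is a tuple
    (TermWID, RelatedTermWID, Relationship).
    """
    wid_of = {}
    relationships = []
    # WIDs are simply numbered from 1 here; the real warehouse assigns them.
    for wid, (unique_id, parent) in enumerate(entries, start=1):
        wid_of[unique_id] = wid
        if unique_id in ROOTS:
            continue  # roots have no superclass
        # A KeyError here would mean the input is not in hierarchical order.
        relationships.append((wid, wid_of[parent], "superclass"))
    return wid_of, relationships
```

If the file were not sorted this way, a second pass (or a deferred list of unresolved superclasses) would be needed to fill in RelatedTermWID.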
BioCyc Attribute | Warehouse Semantics
---|---
COMMENT[1] | The first comment is considered the defining comment and is stored in Term.Definition.
COMMENT[2+] | Typically only one comment per entry is present, but if more are present, they are stored as comments: CommentTable.Comm; CommentTable.OtherWID is the WID of this Term object.
COMMON-NAME | Term.Name; if this attribute is missing, the UNIQUE-ID attribute is substituted.
SYNONYMS[*] | SynonymTable.Syn; SynonymTable.OtherWID is the WID of this Term object.
TYPES | One row is added to TermRelationship: TermRelationship.RelatedTermWID is the WID of the Term object whose UNIQUE-ID matches this value. TermRelationship.TermWID is the WID of this Term object. TermRelationship.Relationship is 'superclass'. Note: TYPES contains only one superclass, despite its plural name.
UNIQUE-ID | DBID.XID; DBID.OtherWID is the WID of this Term object. If the value is 'Compounds-and-Elements', 'All-Genes', or 'Pathways', Term.Root is 'T'; otherwise Term.Root is 'F'.
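To make the attribute mapping concrete, the following sketch turns one parsed entry into rows keyed by warehouse table name. It assumes the entry is a dictionary mapping attribute names to lists of values, and that `wid_of` maps already-loaded UNIQUE-IDs to WIDs. The function `map_entry` and the row dictionaries are hypothetical stand-ins for the loader's real insert logic.

```python
ROOTS = {"Compounds-and-Elements", "All-Genes", "Pathways"}


def map_entry(attrs, term_wid, wid_of):
    """Map one classes.dat entry to rows, keyed by warehouse table name."""
    unique_id = attrs["UNIQUE-ID"][0]
    comments = attrs.get("COMMENT", [])
    rows = {
        "Term": [{
            "WID": term_wid,
            # COMMON-NAME falls back to UNIQUE-ID when missing.
            "Name": attrs.get("COMMON-NAME", [unique_id])[0],
            # COMMENT[1] is the defining comment.
            "Definition": comments[0] if comments else None,
            "Hierarchy": "T",
            "Root": "T" if unique_id in ROOTS else "F",
        }],
        # COMMENT[2+] become ordinary comments.
        "CommentTable": [{"OtherWID": term_wid, "Comm": c} for c in comments[1:]],
        "SynonymTable": [{"OtherWID": term_wid, "Syn": s}
                         for s in attrs.get("SYNONYMS", [])],
        "DBID": [{"OtherWID": term_wid, "XID": unique_id}],
        "TermRelationship": [],
    }
    if unique_id not in ROOTS:
        rows["TermRelationship"].append({
            "TermWID": term_wid,
            "RelatedTermWID": wid_of[attrs["TYPES"][0]],
            "Relationship": "superclass",
        })
    return rows
```

Returning plain dictionaries keeps the sketch independent of any database driver; a real loader would issue the corresponding INSERT statements instead.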
A row in the Entry table is created for each term as follows:
Column | Value assigned by BioCyc loader
---|---
OtherWID | The Term WID of the entry described by this row.
InsertDate | The date and time the loader was run.
CreationDate | NULL
ModifiedDate | NULL
LineNumber | The line number in the input file on which this entry began.
LoadError | 'T' if a parse error is detected, 'F' otherwise.
DatasetWID | The value of Dataset.WID assigned to the dataset containing the subontology this term belongs to. Three distinct datasets are created by the loader.
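The Entry columns above could be assembled as in the sketch below. The function name `entry_row` and its parameters are hypothetical; `None` stands in for SQL NULL.

```python
from datetime import datetime


def entry_row(term_wid, line_number, dataset_wid, load_time, parse_error=False):
    """Build the Entry row for one term, following the column table above."""
    return {
        "OtherWID": term_wid,
        "InsertDate": load_time,     # when the loader was run
        "CreationDate": None,        # always NULL for this loader
        "ModifiedDate": None,        # always NULL for this loader
        "LineNumber": line_number,   # line on which the entry began
        "LoadError": "T" if parse_error else "F",
        "DatasetWID": dataset_wid,   # dataset of the term's subontology
    }


# Example call with made-up WIDs and a made-up load time.
row = entry_row(term_wid=7, line_number=42, dataset_wid=3,
                load_time=datetime(2005, 1, 1, 12, 0))
```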