BioCyc Loader for BioWarehouse

Version 4.6

(C) 2006 SRI International. All Rights Reserved. See BioWarehouse Overview for license details.

Introduction
Limitations
Installation and Building
Obtaining input data
Loader Dependencies and Prerequisites
Running the Loader
Dataset and BioSource Specification
Translation Semantics for BioCyc Objects

Publications

Genes

Entry Table
References

Introduction

This document describes version 4.6 of the BioCyc Loader, one of several database loaders comprising the BioWarehouse. The BioCyc Loader (referred to simply as the loader) loads a Pathway/Genome Database (PGDB) into the BioWarehouse, a relational database that provides a common representation for diverse bioinformatics databases.

PGDBs are stored in a frame-based representation system implemented in Common Lisp. The loader reads a textual flat file representation of a PGDB, converts it to the representation expressed in the Bio-SPICE Warehouse Schema, and loads the result directly into an instance of the warehouse.

Overview of BioWarehouse Schema

The Bio-SPICE warehouse schema contains the data definition statements for the BioWarehouse. These define four types of tables: constant tables, object tables, linking tables, and special tables.

Constant tables specify scientific data such as information from the Periodic Table of Elements, as well as constants used as column values in various warehouse tables.

Object tables describe a type of entity in a source database, such as compounds and proteins. Each column of an object table specifies a parameter that characterizes the object. In addition to the parameters defined by the source database, the loader assigns a unique warehouse ID (WID) to each object, which is used by other tables to reference the object.

A special type of warehouse object is the dataset. A dataset object is created for each dataset loaded into the warehouse; e.g., the SWISS-PROT loader adds one row to the Dataset table when it is run. Its WID is referred to as the dataset WID and is a column in each object table, specifying the source database of the object.

Linking tables describe relationships among objects. They contain the WIDs of the associated objects and any additional columns needed to characterize the relationship. In general, many-to-many relationships are supported. Special tables exist to capture reference and crossreference information and to facilitate lookup of objects.

Schema documentation is available.

Limitations

The latest supported data version for the BioCyc loader is listed in the loader summary table. Attributes added to the BioCyc schema after this version are not supported.

The loader silently ignores the numerous source attributes that have no analogue in the BioWarehouse.

In the reaction graph specification in the PREDECESSORS attribute of pathways, due to parsing limitations a reaction may have at most two predecessor reactions. Additional reactions are flagged as generic syntax errors.

The loader treats the MetaCyc database as if it contained data for a single organism named "MetaCyc", rather than as containing experimentally elucidated data from many organisms.

Many proteins do not have the COMMON-NAME attribute. This is because multifunctional enzymes often have different names depending on which enzymatic function is being referred to. If no common name is specified (and we prefer that no common name be specified if an activity name is appropriate), we use the common name of the corresponding enzymatic-reaction frame (or a concatenation of them separated by / if there are multiple enzymatic-reactions).
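
The naming fallback described above can be sketched roughly as follows. This is an illustration only: the function name `protein_name` and its arguments are hypothetical, and the exact spacing around the `/` separator is an assumption.

```python
from typing import Optional, Sequence


def protein_name(common_name: Optional[str],
                 enzrxn_names: Sequence[str]) -> Optional[str]:
    """Pick a name for a protein (hypothetical helper, not loader code).

    Prefer the protein's own COMMON-NAME; otherwise fall back to the
    common names of its enzymatic-reaction frames, joined by "/".
    """
    if common_name:
        return common_name
    if enzrxn_names:
        # Assumption: no whitespace is added around the separator.
        return "/".join(enzrxn_names)
    return None  # no name available from either source
```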

The individual components of a publication -- title, year, etc. -- are not broken out by the loader into the associated columns of the Citation table; only the concatenation of all components is loaded into Citation.Citation.

The TRANSCRIPTION-DIRECTION attribute is ignored for transcription units. In particular, neither it nor the TRANSCRIPTION-DIRECTION of the associated gene is used in computing Feature.StartPosition or Feature.EndPosition for DNA binding sites. This could lead to incorrect values for these columns.

Installation and Building

See BioCyc installation instructions for details on installing and building the loader.

Loader Dependencies and Prerequisites

Other than the standard warehouse creation procedure, the loader does not require that any other Warehouse tools be run prior to its execution. However, if the MetaCyc Ontology or Gene Ontology datasets have been loaded, the loader builds useful links to objects in these datasets. A warning is issued if either of these datasets has not been loaded. If multiple versions of these datasets are present in the Warehouse, the dataset with the maximum WID (typically the most recently loaded) is used.

The Multifun Ontology contained in the MetaCyc Ontology dataset is used to populate rows of the RelatedTerm table for genes that specify a Multifun type. The Gene Ontology dataset is used to populate rows of the RelatedTerm table for proteins that specify a GO term. See genes translation and proteins translation for details.

The loader is also invoked as part of the ChIP-chip loader. When invoked in this manner, a restricted set of data files, with a minimal amount of data, is loaded and merged into a dataset that also contains other BioWarehouse data. In this case, the command-line arguments provided to the loader are specified in a properties file. See the ChIP-chip documentation for full details.

Input data

BioCyc PGDB databases are available for a number of species. However, a license may be required to obtain them. Visit BioCyc downloads or send a request to biocyc-info@ai.sri.com for details.

The textual representation of a PGDB consists of several ASCII files. A subset of these are used by the BioCyc loader. Furthermore, not all files are present in all PGDBs; in particular, several files are not present for the MetaCyc PGDB. Input files are loaded in the following order:

     pubs.dat [not present for all BioCyc PGDBs]
     compounds.dat
     proteins.dat
     protseq.fasta [not present for all BioCyc PGDBs]
     transunits.dat [not present for all BioCyc PGDBs]
     genes.dat
     promoters.dat [not present for the MetaCyc PGDB]
     terminators.dat [not present for the MetaCyc PGDB]
     dnabindsites.dat [not present for the MetaCyc PGDB]
     reactions.dat
     enzrxns.dat
     regulation.dat
     pathways.dat

See the PGDB flat file format specification for a detailed specification of the contents of the input files used by the loader. Most files are in attribute-value format.
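
For orientation, a minimal reader for an attribute-value file might look like the sketch below. It is not the loader's parser. It assumes the conventions commonly seen in PGDB flat files (one `ATTRIBUTE - value` pair per line, records terminated by a line containing `//`, comment lines starting with `#`, and continuation lines starting with `/`); consult the flat file format specification for the authoritative rules. The function name `parse_attribute_value` is hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Optional

Record = Dict[str, List[str]]


def parse_attribute_value(lines: Iterable[str]) -> Iterator[Record]:
    """Yield one dict per record, mapping attribute name -> list of values."""
    record: Record = {}
    last_attr: Optional[str] = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # blank line or comment (assumed convention)
        if line.strip() == "//":
            # Assumed record terminator.
            if record:
                yield record
            record, last_attr = {}, None
            continue
        if line.startswith("/") and last_attr is not None:
            # Assumed continuation of the previous attribute's value.
            record[last_attr][-1] += "\n" + line[1:]
            continue
        attr, sep, value = line.partition(" - ")
        if not sep:
            continue  # not in ATTRIBUTE - value form; skipped in this sketch
        # Repeated attributes accumulate in order, matching ATTRIBUTE[*].
        record.setdefault(attr, []).append(value)
        last_attr = attr
    if record:
        yield record
```

Keeping a list of values per attribute preserves the order of repeated attributes within a record, although the relative order of different attributes is not retained by this simple structure.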

Running the Loader

The BioCyc installation instructions contain details for running the loader, including options and a description of its output.

Dataset and BioSource Specification

If the -m (merge) command line option is used, the loader loads data into the dataset named "BioCyc", using the WID of this entry as the DataSetWID for all objects it adds to the Warehouse. If multiple datasets of this name exist, the one with the maximal DataSet.WID is used; typically this is the dataset that was most recently loaded. If no dataset of this name exists, a warning is issued and one is created.
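
The "maximal WID" rule can be illustrated with a small query. The sketch below uses Python's built-in `sqlite3` module and a simplified two-column `DataSet` table purely for demonstration; the real warehouse schema has more columns and runs on a different DBMS, and the function name `merge_dataset_wid` is hypothetical.

```python
import sqlite3
from typing import Optional


def merge_dataset_wid(conn: sqlite3.Connection,
                      name: str = "BioCyc") -> Optional[int]:
    """Return the largest WID among datasets with the given name.

    Returns None when no such dataset exists; per the text, the loader
    would then issue a warning and create the dataset.
    """
    row = conn.execute(
        "SELECT MAX(WID) FROM DataSet WHERE Name = ?", (name,)
    ).fetchone()
    return row[0]  # MAX() over zero rows yields NULL, i.e. None
```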

If the -m command line option is not used, the loader adds one row to the Dataset table as follows, using the WID of this entry as the DataSetWID for all objects it adds to the Warehouse:

**Column values for `Dataset` row**
Column	Value assigned by BioCyc loader
`WID`	A small integer that uniquely identifies this dataset in the warehouse.
`Name`	'speciesCyc', where species is an abbreviation for the organism represented in the PGDB.
`Version`	Major version of PGDB that is loaded.
`ReleaseDate`	The date that this version of the PGDB was released.
`LoadDate`	The date and time the loader was started.
`ChangeDate`	The date and time the loader completed, NULL if the loader did not complete successfully.
`LoadedBy`	The value of the system environment variable USER for the account running the loader.
`Application`	'BioCyc Loader'
`ApplicationVersion`	4.6
`HomeURL`	http://www.biocyc.org
`QueryURL`	http://www.biocyc.org:1555

The loader adds one row to the BioSource table as follows, defining the organism the BioCyc KB describes:

**Column values for `BioSource` row**
Column	Value assigned by BioCyc loader
`WID`	The next available WID in the warehouse. Uniquely specifies this BioSource in the warehouse.
`Name`	The value specified by the `-o` command line option, e.g., 'Bacillus subtilis'
`Strain`	NULL
`DatasetWID`	The value `Dataset.WID` assigned to the dataset being loaded.

Translation Semantics for BioCyc Objects

This section describes the semantic mapping from the objects comprising a BioCyc knowledge base, and their associated flat file representation, to their representation in the Bio-SPICE data warehouse. Semantics are expressed in tabular form, showing the mapping of each source attribute to the warehouse Table.Column values computed from it. The most typical case is that the attribute is simply copied into a warehouse column; if translation is more complex, an explanation is given. Any attributes not listed are ignored.

Some attributes can occur multiple times for a source object. The notation ATTRIBUTE[*] is used to indicate that the semantics apply to all occurrences; typically a row is added to a warehouse table for each. The notation ATTRIBUTE[1], ATTRIBUTE[2], etc., is used where the attribute order is significant. If an attribute is missing from a source file but required by the warehouse schema (i.e., its column is qualified with NOT NULL), a warning is issued. If the missing attribute is not required, NULL is stored.

Publications

Publications are input from the file pubs.dat. Any BioCyc object can contain a reference to a publication. This is done with the CITATIONS attribute. This attribute contains the UNIQUE-ID of a publication, possibly enclosed in square brackets, and possibly without the leading "PUB-" prefix. Certain citations do not refer to publications; they refer to evidence. See the Support Table for translation details.

A row is added to the Citation table for each entry in pubs.dat, except for those with a REFERENT-FRAME attribute. The latter provide an alternative name for a publication - either the publication's UNIQUE-ID or its REFERENT-FRAME attribute may be used to refer to the publication in CITATIONS attributes in other files.
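
The normalization of CITATIONS values described above might be sketched as follows. The helper names and the `wid_by_name` dictionary (mapping publication names to `Citation.WID` values) are hypothetical, and evidence-code citations are not handled here.

```python
from typing import Dict, Optional


def normalize_citation(value: str) -> str:
    """Strip optional square brackets and ensure a leading "PUB-" prefix."""
    key = value.strip()
    if key.startswith("[") and key.endswith("]"):
        key = key[1:-1].strip()
    if not key.startswith("PUB-"):
        key = "PUB-" + key
    return key


def citation_wid(value: str, wid_by_name: Dict[str, int]) -> Optional[int]:
    """Look up the Citation WID for a CITATIONS value, or None if unknown."""
    return wid_by_name.get(normalize_citation(value))
```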

**Translation semantics for `pubs.dat`**
BioCyc Attribute	Warehouse Semantics
AUTHOR[*]	Concatenated together to form the list of authors of the publication. The order of the authors is preserved. A comma and a space are inserted between each author. Along with other attributes, this is included in the full text of the citation, stored at `Citation.Citation`.
COMMENT[*]	`CommentTable.Comm`; `CommentTable.OtherWID` is the WID of this `Citation` object
MEDLINE-UID	`Crossreference.XID`; `Crossreference.DatabaseName` is 'Medline'; `Crossreference.OtherWID` is the WID of this `Citation` object
PUBMED-ID	`Citation.PMID`
REFERENT-FRAME	Attribute provides an alternate name for UNIQUE-ID; either may be used in other files to refer to the publication. Both refer to the same `Citation` object. Used internally to associate a publication name with a `Citation.WID`.
TITLE, SOURCE, YEAR, URL	Concatenated together in the given order and appended to the full list of authors derived from the AUTHOR[*] attributes to form `Citation.Citation`. Tabs are inserted between each attribute and between the author list and TITLE.
UNIQUE-ID	`DBID.XID`; `DBID.OtherWID` is the WID of this `Citation` object. Used internally to associate a publication name with a `Citation.WID`.
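
As an illustration of how `Citation.Citation` is assembled from the rows above, consider the following sketch. The record layout (a dict of attribute name to list of values) and the function name `build_citation` are hypothetical, and silently skipping missing attributes is an assumption rather than documented loader behavior.

```python
from typing import Dict, List


def build_citation(record: Dict[str, List[str]]) -> str:
    """Join authors with ", ", then append TITLE, SOURCE, YEAR, URL with tabs."""
    authors = ", ".join(record.get("AUTHOR", []))  # author order is preserved
    parts = [authors] if authors else []
    for attr in ("TITLE", "SOURCE", "YEAR", "URL"):
        values = record.get(attr)
        if values:
            parts.append(values[0])
    return "\t".join(parts)
```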

Additional Tables

No additional table rows are added for publications.

Compounds

Compounds are input from the file compounds.dat. A row is added to the Chemical table for each entry in compounds.dat. A Chemical may be either a single compound or a class of compounds; all entries from this file are single compounds; hence Chemical.Class is always 'F'.

Note that rows are also added to Chemical when translating reactions.dat; these compounds are classes of compounds, and their Chemical.Class is 'T'.

**Translation semantics for `compounds.dat`**
BioCyc Attribute	Warehouse Semantics
CAS-REGISTRY-NUMBERS	`Chemical.CAS`
CHARGE	`Chemical.Charge`
CHEMICAL-FORMULA[*]	All (Element Number) pairs are concatenated to form `Chemical.EmpiricalFormula`. Example: (H 2) and (O 1) form 'H2O1'.
CITATIONS[*]	Each attribute is either an evidence code or the UNIQUE-ID of a publication. See Support Table for translation of evidence codes. Each publication UNIQUE-ID is possibly enclosed in square brackets, and possibly missing the leading "PUB-". A row is added to `CitationWIDOtherWID`; `CitationWIDOtherWID.CitationWID` is the WID of the `Citation` associated with this unique ID; `CitationWIDOtherWID.OtherWID` is the WID of this `Chemical`
COMMENT[*]	`CommentTable.Comm`; `CommentTable.OtherWID` is the WID of this `Chemical` object
COMMON-NAME	`Chemical.Name`
DBLINKS[*]	Attribute is a list of two or more elements. A row is added to `CrossReference`; `CrossReference.OtherWID` is the WID of this `Chemical` object; `CrossReference.DatabaseName` is the first element of the list; `CrossReference.XID` is the second element of the list; the rest of the list is ignored.
MOLECULAR-WEIGHT	`Chemical.MolecularWeightCalc`
PKA1	`Chemical.PKA1`
PKA2	`Chemical.PKA2`
PKA3	`Chemical.PKA3`
REGULATES	Ignored. This is the converse of the REGULATOR attribute for an enzymatic regulator.
SMILES	`Chemical.Smiles`
SYNONYMS[*]	`SynonymTable.Syn`; `SynonymTable.OtherWID` is the WID of this `Chemical` object
SYSTEMATIC-NAME	`Chemical.SystematicName`
UNIQUE-ID	`DBID.XID`; `DBID.OtherWID` is the WID of this `Chemical` object
ACTIVATORS-UNMECH-OF, INHIBITORS-ALLOSTERIC-OF, INHIBITORS-IRREVERSIBLE-OF, INHIBITORS-OTHER-OF	Ignored; these are symmetric analogues to the corresponding attributes INHIBITORS-ALLOSTERIC, etc. of `enzrxns.dat`.
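
Two of the less trivial translations in the table above, CHEMICAL-FORMULA[*] and DBLINKS[*], can be sketched as shown below. The function names are hypothetical, and the textual form of the values (parenthesized, whitespace-separated lists with an optionally quoted identifier) is an assumption about the flat file syntax.

```python
import re
from typing import List, Optional, Tuple

# Matches one "(Element Number)" pair such as "(H 2)".
_PAIR = re.compile(r"\(\s*([A-Za-z]+)\s+(\d+)\s*\)")


def empirical_formula(values: List[str]) -> str:
    """Concatenate (Element Number) pairs, e.g. (H 2), (O 1) -> 'H2O1'."""
    out = []
    for value in values:
        match = _PAIR.fullmatch(value.strip())
        if match:
            out.append(match.group(1) + match.group(2))
    return "".join(out)


def dblink(value: str) -> Optional[Tuple[str, str]]:
    """Return (DatabaseName, XID) from the first two list elements."""
    tokens = value.strip().strip("()").split()
    if len(tokens) < 2:
        return None  # fewer than two elements; nothing to cross-reference
    # Elements after the second are ignored, as stated in the table.
    return tokens[0], tokens[1].strip('"')
```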

Additional Tables

No additional table rows are added for compounds.

Proteins

Proteins are input from the file proteins.dat. A row is added to the Protein table for each entry in proteins.dat.

**Translation semantics for `proteins.dat`**
BioCyc Attribute	Warehouse Semantics
CITATIONS[*]	Each attribute is either an evidence code or the UNIQUE-ID of a publication. See Support Table for translation of evidence codes. Each publication UNIQUE-ID is possibly enclosed in square brackets, and possibly missing the leading "PUB-". A row is added to `CitationWIDOtherWID`; `CitationWIDOtherWID.CitationWID` is the WID of the `Citation` associated with this unique ID; `CitationWIDOtherWID.OtherWID` is the WID of this `Protein`
^COEFFICIENT[*]	`Subunit.Coefficient` for the immediately preceding COMPONENTS attribute.
COMMON-NAME	`Protein.Name`. Many proteins do not have the COMMON-NAME attribute. This is because multifunctional enzymes often have different names depending on which enzymatic function is being referred to. If no common name is specified (and we prefer that no common name be specified if an activity name is appropriate), we use the common name of the corresponding enzymatic-reaction frame (or a concatenation of them separated by / if there are multiple enzymatic-reactions).
COMMENT[*]	`CommentTable.Comm`; `CommentTable.OtherWID` is the WID of this `Protein` object
COMPONENTS[*]	Attribute is a UNIQUE-ID for a protein. A row is added to `Subunit`; `Subunit.ComplexWID` is the WID of this `Protein` object; `Subunit.SubunitWID` is the WID of the protein associated with the attribute; `Subunit.Coefficient` is the value of the immediately following ^COEFFICIENT attribute, defaulting to 1 if not explicit
DBLINKS[*]	Attribute is a list of two or more elements. A row is added to `CrossReference`; `CrossReference.OtherWID` is the WID of this `Protein` object; `CrossReference.DatabaseName` is the first element of the list; `CrossReference.XID` is the second element of the list; the rest of the list is ignored.
FEATURES	Ignored
GENE	Ignored
GO-TERMS[*]	Ignored unless the Gene Ontology dataset has been previously loaded; each term is a `DBID.XID` from that dataset. A row is added to `RelatedTerm`: `RelatedTerm.TermWID` references this term; `RelatedTerm.OtherWID` is the WID of this `Protein` object. `RelatedTerm.Relationship` is 'keyword'
LOCATION[*]	`Location.Location`; `Location.ProteinWID` is the WID of this `Protein` object
MOLECULAR-WEIGHT-SEQ	`Protein.MolecularWeightCalc`
MOLECULAR-WEIGHT-EXP	`Protein.MolecularWeightExp`
PI	`Protein.PICalc`
SYNONYMS[*]	`SynonymTable.Syn`; `SynonymTable.OtherWID` is the WID of this `Protein` object
UNIQUE-ID	`DBID.XID`; `DBID.OtherWID` is the WID of this `Protein` object
ACTIVATORS-UNMECH-OF, COFACTORS-UNMECH-OF, INHIBITORS-UNMECH-OF, INHIBITORS-COMPETITIVE-OF, INHIBITORS-OTHER-OF, PROSTHETIC-GROUPS-OF	Ignored; these are symmetric analogues to the corresponding attributes INHIBITORS-COMPETITIVE, etc. of `enzrxns.dat`.
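
The pairing of COMPONENTS with an immediately following ^COEFFICIENT, including the default of 1, can be sketched as below. The input is assumed to be the protein entry's attributes in file order as (name, value) pairs; the function name `subunit_rows` is hypothetical, and the returned tuples stand in for `Subunit` rows (subunit UNIQUE-ID, coefficient).

```python
from typing import List, Optional, Tuple


def subunit_rows(attrs: List[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Return (subunit UNIQUE-ID, coefficient) for each COMPONENTS attribute."""
    rows: List[Tuple[str, int]] = []
    previous: Optional[str] = None
    for name, value in attrs:
        if name == "COMPONENTS":
            rows.append((value, 1))  # coefficient defaults to 1
        elif name == "^COEFFICIENT" and previous == "COMPONENTS":
            # Applies only to the immediately preceding COMPONENTS attribute.
            component, _ = rows[-1]
            rows[-1] = (component, int(value))
        previous = name
    return rows
```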

Additional Tables

A row is added to BioSourceWIDProteinWID for each protein entry. BioSourceWIDProteinWID.BioSourceWID is the WID from the one row of BioSource created by the loader for the species being loaded. Numerous other table rows are added for proteins, but they are added when the linked object is parsed. In particular, the GENE attribute is ignored, and the protein - gene link is created when genes are parsed.

If the Gene Ontology dataset has been loaded, a row is added to RelatedTerm for each GO-TERMS attribute that is a term in Gene Ontology.

Protein Sequences

Protein sequences are input from the file protseq.fasta. This file is not an attribute-value format file; each entry contains a protein name, some other information which is ignored, and an amino acid sequence. The row for that protein is updated to store the amino acid sequence as Protein.AASequence.
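
A minimal reader for such a file might look like the sketch below. It assumes the usual FASTA layout (a header line starting with `>` followed by sequence lines) and, as a simplification, treats the first whitespace-delimited token of the header as the protein name; the actual header layout of protseq.fasta may differ. The function name `read_fasta` is hypothetical.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each protein name to its concatenated amino acid sequence."""
    chunks: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            tokens = line[1:].split(None, 1)
            # Assumption: the name is the first token; the rest is ignored.
            current = tokens[0] if tokens else ""
            chunks[current] = []
        elif current is not None:
            chunks[current].append(line)
    return {name: "".join(parts) for name, parts in chunks.items()}
```

Each resulting sequence would then be stored in `Protein.AASequence` of the row for the matching protein.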

Additional Tables

No additional table rows are added based on entries in protseq.fasta.

Transcription Units

Transcription Units are input from the file transunits.dat. A row is added to the TranscriptionUnit table for each entry in this file.

**Translation semantics for `transunits.dat`**
BioCyc Attribute	Warehouse Semantics
CITATIONS[*]	Each attribute is either an evidence code or the UNIQUE-ID of a publication. See Support Table for translation of evidence codes. Each publication UNIQUE-ID is possibly enclosed in square brackets, and possibly missing the leading "PUB-". A row is added to `CitationWIDOtherWID`; `CitationWIDOtherWID.CitationWID` is the WID of the `Citation` associated with this unique ID; `CitationWIDOtherWID.OtherWID` is the WID of this `TranscriptionUnit`
COMMENT[*]	`CommentTable.Comm`; `CommentTable.OtherWID` is the WID of this `TranscriptionUnit` object
COMMON-NAME	`TranscriptionUnit.Name`. If missing, UNIQUE-ID is used in its place.
COMPONENTS[*]	Ignored; a link to this entry is added when the component is loaded.
DBLINKS[*]	Attribute is a list of two or more elements. A row is added to `CrossReference`; `CrossReference.OtherWID` is the WID of this `TranscriptionUnit` object; `CrossReference.DatabaseName` is the first element of the list; `CrossReference.XID` is the second element of the list; the rest of the list is ignored.
EXTENT-UNKNOWN?	Ignored.
SYNONYMS[*]	`SynonymTable.Syn`; `SynonymTable.OtherWID` is the WID of this `TranscriptionUnit` object
UNIQUE-ID	`DBID.XID`; `DBID.OtherWID` is the WID of this `TranscriptionUnit` object

Additional Tables

No additional table rows are added based on entries in transunits.dat.

Genes

Genes are input from the file genes.dat. A row is added to the Gene table for each entry in genes.dat.

**Translation semantics for `genes.dat`**
BioCyc Attribute	Warehouse Semantics
CITATIONS[*]	Each attribute is either an evidence code or the UNIQUE-ID of a publication. See Support Table for translation of evidence codes. Each publication UNIQUE-ID is possibly enclosed in square brackets, and possibly missing the leading "PUB-". A row is added to `CitationWIDOtherWID`; `CitationWIDOtherWID.CitationWID` is the WID of the `Citation` associated with this unique ID; `CitationWIDOtherWID.OtherWID` is the WID of this `Gene`
COMMENT[*]	`CommentTable.Comm`; `CommentTable.OtherWID` is the WID of this `Gene` object
COMMON-NAME	`Gene.Name`
COMPONENT-OF[*]	If value matches the UNIQUE-ID of a previously loaded transcription unit, a row is added to `TranscriptionUnitComponent`: `TranscriptionUnitComponent.TranscriptionUnitWID` is the WID of the `TranscriptionUnit` object; `TranscriptionUnitComponent.OtherWID` is the WID of this `Gene` object; `TranscriptionUnitComponent.Type` is 'gene'.
DBLINKS[*]	Attribute is a list of two or more elements. A row is added to `CrossReference`; `CrossReference.OtherWID` is the WID of this `Gene` object; `CrossReference.DatabaseName` is the first element of the list; `CrossReference.XID` is the second element of the list; the rest of the list is ignored.
INTERRUPTED	`Gene.INTERRUPTED`
LEFT-END-POSITION	`Gene.CodingRegionStart` or `Gene.CodingRegionEnd`, depending on TRANSCRIPTION-DIRECTION
PRODUCT[*]	Value should match the UNIQUE-ID of a protein. If so, a row is added to `GeneWIDProteinWID`.
RIGHT-END-POSITION	`Gene.CodingRegionEnd` or `Gene.CodingRegionStart`, depending on TRANSCRIPTION-DIRECTION
PI	`Gene.PICalc`
SYNONYMS[*]	`SynonymTable.Syn`; `SynonymTable.OtherWID` is the WID of this `Gene` object
TRANSCRIPTION-DIRECTION	A value of "+" indicates that LEFT-END-POSITION is stored as `Gene.CodingRegionStart` and that RIGHT-END-POSITION is stored as `Gene.CodingRegionEnd`. A value of "-" indicates that LEFT-END-POSITION is stored as `Gene.CodingRegionEnd` and that RIGHT-END-POSITION is stored as `Gene.CodingRegionStart`.
TYPES[*]	Ignored unless the MetaCyc Ontology dataset has been previously loaded; each type is a `Term.Name` from the Multifun subontology of that dataset. A row is added to `RelatedTerm`: `RelatedTerm.TermWID` references a term that is this type. `RelatedTerm.OtherWID` is the WID of this `Gene` object. `RelatedTerm.Relationship` is 'superclass'
UNIQUE-ID	`Gene.GenomeID` and `DBID.XID`; `DBID.OtherWID` is the WID of this `Gene` object
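
The effect of TRANSCRIPTION-DIRECTION on the coding-region columns can be summarized in a few lines. The function name `coding_region` is hypothetical, and raising an error for any other direction value is a choice made for this sketch, not documented loader behavior.

```python
from typing import Tuple


def coding_region(left_end: int, right_end: int,
                  direction: str) -> Tuple[int, int]:
    """Return (CodingRegionStart, CodingRegionEnd) for a gene."""
    if direction == "+":
        return left_end, right_end
    if direction == "-":
        # Reverse strand: the right end is where the coding region starts.
        return right_end, left_end
    raise ValueError(f"unexpected TRANSCRIPTION-DIRECTION: {direction!r}")
```

For example, a gene with LEFT-END-POSITION 100 and RIGHT-END-POSITION 400 on the "-" strand would be stored with `Gene.CodingRegionStart` 400 and `Gene.CodingRegionEnd` 100.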

Additional Tables

A row is added to GeneWIDProteinWID for each PRODUCT attribute, associating the gene with the gene product.

A row is added to BioSourceWIDGeneWID for each gene entry. BioSourceWIDGeneWID.BioSourceWID is the WID from the one row of BioSource created by the loader for the species being loaded.

A row is added to TranscriptionUnitComponent when a COMPONENT-OF attribute matches the UNIQUE-ID of a previously loaded transcription unit.

If the MetaCyc Ontology dataset has been loaded, a row is added to RelatedTerm for each TYPES attribute that is a term in the Multifun ontology that is part of the MetaCyc Ontology.

Promoters

Promoters are input from the file promoters.dat. A row is added to the Feature table for each entry in this file:

Feature.Type is NULL.
Feature.Class is 'promoter'.
Feature.SequenceType is 'N'.
Feature.SequenceWID is NULL.
Feature.RegionOrPoint is 'point'.
Feature.PointType is 'center'.

**Translation semantics for `promoters.dat`**
BioCyc Attribute	Warehouse Semantics
ABSOLUTE-PLUS-1-POSITION	`Feature.StartPosition` and `Feature.EndPosition`. This is the position at which transcription starts.
CITATIONS[*]	Each attribute is either an evidence code or the UNIQUE-ID of a publication. See Support Table for translation of evidence codes. Each publication UNIQUE-ID is possibly enclosed in square brackets, and possibly missing the leading "PUB-". A row is added to `CitationWIDOtherWID`; `CitationWIDOtherWID.CitationWID` is the WID of the `Citation` associated with this unique ID; `CitationWIDOtherWID.OtherWID` is the WID of this `Feature`
COMMENT[*]	`CommentTable.Comm`; `CommentTable.OtherWID` is the WID of this `Feature` object
COMMON-NAME	`Feature.Description`
COMPONENT-OF[*]	If value matches the UNIQUE-ID of a previously loaded transcription unit, a row is added to `TranscriptionUnitComponent`: `TranscriptionUnitComponent.TranscriptionUnitWID` is the WID of the `TranscriptionUnit` object; `TranscriptionUnitComponent.OtherWID` is the WID of this `Feature` object; `TranscriptionUnitComponent.Type` is 'promoter'.
DBLINKS[*]	Attribute is a list of two or more elements. A row is added to `CrossReference`; `CrossReference.OtherWID` is the WID of this `Feature` object; `CrossReference.DatabaseName` is the first element of the list; `CrossReference.XID` is the second element of the list; the rest of the list is ignored.
PROMOTER-EVIDENCE	Ignored. There is typically an associated CITATIONS attribute for this evidence code.
REGULATED-BY	Ignored. This is the converse of the REGULATED-ENTITY attribute for a transcriptional regulator.
SYNONYMS[*]	`SynonymTable.Syn`; `SynonymTable.OtherWID` is the WID of this `Feature` object
UNIQUE-ID	`DBID.XID`; `DBID.OtherWID` is the WID of this `Feature` object

Additional Tables

A row is added to TranscriptionUnitComponent for each COMPONENT-OF attribute that references a previously loaded transcription unit.
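The DBLINKS rule in the table above (first element to `CrossReference.DatabaseName`, second element to `CrossReference.XID`, remainder ignored) could be sketched as follows. This is an illustration only, not the loader's code: the function name `parse_dblink` is hypothetical, and the sketch assumes the attribute value is a parenthesized, space-separated list whose string elements may be double-quoted. The sample value is made up.

```python
import shlex


def parse_dblink(value):
    """Return (database_name, xid) for one DBLINKS value, or None if it has fewer than two elements."""
    inner = value.strip()
    # Assumption: the list is wrapped in one pair of parentheses.
    if inner.startswith("(") and inner.endswith(")"):
        inner = inner[1:-1]
    # shlex.split honors double quotes, so a quoted ID stays one element.
    elements = shlex.split(inner)
    if len(elements) < 2:
        return None
    # Only the first two elements are used; the rest of the list is ignored.
    return elements[0], elements[1]


print(parse_dblink('(SOME-DB "ID-123" extra stuff)'))
```

The same rule applies to every DBLINKS row in this document; only the object referenced by `CrossReference.OtherWID` differs.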

Terminators

Terminators are input from the file terminators.dat. A row is added to the Feature table for each entry in this file:

Feature.Type is NULL.
Feature.Class is 'terminator'.
Feature.SequenceType is 'N'.
Feature.SequenceWID is NULL.
Feature.RegionOrPoint is 'region'.

**Translation semantics for `terminators.dat`**
BioCyc Attribute	Warehouse Semantics
CITATIONS[*]	Each attribute is either an evidence code or the UNIQUE-ID of a publication. See Support Table for translation of evidence codes. Each publication UNIQUE-ID is possibly enclosed in square brackets, and possibly missing the leading "PUB-". A row is added to `CitationWIDOtherWID`; `CitationWIDOtherWID.CitationWID` is the WID of the `Citation` associated with this unique ID; `CitationWIDOtherWID.OtherWID` is the WID of this `Feature`
COMMENT[*]	`CommentTable.Comm`; `CommentTable.OtherWID` is the WID of this `Feature` object
COMMON-NAME	`Feature.Description`
COMPONENT-OF[*]	If the value matches the UNIQUE-ID of a previously loaded transcription unit, a row is added to `TranscriptionUnitComponent`: `TranscriptionUnitComponent.TranscriptionUnitWID` is the WID of the `TranscriptionUnit` object; `TranscriptionUnitComponent.OtherWID` is the WID of this `Feature` object; `TranscriptionUnitComponent.Type` is 'terminator'.
DBLINKS[*]	Attribute is a list of two or more elements. A row is added to `CrossReference`; `CrossReference.OtherWID` is the WID of this `Feature` object; `CrossReference.DatabaseName` is the first element of the list; `CrossReference.XID` is the second element of the list; the rest of the list is ignored.
LEFT-END-POSITION	`Feature.StartPosition`
RIGHT-END-POSITION	`Feature.EndPosition`
SYNONYMS[*]	`SynonymTable.Syn`; `SynonymTable.OtherWID` is the WID of this `Feature` object
UNIQUE-ID	`DBID.XID`; `DBID.OtherWID` is the WID of this `Feature` object

Additional Tables

A row is added to TranscriptionUnitComponent for each COMPONENT-OF attribute that references a previously loaded transcription unit.

DNA Binding Sites

DNA binding sites are input from the file dnabindsites.dat. A row is added to the Feature table for each entry in this file:

Feature.Type is NULL.
Feature.Class is always 'binding site'.
Feature.SequenceType is 'N'.
Feature.SequenceWID is NULL.
Feature.RegionOrPoint is 'point'.
Feature.PointType is 'center'.

**Translation semantics for `dnabindsites.dat`**
BioCyc Attribute	Warehouse Semantics
CITATIONS[*]	Each attribute is either an evidence code or the UNIQUE-ID of a publication. See Support Table for translation of evidence codes. Each publication UNIQUE-ID is possibly enclosed in square brackets, and possibly missing the leading "PUB-". A row is added to `CitationWIDOtherWID`; `CitationWIDOtherWID.CitationWID` is the WID of the `Citation` associated with this unique ID; `CitationWIDOtherWID.OtherWID` is the WID of this `Feature`
COMMENT[*]	`CommentTable.Comm`; `CommentTable.OtherWID` is the WID of this `Feature` object
COMMON-NAME	`Feature.Description`
COMPONENT-OF[*]	If the value matches the UNIQUE-ID of a previously loaded transcription unit, a row is added to `TranscriptionUnitComponent`: `TranscriptionUnitComponent.TranscriptionUnitWID` is the WID of the `TranscriptionUnit` object; `TranscriptionUnitComponent.OtherWID` is the WID of this `Feature` object; `TranscriptionUnitComponent.Type` is 'binding site'.
DBLINKS[*]	Attribute is a list of two or more elements. A row is added to `CrossReference`; `CrossReference.OtherWID` is the WID of this `Feature` object; `CrossReference.DatabaseName` is the first element of the list; `CrossReference.XID` is the second element of the list; the rest of the list is ignored.
REGULATED-PROMOTER	References the UNIQUE-ID of a promoter. Assuming that promoter has been previously loaded, its ABSOLUTE-PLUS-1-POSITION is used to convert this binding site's RELATIVE-CENTER-POSITION from a relative to an absolute position.
RELATIVE-CENTER-POSITION	This numeric value designates either an integral position or a position halfway between two integral positions. If the value is positive, one is first subtracted from it. The result is then added to the `Feature.StartPosition` of the promoter named in the REGULATED-PROMOTER attribute. If the sum is integral, it is stored in both `Feature.StartPosition` and `Feature.EndPosition`. If it is nonintegral, the next-lowest integer is stored in `Feature.StartPosition` and the next-highest integer is stored in `Feature.EndPosition`.
SYNONYMS[*]	`SynonymTable.Syn`; `SynonymTable.OtherWID` is the WID of this `Feature` object
UNIQUE-ID	`DBID.XID`; `DBID.OtherWID` is the WID of this `Feature` object

Additional Tables

A row is added to TranscriptionUnitComponent for each COMPONENT-OF attribute that references a previously loaded transcription unit.
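The RELATIVE-CENTER-POSITION conversion described in the table above can be illustrated with a short sketch. The function name `absolute_site_position` and the sample numbers are hypothetical; the arithmetic follows the rule as stated (subtract one from positive values, add the promoter's start position, then store the floor and ceiling of the sum).

```python
import math


def absolute_site_position(relative_center, promoter_start):
    """Return (StartPosition, EndPosition) for a binding site.

    relative_center: RELATIVE-CENTER-POSITION, integral or ending in .5
    promoter_start:  Feature.StartPosition of the regulated promoter
    """
    # Positive relative positions are shifted down by one before use.
    offset = relative_center - 1 if relative_center > 0 else relative_center
    total = promoter_start + offset
    # For an integral sum, floor and ceil coincide, so start == end.
    # For a half-integral sum, they give the two neighboring integers.
    return math.floor(total), math.ceil(total)


print(absolute_site_position(-35.5, 1000))  # (964, 965)
print(absolute_site_position(10.5, 1000))   # (1009, 1010)
print(absolute_site_position(1, 1000))      # (1000, 1000)
```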

Reactions

Reactions are input from the file reactions.dat. A row is added to the Reaction table for each entry in reactions.dat.

**Translation semantics for `reactions.dat`**
BioCyc Attribute	Warehouse Semantics
BALANCE-STATE	Ignored
CITATIONS[*]	Each attribute is either an evidence code or the UNIQUE-ID of a publication. See Support Table for translation of evidence codes. Each publication UNIQUE-ID is possibly enclosed in square brackets, and possibly missing the leading "PUB-". A row is added to `CitationWIDOtherWID`; `CitationWIDOtherWID.CitationWID` is the WID of the `Citation` associated with this unique ID; `CitationWIDOtherWID.OtherWID` is the WID of this `Reaction`
^COEFFICIENT[*]	`Reactant.Coefficient` or `Product.Coefficient` for the immediately preceding LEFT or RIGHT attribute.
COMMENT[*]	`CommentTable.Comm`; `CommentTable.OtherWID` is the WID of this `Reaction` object
COMMON-NAME	`SynonymTable.Syn`; `SynonymTable.OtherWID` is the WID of this `Reaction` object
DBLINKS[*]	Attribute is a list of two or more elements. A row is added to `CrossReference`; `CrossReference.OtherWID` is the WID of this `Reaction` object; `CrossReference.DatabaseName` is the first element of the list; `CrossReference.XID` is the second element of the list; the rest of the list is ignored.
DELTAG0	`Reaction.DeltaG`
EC-NUMBER	`Reaction.ECNumber` or `Reaction.ECNumberProposed`, depending on OFFICIAL-EC?
OFFICIAL-EC?	Determines whether EC-NUMBER is stored as `Reaction.ECNumber` or `Reaction.ECNumberProposed`
ORPHAN?	Ignored
LEFT[*]	A row is added to `Reactant`; its `ReactionWID` is the WID of this reaction. The attribute designates a substrate or a class of substrates involved in the reaction. It is translated as discussed below (see Additional Tables); the WID for the substrate is stored in `Reactant.OtherWID`. If a ^COEFFICIENT attribute follows immediately, it is stored as `Reactant.Coefficient`. Otherwise the value 1 is stored.
RIGHT[*]	A row is added to `Product`; its `ReactionWID` is the WID of this reaction. The attribute designates a substrate or a class of substrates involved in the reaction. It is translated as discussed below (see Additional Tables); the WID for the substrate is stored in `Product.OtherWID`. If a ^COEFFICIENT attribute follows immediately, it is stored as `Product.Coefficient`. Otherwise the value 1 is stored.
SPONTANEOUS?	`Reaction.Spontaneous`
SYNONYMS[*]	`SynonymTable.Syn`; `SynonymTable.OtherWID` is the WID of this `Reaction` object
UNIQUE-ID	`DBID.XID`; `DBID.OtherWID` is the WID of this `Reaction` object

Additional Tables

As noted, a row is added to Reactant for each LEFT attribute and to Product for each RIGHT attribute. If the attribute matches the UNIQUE-ID, the COMMON-NAME, or a SYNONYM of a Chemical or Protein, the WID of that object is stored as Reactant.OtherWID or Product.OtherWID, respectively. Otherwise the attribute is assumed to specify a class of chemicals; an entry in Chemical is created for it, such that Chemical.Name is the attribute and Chemical.Class is 'T'.
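The pairing of LEFT and RIGHT attributes with an immediately following ^COEFFICIENT can be sketched as below. This is a minimal illustration under stated assumptions: the function name `reaction_sides` is hypothetical, an entry is assumed to be already parsed into an ordered list of (attribute, value) pairs, and coefficients are kept as text because this document does not specify their numeric format. Substrate lookup (the WID stored in `OtherWID`) is left out.

```python
def reaction_sides(attributes):
    """Build Reactant/Product row sketches from ordered (name, value) pairs."""
    rows = []
    previous_row = None  # row for the immediately preceding LEFT/RIGHT, if any
    for name, value in attributes:
        if name in ("LEFT", "RIGHT"):
            previous_row = {
                "table": "Reactant" if name == "LEFT" else "Product",
                "substrate": value,
                "coefficient": "1",  # default when no ^COEFFICIENT follows
            }
            rows.append(previous_row)
        else:
            # A ^COEFFICIENT only applies to the attribute right before it.
            if name == "^COEFFICIENT" and previous_row is not None:
                previous_row["coefficient"] = value
            previous_row = None
    return rows


entry = [
    ("UNIQUE-ID", "RXN-EXAMPLE"),  # made-up identifiers
    ("LEFT", "A"),
    ("^COEFFICIENT", "2"),
    ("LEFT", "B"),
    ("RIGHT", "C"),
]
for row in reaction_sides(entry):
    print(row)
```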

Enzymatic Reactions

Enzymatic reactions are input from the file enzrxns.dat. A row is added to the EnzymaticReaction table for each entry in enzrxns.dat.

**Translation semantics for `enzrxns.dat`**
BioCyc Attribute	Warehouse Semantics
ALTERNATIVE-COFACTORS[*] ALTERNATIVE-SUBSTRATES[*]	The attribute consists of a list of names (PRIMARY ALT1 ALT2 ... ALTn). For each name, the `Chemical` table is queried for a UNIQUE-ID or a COMMON-NAME of a compound in the database being loaded. If none is found, a row is added to `Chemical`; the name is stored as `Chemical.Name`; its `Chemical.WID` is used as described below. N rows are added to `EnzReactionAltCompound`, one for each ALTi: `EnzReactionAltCompound.EnzymaticReactionWID` is the WID of this enzymatic reaction; `EnzReactionAltCompound.PrimaryWID` is the WID associated with the primary compound; `EnzReactionAltCompound.AlternativeWID` is the WID associated with compound ALTi; `EnzReactionAltCompound.Cofactor` is 'T' for ALTERNATIVE-COFACTORS, 'F' for ALTERNATIVE-SUBSTRATES.
CITATIONS[*]	Each attribute is either an evidence code or the UNIQUE-ID of a publication. See Support Table for translation of evidence codes. Each publication UNIQUE-ID is possibly enclosed in square brackets, and possibly missing the leading "PUB-". A row is added to `CitationWIDOtherWID`; `CitationWIDOtherWID.CitationWID` is the WID of the `Citation` associated with this unique ID; `CitationWIDOtherWID.OtherWID` is the WID of this `EnzymaticReaction`
COFACTORS[*] PROSTHETIC-GROUPS[*] COFACTORS-OR-PROSTHETIC-GROUPS[*]	The `Chemical` and `Protein` tables (in that order) are queried for a UNIQUE-ID or a COMMON-NAME of a compound or a protein in the database being loaded that matches the value of this attribute. If none is found, a row is added to `Chemical`; the attribute value is stored as `Chemical.Name`; its `Chemical.WID` is used as described below. A row is added to `EnzReactionCofactor`; `EnzReactionCofactor.EnzymaticReactionWID` is the WID of this enzymatic reaction; `EnzReactionCofactor.CompoundWID` is the WID associated with the compound or protein; `EnzReactionCofactor.Prosthetic` is 'F' for COFACTORS, 'T' for PROSTHETIC-GROUPS, and NULL for COFACTORS-OR-PROSTHETIC-GROUPS.
COMMENT[*]	`CommentTable.Comm`; `CommentTable.OtherWID` is the WID of this `EnzymaticReaction` object
COMMON-NAME	Ignored
REQUIRED-PROTEIN-COMPLEX	Value should match the UNIQUE-ID of a protein. If so, the WID of the protein is stored as `EnzymaticReaction.ComplexWID`.
DBLINKS[*]	Attribute is a list of two or more elements. A row is added to `CrossReference`; `CrossReference.OtherWID` is the WID of this `EnzymaticReaction` object; `CrossReference.DatabaseName` is the first element of the list; `CrossReference.XID` is the second element of the list; the rest of the list is ignored.
ENZYME	Required. Value should match the UNIQUE-ID of a protein. If so, the WID of the protein is stored as `EnzymaticReaction.ProteinWID`.
REACTION	Required. Value should match the UNIQUE-ID of a reaction. If so, the WID of the reaction is stored as `EnzymaticReaction.ReactionWID`.
REACTION-DIRECTION	`EnzymaticReaction.ReactionDirection`.
REGULATED-BY	Ignored. This is the converse of the REGULATED-ENTITY attribute for an enzymatic regulator.
SYNONYMS[*]	`SynonymTable.Syn`; `SynonymTable.OtherWID` is the WID of this `EnzymaticReaction` object
UNIQUE-ID	`DBID.XID`; `DBID.OtherWID` is the WID of this `EnzymaticReaction` object

Additional Tables

The loader adds rows to EnzReactionAltCompound, EnzReactionCofactor, and EnzReactionInhibitorActivator as noted above.
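As an illustration of the ALTERNATIVE-COFACTORS and ALTERNATIVE-SUBSTRATES rule above, the sketch below builds one `EnzReactionAltCompound` row per alternative. The names `ChemicalIndex` and `alt_compound_rows` are hypothetical; `ChemicalIndex` is an in-memory stand-in for the find-or-create lookup against the `Chemical` table, and its WID numbering is an assumption made only so the example runs.

```python
class ChemicalIndex:
    """Stand-in for the Chemical table: returns an existing WID or allocates a new one."""

    def __init__(self, first_wid=1):
        self._wids = {}
        self._next_wid = first_wid

    def wid_for(self, name):
        if name not in self._wids:
            # Corresponds to adding a Chemical row with Chemical.Name = name.
            self._wids[name] = self._next_wid
            self._next_wid += 1
        return self._wids[name]


def alt_compound_rows(attribute_name, names, enzrxn_wid, wid_for):
    """names is (PRIMARY ALT1 ... ALTn); returns n row sketches."""
    primary, *alternatives = names
    cofactor = "T" if attribute_name == "ALTERNATIVE-COFACTORS" else "F"
    primary_wid = wid_for(primary)
    return [
        {
            "EnzymaticReactionWID": enzrxn_wid,
            "PrimaryWID": primary_wid,
            "AlternativeWID": wid_for(alt),
            "Cofactor": cofactor,
        }
        for alt in alternatives
    ]


chemicals = ChemicalIndex()
example_rows = alt_compound_rows(
    "ALTERNATIVE-SUBSTRATES", ["PRIMARY", "ALT1", "ALT2"], 99, chemicals.wid_for
)
for row in example_rows:
    print(row)
```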

Regulation

A regulation entry describes a relationship between a regulated entity and an agent that performs regulation. There are two types of regulation that are represented in the database and translated to the BioWarehouse -- regulation of transcription, and regulation of enzyme activity. The regulation type is indicated by the TYPES attribute.

In transcriptional regulation, the regulator is a protein that is a transcription factor; the regulated entity is a promoter that is a component of one or more transcription units. In enzymatic regulation, the regulator is a chemical; the regulated entity is an enzymatic reaction. Comments, citations, synonyms, and cross-references of each entry are linked to the regulator. Regulator proteins and compounds will have multiple DBIDs -- their UNIQUE-ID from this entry as well as the UNIQUE-ID from the protein or compound entry.

All regulation is characterized by a mode, indicating whether the process is inhibited or activated. In addition, enzymatic regulation is characterized by a regulation mechanism, as well as a flag indicating physiological relevance.

Note that for transcriptional regulation, no rows are added to the BioWarehouse, because it has no representation of transcription factors. However, the naming conventions of BioCyc databases may be exploited to find all transcription factors by finding all proteins that have a DBID.XID starting with 'REG'.
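The lookup suggested above can be sketched as a join between `Protein` and `DBID`. The example below uses an in-memory SQLite database purely as a stand-in for a warehouse instance; the reduced table definitions and the sample identifiers are made up for illustration, and only the columns named in this document (`Protein.WID`, `DBID.OtherWID`, `DBID.XID`) are assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced stand-in tables; a real warehouse has many more columns.
conn.executescript("""
    CREATE TABLE Protein (WID INTEGER PRIMARY KEY);
    CREATE TABLE DBID (OtherWID INTEGER, XID TEXT);
    INSERT INTO Protein VALUES (1), (2);
    INSERT INTO DBID VALUES (1, 'PROT-A'), (1, 'REG-0001'), (2, 'PROT-B');
""")

# Proteins that carry at least one DBID.XID starting with 'REG'.
QUERY = """
    SELECT DISTINCT Protein.WID
    FROM Protein JOIN DBID ON DBID.OtherWID = Protein.WID
    WHERE DBID.XID LIKE 'REG%'
    ORDER BY Protein.WID
"""
transcription_factor_wids = [row[0] for row in conn.execute(QUERY)]
print(transcription_factor_wids)  # [1]
```

Against a production warehouse the same SELECT statement would be issued through that database's own client; pattern-matching case sensitivity may differ between database systems.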

**Translation semantics for `regulation.dat`** where **TYPES** is 'Regulation-of-Transcription-Initiation'
BioCyc Attribute	Warehouse Semantics
ASSOCIATED-BINDING-SITE	Ignored.
CITATIONS[*]	Each attribute is either an evidence code or the UNIQUE-ID of a publication. See Support Table for translation of evidence codes. Each publication UNIQUE-ID is possibly enclosed in square brackets, and possibly missing the leading "PUB-". A row is added to `CitationWIDOtherWID`; `CitationWIDOtherWID.CitationWID` is the WID of the `Citation` associated with this unique ID; `CitationWIDOtherWID.OtherWID` is the WID of the `Protein` referenced by the REGULATOR attribute.
COMMENT[*]	`CommentTable.Comm`; `CommentTable.OtherWID` is the WID of the `Protein` referenced by the REGULATOR attribute.
DBLINKS[*]	Attribute is a list of two or more elements. A row is added to `CrossReference`; `CrossReference.OtherWID` is the WID of the `Protein` referenced by the REGULATOR attribute; `CrossReference.DatabaseName` is the first element of the list; `CrossReference.XID` is the second element of the list; the rest of the list is ignored.
MODE	Ignored.
REGULATED-ENTITY	References the UNIQUE-ID of a promoter `Feature` that is a component of a transcription unit.
REGULATOR	References the UNIQUE-ID of a protein that is a transcription factor.
SYNONYMS[*]	`SynonymTable.Syn`; `SynonymTable.OtherWID` is the WID of the `Protein` referenced by the REGULATOR attribute.
TYPES	'Regulation-of-Transcription-Initiation'. Determines whether this entry is translated as transcriptional or enzymatic regulation.
UNIQUE-ID	`DBID.XID`; `DBID.OtherWID` is the WID of the `Protein` referenced by the REGULATOR attribute.

For each entry that describes enzymatic regulation, a row is added to EnzReactionInhibitorActivator. Entry attributes determine the column values as described in the table below.

**Translation semantics for `regulation.dat`** where **TYPES** is 'Regulation-of-Enzyme-Activity'
BioCyc Attribute	Warehouse Semantics
CITATIONS[*]	Each attribute is either an evidence code or the UNIQUE-ID of a publication. See Support Table for translation of evidence codes. Each publication UNIQUE-ID is possibly enclosed in square brackets, and possibly missing the leading "PUB-". A row is added to `CitationWIDOtherWID`; `CitationWIDOtherWID.CitationWID` is the WID of the `Citation` associated with this unique ID; `CitationWIDOtherWID.OtherWID` is the WID of the `Chemical` or `Protein` referenced by the REGULATOR attribute.
COMMENT[*]	`CommentTable.Comm`; `CommentTable.OtherWID` is the WID of the `Chemical` referenced by the REGULATOR attribute.
DBLINKS[*]	Attribute is a list of two or more elements. A row is added to `CrossReference`; `CrossReference.OtherWID` is the WID of the `Chemical` or `Protein` referenced by the REGULATOR attribute; `CrossReference.DatabaseName` is the first element of the list; `CrossReference.XID` is the second element of the list; the rest of the list is ignored.
MECHANISM	`EnzReactionInhibitorActivator.Mechanism` is 'A' if MECHANISM is ':ALLOSTERIC', 'C' if MECHANISM is ':COMPETITIVE', 'I' if MECHANISM is ':IRREVERSIBLE', 'N' if MECHANISM is ':NONCOMPETITIVE' or ':UNCOMPETITIVE', 'U' if MECHANISM is ':UNKMECH', NULL otherwise.
MODE	`EnzReactionInhibitorActivator.InhibitOrActivate` is 'A' if MODE is '+', 'I' if MODE is '-', and NULL otherwise.
PHYSIOLOGICALLY-RELEVANT?	`EnzReactionInhibitorActivator.PhysioRelevant` is 'T' if PHYSIOLOGICALLY-RELEVANT? is 'T', 'F' otherwise.
REGULATED-ENTITY	References the UNIQUE-ID of an enzymatic reaction. Its WID is `EnzReactionInhibitorActivator.EnzymaticReactionWID`.
REGULATOR	The `Chemical` table is queried for a UNIQUE-ID or a COMMON-NAME of a compound in the database being loaded that matches the value of this attribute. If none is found, a row is added to `Chemical`; the attribute value is stored as `Chemical.Name`; its `Chemical.WID` is `EnzReactionInhibitorActivator.CompoundWID`.
SYNONYMS[*]	`SynonymTable.Syn`; `SynonymTable.OtherWID` is the WID of the `Chemical` or `Protein` referenced by the REGULATOR attribute.
TYPES	'Regulation-of-Enzyme-Activity'. Determines whether this entry is translated as transcriptional or enzymatic regulation.
UNIQUE-ID	`DBID.XID`; `DBID.OtherWID` is the WID of the `Chemical` or `Protein` referenced by the REGULATOR attribute.
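The MECHANISM, MODE, and PHYSIOLOGICALLY-RELEVANT? translations above amount to small lookup tables. The sketch below is one way to express them; the function name `inhibitor_activator_columns` and the dictionary-based entry are hypothetical, and the sketch assumes the physiological-relevance flag is read from the PHYSIOLOGICALLY-RELEVANT? attribute itself. Python `None` stands for SQL NULL.

```python
MECHANISM_CODES = {
    ":ALLOSTERIC": "A",
    ":COMPETITIVE": "C",
    ":IRREVERSIBLE": "I",
    ":NONCOMPETITIVE": "N",
    ":UNCOMPETITIVE": "N",  # shares a code with :NONCOMPETITIVE
    ":UNKMECH": "U",
}
MODE_CODES = {"+": "A", "-": "I"}


def inhibitor_activator_columns(entry):
    """Map one enzymatic-regulation entry (attribute -> value) to column values."""
    return {
        # dict.get returns None for unlisted or missing values (NULL otherwise).
        "Mechanism": MECHANISM_CODES.get(entry.get("MECHANISM")),
        "InhibitOrActivate": MODE_CODES.get(entry.get("MODE")),
        "PhysioRelevant": "T" if entry.get("PHYSIOLOGICALLY-RELEVANT?") == "T" else "F",
    }


print(inhibitor_activator_columns(
    {"MECHANISM": ":UNCOMPETITIVE", "MODE": "-", "PHYSIOLOGICALLY-RELEVANT?": "T"}
))
```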

Pathways

Pathways are input from the file pathways.dat. A row is added to the Pathway table for each entry in pathways.dat. The Pathway.Type column value of each row is set to 'O' to signify the pathway is from a real organism. Pathway.BioSourceWID is the WID from the one row of BioSource created by the loader for the species being loaded.

Pathway entries can reference other pathways (using their UNIQUE-ID), and there is no guarantee that a pathway entry will be defined before a reference to it occurs. The loader adds a row to Pathway and assigns it a WID upon the first reference to a pathway, and performs a SQL UPDATE of the row when its entry is fully defined.
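The insert-on-first-reference, update-on-definition pattern described above might look like the following sketch. It uses an in-memory SQLite database and a simple counter as stand-ins for the warehouse and its WID allocation; the function names, the reduced `Pathway` table, and the sample identifier are hypothetical.

```python
import itertools
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Pathway (WID INTEGER PRIMARY KEY, Name TEXT, Type TEXT)")

_next_wid = itertools.count(1)  # stand-in for warehouse WID allocation
_wid_by_unique_id = {}


def pathway_wid(unique_id):
    """Return the WID for a pathway, inserting a placeholder row on first reference."""
    if unique_id not in _wid_by_unique_id:
        wid = next(_next_wid)
        conn.execute("INSERT INTO Pathway (WID) VALUES (?)", (wid,))
        _wid_by_unique_id[unique_id] = wid
    return _wid_by_unique_id[unique_id]


def define_pathway(unique_id, common_name):
    """Fill in the row once the pathway's own entry is read."""
    wid = pathway_wid(unique_id)  # reuses the placeholder if one exists
    conn.execute(
        "UPDATE Pathway SET Name = ?, Type = 'O' WHERE WID = ?", (common_name, wid)
    )
    return wid


# A forward reference creates the row; the later definition updates it in place.
linked = pathway_wid("PWY-EXAMPLE")
defined = define_pathway("PWY-EXAMPLE", "example pathway")
print(linked == defined)
```

Because the WID is fixed at first reference, rows in other tables that already point at the pathway remain valid after the update.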

**Translation semantics for `pathways.dat`**
BioCyc Attribute	Warehouse Semantics
CITATIONS[*]	Each attribute is either an evidence code or the UNIQUE-ID of a publication. See Support Table for translation of evidence codes. Each publication UNIQUE-ID is possibly enclosed in square brackets, and possibly missing the leading "PUB-". A row is added to `CitationWIDOtherWID`; `CitationWIDOtherWID.CitationWID` is the WID of the `Citation` associated with this unique ID; `CitationWIDOtherWID.OtherWID` is the WID of this `Pathway`
COMMENT[*]	`CommentTable.Comm`; `CommentTable.OtherWID` is the WID of this `Pathway` object
COMMON-NAME	`Pathway.Name`
DBLINKS[*]	Attribute is a list of two or more elements. A row is added to `CrossReference`; `CrossReference.OtherWID` is the WID of this `Pathway` object; `CrossReference.DatabaseName` is the first element of the list; `CrossReference.XID` is the second element of the list; the rest of the list is ignored.
HYPOTHETICAL-REACTIONS[*]	Value should match the UNIQUE-ID of a reaction. If so, `PathwayReaction.Hypothetical` is 'T' for the row added to `PathwayReaction` for the reaction.
NET-REACTION-EQUATION	`CommentTable.Comm`; `CommentTable.OtherWID` is the WID of this `Pathway` object
PATHWAY-INTERACTIONS	`CommentTable.Comm`; `CommentTable.OtherWID` is the WID of this `Pathway` object
PATHWAY-LINKS[*]	Indicates pathways that are linked via a common substrate. If value is of form (Unique-ID) it is probably a pathway reference and is ignored. Otherwise value is of form (Compound PathwaySpec1 ... PathwaySpecN) where each PathwaySpec is either a descriptor or (descriptor . direction). The direction is ignored. The descriptor may be either a quoted string or a UNIQUE-ID of some BioCyc object (not necessarily a pathway). If the descriptor is anything other than a UNIQUE-ID for a previously defined pathway, a row is added to `Pathway`: `Pathway.Name` is the descriptor. A row is added to `PathwayLink` for each PathwaySpec: `PathwayLink.ChemicalWID` is the WID for the Compound; `PathwayLink.Pathway1WID` is the WID of this `Pathway` object; `PathwayLink.Pathway2WID` is the WID of the linked `Pathway`. During postprocessing, any `Pathway` rows that were added that were not actually pathways (i.e., no pathway entry was later encountered for that descriptor) are deleted from `Pathway`, along with any linked `PathwayLink` rows.
PREDECESSORS[*]	Collectively, these specify the graph of reactions that form the pathway. Each value takes one of two forms: (1) (Successor Predecessor1 ... PredecessorN), where zero predecessors are allowed; or (2) Predecessor-PathwayID. For form 1: a row is added to `PathwayReaction` for each Predecessor. For each row: `PathwayReaction.PathwayWID` is the WID of this `Pathway` object; `PathwayReaction.ReactionWID` is the `Reaction` WID of Successor; `PathwayReaction.Hypothetical` is 'F', unless Successor is named as a HYPOTHETICAL-REACTIONS attribute; `PathwayReaction.PriorReactionWID` is the `Reaction` WID of Predecessor, or NULL if there are none. For form 2: each such attribute should also occur as a SUB-PATHWAYS attribute. If it does not, it is ignored.
REACTION-LIST[*]	Each attribute is a UNIQUE-ID of a reaction or pathway. Pathways occurring here are ignored. For each reaction occurring here, but not occurring as a Successor in a PREDECESSORS attribute, a row is added to `PathwayReaction`: `PathwayReaction.PathwayWID` is the WID of this `Pathway` object; `PathwayReaction.ReactionWID` is the `Reaction` WID of the attribute; `PathwayReaction.Hypothetical` is 'F', unless the attribute is named as a HYPOTHETICAL-REACTIONS attribute; `PathwayReaction.PriorReactionWID` is NULL.
SUB-PATHWAYS[*]	This pathway inherits the reaction graph of the pathway whose UNIQUE-ID equals the attribute. That is, a `PathwayReaction` row is added for this pathway for each `PathwayReaction` row of the pathway designated by the attribute. The columns of each `PathwayReaction` row are identical, except that `PathwayWID` is changed from the attribute's pathway WID to the WID of this pathway. Note: sub/superpathway information is loaded via SUPER-PATHWAYS.
SUPER-PATHWAYS[*]	The value should match the UNIQUE-ID of a pathway. If so, a row is added to `SuperPathway`: `SuperPathway.SuperPathwayWID` is the WID associated with this UNIQUE-ID; `SuperPathway.PathwayWID` is the WID of this `Pathway` object.
SYNONYMS[*]	`SynonymTable.Syn`; `SynonymTable.OtherWID` is the WID of this `Pathway` object
UNIQUE-ID	`DBID.XID`; `DBID.OtherWID` is the WID of this `Pathway` object

Additional Tables

Rows are added to the tables SuperPathway, PathwayReaction, and PathwayLink as specified above.
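
The PREDECESSORS and REACTION-LIST rules above can be sketched in a few lines of Python. This is only an illustration of the documented semantics, not the loader's actual code. The function name, the tuple layout of a row, and the representation of parsed values (a tuple for the parenthesized form, a plain string for a bare Predecessor-PathwayID) are all assumptions made for the example. The sketch also assumes that `Hypothetical` is 'T' for reactions named in HYPOTHETICAL-REACTIONS, and that a Successor with zero predecessors yields one row with a NULL `PriorReactionWID`.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple, Union

# Hypothetical row layout: (PathwayWID, ReactionWID, PriorReactionWID, Hypothetical)
Row = Tuple[int, int, Optional[int], str]


def pathway_reaction_rows(
    pathway_wid: int,
    predecessors: Sequence[Union[str, Sequence[str]]],
    reaction_list: Sequence[str],
    hypothetical: Set[str],
    reaction_wids: Dict[str, int],
) -> List[Row]:
    """Derive PathwayReaction rows for one pathway (illustrative sketch)."""
    rows: List[Row] = []
    successors: Set[str] = set()

    for value in predecessors:
        if isinstance(value, str):
            # Case 2: a bare Predecessor-PathwayID, covered by SUB-PATHWAYS.
            continue
        successor, *priors = value
        successors.add(successor)
        # Assumption: 'T' when the successor is listed as hypothetical.
        flag = "T" if successor in hypothetical else "F"
        # Assumption: zero predecessors still yields one row with a NULL prior.
        for prior in priors or [None]:
            prior_wid = reaction_wids[prior] if prior is not None else None
            rows.append((pathway_wid, reaction_wids[successor], prior_wid, flag))

    for rxn in reaction_list:
        # Pathways listed here are ignored, as are reactions that already
        # appeared as a Successor in PREDECESSORS.
        if rxn not in reaction_wids or rxn in successors:
            continue
        flag = "T" if rxn in hypothetical else "F"
        rows.append((pathway_wid, reaction_wids[rxn], None, flag))

    return rows
```

For example, with invented identifiers, a pathway whose PREDECESSORS values are `("RXN-B", "RXN-A")` and `("RXN-A",)` and whose REACTION-LIST also names `RXN-C` would produce three rows: one linking RXN-B to its prior reaction RXN-A, and one each for RXN-A and RXN-C with no prior reaction.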

Support Table

Most BioCyc entries allow a CITATIONS attribute to be specified. If this attribute begins with a colon or contains the string ':EV-' anywhere, it is treated as an indicator of evidence for the validity of the associated entry, or some portion of it. The evidence code is the text between the colon and the next colon (if present) or the end of the attribute (excluding the colons).

For each attribute of this form, a row in the Support table is created as follows:

**Column values for `Support` row**
Column	Value assigned by BioCyc loader
`WID`	The BioWarehouse ID allocated for this Support.
`OtherWID`	The WID of the entry this supporting evidence applies to.
`Type`	The evidence code (e.g., 'EV-EXP-IMP-POLAR-MUTATION'). Note: this is not consistent with the schema documentation, which states that this column is either 'computational' or 'experimental'.
`Confidence`	NULL
`DatasetWID`	The value `Dataset.WID` assigned to the dataset being loaded.
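
The evidence-code rule described above can be sketched as a small Python function. This is a minimal illustration, not the loader's implementation. The function name is hypothetical, and it assumes that when ':EV-' occurs in the middle of the attribute, the evidence code starts at that colon.

```python
from typing import Optional


def evidence_code(citation: str) -> Optional[str]:
    """Return the evidence code in a CITATIONS value, or None if it has none."""
    if citation.startswith(":"):
        start = 0
    else:
        # Assumption: the code begins at the colon of the first ':EV-'.
        start = citation.find(":EV-")
        if start < 0:
            return None  # no evidence code; not a Support row
    end = citation.find(":", start + 1)
    # Take the text up to the next colon, or to the end of the attribute.
    return citation[start + 1:end] if end >= 0 else citation[start + 1:]
```

Under these assumptions, an invented value such as `"12345:EV-EXP-IMP-POLAR-MUTATION:3384516006:someone"` yields `EV-EXP-IMP-POLAR-MUTATION`, which is the value stored in `Support.Type`.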

Additional Tables

If the CITATIONS attribute contains a reference to a publication ID, a row is added to the `CitationWIDOtherWID` table to associate the `Support` row with the `Citation` row for that publication.

Entry Table

For each object loaded from the database, a row in the Entry table is created as follows:

**Column values for `Entry` row**
Column	Value assigned by BioCyc loader
`OtherWID`	The WID of the entry described by this row. Entry may be in `Chemical`, `Reaction`, `Protein`, `Gene`, `EnzymaticReaction`, or `Pathway`.
`InsertDate`	The time/date the loader was run.
`CreationDate`	NULL
`ModifiedDate`	NULL
`LineNumber`	The line number from the input file on which this entry began.
`LoadError`	'T' if a parse error is detected, 'F' otherwise.
`DatasetWID`	The value `Dataset.WID` assigned to the dataset being loaded.
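
As a rough illustration of the table above, the column values for one `Entry` row could be assembled as follows. The function and its parameters are hypothetical; only the column names and the values assigned to them come from the table.

```python
from datetime import datetime
from typing import Any, Dict, Optional


def entry_row(
    other_wid: int,
    line_number: int,
    parse_error: bool,
    dataset_wid: int,
    insert_date: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Build the column values for one Entry row (illustrative sketch)."""
    return {
        "OtherWID": other_wid,
        # Time/date of the loader run; defaults to the current time here.
        "InsertDate": insert_date or datetime.now(),
        "CreationDate": None,  # NULL
        "ModifiedDate": None,  # NULL
        "LineNumber": line_number,
        "LoadError": "T" if parse_error else "F",
        "DatasetWID": dataset_wid,
    }
```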

References

BioSPICE Web Site

BioCyc.org

Pathway/Genome Databases (PGDBs)

Schema documentation