(C) 2006 SRI International. All Rights Reserved. See BioWarehouse Overview for license details.
Introduction
This document describes version 4.6 of the BioCyc Loader, one of several database loaders comprising the BioWarehouse. The BioCyc Loader (referred to simply as the loader) loads a Pathway/Genome Database (PGDB) into the BioWarehouse, a relational database that provides a common representation for diverse bioinformatics databases.
PGDBs are stored in a frame-based representation system implemented in Common Lisp. The loader reads a textual flat-file representation of a PGDB, converts it to the representation expressed in the Bio-SPICE Warehouse Schema, and loads the result directly into an instance of the warehouse.
Constant tables specify scientific data such as information from the Periodic Table of Elements, as well as constants used as column values in various warehouse tables.
Object tables describe a type of entity in a source database, such as compounds and proteins. Each column of an object table specifies a parameter that characterizes the object. In addition to the parameters defined by the source database, the loader assigns a unique warehouse ID (WID) to each object, which is used by other tables to reference the object.
A special type of warehouse object is the dataset. A dataset object is created for each dataset loaded into the warehouse; e.g., the SWISS-PROT loader adds one row to the DataSet table each time it is run. Its WID is referred to as the dataset WID and is a column in each object table, specifying the source database of the object.
Linking tables describe relationships among objects. They contain the WIDs of the associated objects and any additional columns needed to characterize the relationship. In general, many-to-many relationships are supported. Special tables exist to capture reference and crossreference information and to facilitate lookup of objects.
Schema documentation is available.
The latest supported data version for the BioCyc loader is listed in the loader summary table. Attributes added to the BioCyc schema after this version are not supported.
The loader silently ignores the numerous source attributes that have no analogue in the BioWarehouse.
In the reaction graph specification in the PREDECESSORS attribute of pathways, due to parsing limitations a reaction may have at most two predecessor reactions. Additional reactions are flagged as generic syntax errors.
The loader treats the MetaCyc database as if it contained data for a single organism named "MetaCyc", rather than as containing experimentally elucidated data from many organisms.
Many proteins do not have the COMMON-NAME attribute. This is because multifunctional enzymes often have different names depending on which enzymatic function is being referred to. If no common name is specified (and we prefer that no common name be specified if an activity name is appropriate), we use the common name of the corresponding enzymatic-reaction frame (or a concatenation of them separated by / if there are multiple enzymatic-reactions).
The individual components of a publication -- title, year, etc. -- are not broken out by the loader into the associated columns of the Citation table; only the concatenation of all components is loaded into Citation.Citation.
The TRANSCRIPTION-DIRECTION attribute is ignored for transcription units. In particular, neither it nor the TRANSCRIPTION-DIRECTION of the associated gene is used in computing Feature.StartPosition or Feature.EndPosition for DNA binding sites. This could lead to incorrect values for these columns.
The Multifun Ontology contained in the MetaCyc Ontology dataset is used to populate rows of the RelatedTerm table for genes that specify a Multifun type.
The Gene Ontology dataset is used to populate rows of the RelatedTerm table for proteins that specify a GO term.
See genes translation and proteins translation for details.
The loader is invoked as part of the ChIP-chip loader. When invoked in this manner, a restricted set of data files, with a minimal amount of data, is loaded and merged into a dataset that also contains other BioWarehouse data. In this mode, the command-line arguments provided to the loader are specified in a properties file. See the ChIP-chip documentation for full details.
The textual representation of a PGDB consists of several ASCII files. A subset of these is used by the BioCyc loader. Furthermore, not all files are present in all PGDBs; in particular, several files are not present for the MetaCyc PGDB. Input files are loaded in the following order:
- pubs.dat [not present for all BioCyc PGDBs]
- compounds.dat
- proteins.dat
- protseq.fasta [not present for all BioCyc PGDBs]
- transunits.dat [not present for all BioCyc PGDBs]
- genes.dat
- promoters.dat [not present for the MetaCyc PGDB]
- terminators.dat [not present for the MetaCyc PGDB]
- dnabindsites.dat [not present for the MetaCyc PGDB]
- reactions.dat
- enzrxns.dat
- regulation.dat
- pathways.dat
See the PGDB flat file format specification for a detailed specification of the contents of the input files used by the loader. Most files are in attribute-value format.
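To give a feel for what reading an attribute-value file involves, the following minimal sketch parses such text into Python dictionaries. It assumes conventions that are not stated in this document and should be checked against the PGDB flat file format specification: one `ATTRIBUTE - VALUE` pair per line, `//` terminating a record, `#` starting a comment line, and a leading `/` marking a continuation of the previous value. The function name `parse_records` and the sample data are hypothetical; this is not the loader's own parser.

```python
from collections import defaultdict


def parse_records(text):
    """Parse attribute-value text into a list of {attribute: [values]} dicts.

    Assumed (not confirmed by this document): "ATTR - VALUE" lines,
    "//" record terminator, "#" comments, "/" continuation lines.
    """
    records = []
    current = defaultdict(list)
    last_attr = None
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line.strip() == "//":  # end of the current record
            if current:
                records.append(dict(current))
            current = defaultdict(list)
            last_attr = None
        elif line.startswith("/") and last_attr is not None:
            # continuation line: extend the most recently read value
            current[last_attr][-1] += "\n" + line[1:]
        else:
            attr, sep, value = line.partition(" - ")
            if not sep:
                continue  # not an attribute line; ignored in this sketch
            current[attr].append(value)  # repeated attributes accumulate
            last_attr = attr
    if current:
        records.append(dict(current))  # tolerate a missing final "//"
    return records


# Made-up sample input, for illustration only.
sample = """# comment
UNIQUE-ID - WATER
COMMON-NAME - H2O
SYNONYMS - water
SYNONYMS - hydrogen oxide
//
UNIQUE-ID - PROTON
//
"""
records = parse_records(sample)
```

Keeping every attribute as a list mirrors the `[*]` notation used in the translation tables, where an attribute may occur any number of times in one entry.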
DataSet.WID is used; typically this is the dataset that was most recently loaded. If no dataset of this name exists, a warning is issued and one is created.
Column | Value assigned by BioCyc loader |
---|---|
WID | A small integer that uniquely identifies this dataset in the warehouse. |
Name | 'speciesCyc', where species is an abbreviation for the organism represented in the PGDB. |
Version | Major version of the PGDB that is loaded. |
ReleaseDate | The date that this version of the PGDB was released. |
LoadDate | The date and time the loader was started. |
ChangeDate | The date and time the loader completed; NULL if the loader did not complete successfully. |
LoadedBy | The value of the system environment variable USER for the account running the loader. |
Application | 'BioCyc Loader' |
ApplicationVersion | 4.6 |
HomeURL | http://www.biocyc.org |
QueryURL | http://www.biocyc.org:1555 |
The loader adds one row to the BioSource table as follows, defining the organism the BioCyc KB describes:
Column | Value assigned by BioCyc loader |
---|---|
WID | The next available WID in the warehouse. Uniquely specifies this BioSource in the warehouse. |
Name | The value specified by the -o command line option, e.g., 'Bacillus subtilis' |
Strain | NULL |
DatasetWID | The Dataset.WID value assigned to the dataset being loaded. |
BioCyc Attribute | Warehouse Semantics |
---|---|
AUTHOR[*] | Concatenated to form the list of authors of the publication. The order of the authors is preserved, and a comma and a space are inserted between authors. Along with other attributes, this list is included in the full text of the citation, stored in Citation.Citation. |
COMMENT[*] | CommentTable.Comm; CommentTable.OtherWID is the WID of this Citation object |
MEDLINE-UID | Crossreference.XID; Crossreference.DatabaseName is 'Medline'; Crossreference.OtherWID is the WID of this Citation object |
PUBMED-ID | Citation.PMID |
REFERENT-FRAME | Provides an alternate name for UNIQUE-ID; either may be used in other files to refer to the publication, and both refer to the same Citation object. Used internally to associate a publication name with a Citation.WID. |
TITLE, SOURCE, YEAR, URL | Concatenated in the given order and appended to the full list of authors derived from the AUTHOR[*] attributes to form Citation.Citation. Tabs are inserted between attributes and between the author list and TITLE. |
UNIQUE-ID | DBID.XID; DBID.OtherWID is the WID of this Citation object. Used internally to associate a publication name with a Citation.WID. |
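The way Citation.Citation is assembled from the AUTHOR[*], TITLE, SOURCE, YEAR, and URL attributes can be sketched as follows. The function name `build_citation` and the sample values are hypothetical, and the handling of missing attributes (they are simply skipped) is an assumption, since the table above does not say what the loader does in that case.

```python
def build_citation(authors, title=None, source=None, year=None, url=None):
    """Sketch of the Citation.Citation string described in the table above.

    Authors keep their order and are joined with ", "; the author list and
    the TITLE, SOURCE, YEAR, URL values are then joined with tabs.
    Assumption: absent components are skipped rather than left empty.
    """
    author_list = ", ".join(authors)
    parts = [p for p in (author_list, title, source, year, url) if p]
    return "\t".join(parts)


# Made-up publication, for illustration only.
citation = build_citation(
    ["Smith J", "Doe A"], title="A title", source="J Example", year="2001"
)
```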
A row is added to the Chemical table for each entry in compounds.dat. A Chemical may be either a single compound or a class of compounds; all entries from this file are single compounds, hence Chemical.Class is always 'F'.
Note that rows are also added to Chemical when translating reactions.dat; those rows represent classes of compounds, and their Chemical.Class is 'T'.
BioCyc Attribute | Warehouse Semantics |
---|---|
CAS-REGISTRY-NUMBERS | Chemical.CAS |
CHARGE | Chemical.Charge |
CHEMICAL-FORMULA[*] | All (Element Number) pairs are concatenated to form Chemical.EmpiricalFormula. Example: (H 2) and (O 1) form 'H2O1'. |
CITATIONS[*] | Each attribute is either an evidence code or the UNIQUE-ID of a publication. See Support Table for translation of evidence codes. Each publication UNIQUE-ID may be enclosed in square brackets and may be missing the leading "PUB-". A row is added to CitationWIDOtherWID; CitationWIDOtherWID.CitationWID is the WID of the Citation associated with this unique ID; CitationWIDOtherWID.OtherWID is the WID of this Chemical object. |
COMMENT[*] | CommentTable.Comm; CommentTable.OtherWID is the WID of this Chemical object |
COMMON-NAME | Chemical.Name |
DBLINKS[*] | Attribute is a list of two or more elements. A row is added to CrossReference; CrossReference.OtherWID is the WID of this Chemical object; CrossReference.DatabaseName is the first element of the list; CrossReference.XID is the second element of the list; the rest of the list is ignored. |
MOLECULAR-WEIGHT | Chemical.MolecularWeightCalc |
PKA1 | Chemical.PKA1 |
PKA2 | Chemical.PKA2 |
PKA3 | Chemical.PKA3 |
REGULATES | Ignored. This is the converse of the REGULATOR attribute for an enzymatic regulator. |
SMILES | Chemical.Smiles |
SYNONYMS[*] | SynonymTable.Syn; SynonymTable.OtherWID is the WID of this Chemical object |
SYSTEMATIC-NAME | Chemical.SystematicName |
UNIQUE-ID | DBID.XID; DBID.OtherWID is the WID of this Chemical object |
ACTIVATORS-UNMECH-OF, INHIBITORS-ALLOSTERIC-OF, INHIBITORS-IRRREVERSIBLE-OF, INHIBITORS-OTHER-OF | Ignored; these are symmetric analogues to the corresponding attributes INHIBITORS-ALLOSTERIC, etc. of enzrxns.dat. |
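The CHEMICAL-FORMULA translation in the table above can be illustrated with a short sketch. The function name `empirical_formula` and the regular expression are hypothetical, and the code assumes that each attribute value is written exactly as a parenthesized `(Element Number)` pair, as in the example in the table.

```python
import re

# Assumed textual form of one CHEMICAL-FORMULA value: "(Element Number)".
PAIR = re.compile(r"\(\s*([A-Za-z]+)\s+(\d+)\s*\)")


def empirical_formula(values):
    """Concatenate (Element Number) pairs in file order, e.g. 'H2O1'."""
    pieces = []
    for value in values:
        match = PAIR.fullmatch(value.strip())
        if match is None:
            raise ValueError(f"not an (Element Number) pair: {value!r}")
        pieces.append(match.group(1) + match.group(2))
    return "".join(pieces)


formula = empirical_formula(["(H 2)", "(O 1)"])  # the example from the table
```

Note that the count is kept even when it is 1, which is why the table's example yields 'H2O1' rather than 'H2O'.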
A row is added to the Protein table for each entry in proteins.dat.
BioCyc Attribute | Warehouse Semantics |
---|---|
CITATIONS[*] | Each attribute is either an evidence code or the UNIQUE-ID of a publication. See Support Table for translation of evidence codes. Each publication UNIQUE-ID may be enclosed in square brackets and may be missing the leading "PUB-". A row is added to CitationWIDOtherWID; CitationWIDOtherWID.CitationWID is the WID of the Citation associated with this unique ID; CitationWIDOtherWID.OtherWID is the WID of this Protein object. |
^COEFFICIENT[*] | Subunit.Coefficient for the immediately preceding COMPONENTS attribute. |
COMMON-NAME | Protein.Name. Many proteins do not have the COMMON-NAME attribute. This is because multifunctional enzymes often have different names depending on which enzymatic function is being referred to. If no common name is specified (and we prefer that no common name be specified if an activity name is appropriate), we use the common name of the corresponding enzymatic-reaction frame (or a concatenation of them separated by / if there are multiple enzymatic-reactions). |
COMMENT[*] | CommentTable.Comm; CommentTable.OtherWID is the WID of this Protein object |
COMPONENTS[*] | Attribute is a UNIQUE-ID for a protein. A row is added to Subunit; Subunit.ComplexWID is the WID of this Protein object; Subunit.SubunitWID is the WID of the protein associated with the attribute; Subunit.Coefficient is the value of the immediately following ^COEFFICIENT attribute, defaulting to 1 if not explicit. |
DBLINKS[*] | Attribute is a list of two or more elements. A row is added to CrossReference; CrossReference.OtherWID is the WID of this Protein object; CrossReference.DatabaseName is the first element of the list; CrossReference.XID is the second element of the list; the rest of the list is ignored. |
FEATURES | Ignored |
GENE | Ignored |
GO-TERMS[*] | Ignored unless the Gene Ontology dataset has been previously loaded; each term is a DBID.XID from that dataset. A row is added to RelatedTerm: RelatedTerm.TermWID references this term; RelatedTerm.OtherWID is the WID of this Protein object; RelatedTerm.Relationship is 'keyword'. |
LOCATION[*] | Location.Location; Location.ProteinWID is the WID of this Protein object |
MOLECULAR-WEIGHT-SEQ | Protein.MolecularWeightCalc |
MOLECULAR-WEIGHT-EXP | Protein.MolecularWeightExp |
PI | Protein.PICalc |
SYNONYMS[*] | SynonymTable.Syn; SynonymTable.OtherWID is the WID of this Protein object |
UNIQUE-ID | DBID.XID; DBID.OtherWID is the WID of this Protein object |
ACTIVATORS-UNMECH-OF, COFACTORS-UNMECH-OF, INHIBITORS-UNMECH-OF, INHIBITORS-COMPETITIVE-OF, INHIBITORS-OTHER-OF, PROSTHETIC-GROUPS-OF | Ignored; these are symmetric analogues to the corresponding attributes INHIBITORS-COMPETITIVE, etc. of enzrxns.dat. |
A row is added to BioSourceWIDProteinWID for each protein entry. BioSourceWIDProteinWID.BioSourceWID is the WID from the one row of BioSource created by the loader for the species being loaded.
Numerous other table rows are added for proteins, but they are added when the linked object is parsed. In particular, the GENE attribute is ignored, and the protein-gene link is created when genes are parsed.
If the Gene Ontology dataset has been loaded, a row is added to RelatedTerm for each GO-TERMS attribute that is a term in Gene Ontology.
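The pairing of COMPONENTS with an immediately following ^COEFFICIENT, described in the protein attribute table, can be sketched like this. The function name `subunit_rows`, the in-memory `wid_of` lookup, and the sample identifiers are all hypothetical; the sketch also assumes coefficients are integers, which this document does not state.

```python
def subunit_rows(complex_wid, attributes, wid_of):
    """Build Subunit rows from an entry's ordered (attribute, value) pairs.

    A ^COEFFICIENT applies only to the COMPONENTS attribute directly before
    it; a component without one gets the default coefficient 1.
    """
    rows = []
    previous = None
    for attr, value in attributes:
        if attr == "COMPONENTS":
            rows.append(
                {
                    "ComplexWID": complex_wid,
                    "SubunitWID": wid_of[value],
                    "Coefficient": 1,  # default when no ^COEFFICIENT follows
                }
            )
        elif attr == "^COEFFICIENT" and previous == "COMPONENTS":
            rows[-1]["Coefficient"] = int(value)  # assumed to be an integer
        previous = attr
    return rows


# Made-up entry: a complex made of two copies of A and one copy of B.
entry = [
    ("COMPONENTS", "MONOMER-A"),
    ("^COEFFICIENT", "2"),
    ("COMPONENTS", "MONOMER-B"),
]
rows = subunit_rows(100, entry, {"MONOMER-A": 101, "MONOMER-B": 102})
```

Because the association depends on attribute order, the entry has to be kept as an ordered sequence rather than collapsed into a dictionary before this step.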
Protein.AASequence.
A row is added to the TranscriptionUnit table for each entry in this file.
BioCyc Attribute | Warehouse Semantics |
---|---|
CITATIONS[*] | Each attribute is either an evidence code or the UNIQUE-ID of a publication. See Support Table for translation of evidence codes. Each publication UNIQUE-ID may be enclosed in square brackets and may be missing the leading "PUB-". A row is added to CitationWIDOtherWID; CitationWIDOtherWID.CitationWID is the WID of the Citation associated with this unique ID; CitationWIDOtherWID.OtherWID is the WID of this TranscriptionUnit object. |
COMMENT[*] | CommentTable.Comm; CommentTable.OtherWID is the WID of this TranscriptionUnit object |
COMMON-NAME | TranscriptionUnit.Name. If missing, UNIQUE-ID is used in its place. |
COMPONENTS[*] | Ignored; a link to this entry is added when the component is loaded. |
DBLINKS[*] | Attribute is a list of two or more elements. A row is added to CrossReference; CrossReference.OtherWID is the WID of this TranscriptionUnit object; CrossReference.DatabaseName is the first element of the list; CrossReference.XID is the second element of the list; the rest of the list is ignored. |
EXTENT-UNKNOWN? | Ignored. |
SYNONYMS[*] | SynonymTable.Syn; SynonymTable.OtherWID is the WID of this TranscriptionUnit object |
UNIQUE-ID | DBID.XID; DBID.OtherWID is the WID of this TranscriptionUnit object |
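The CITATIONS rows in these tables note that a publication UNIQUE-ID may arrive wrapped in square brackets and without its leading "PUB-". One way to bring such references into a single canonical form before looking up the Citation is sketched below. The function name `normalize_pub_id` and the sample identifiers are hypothetical, and the sketch deliberately does not try to recognize evidence codes, which would have to be separated out first by whatever rule the Support Table implies.

```python
def normalize_pub_id(value):
    """Return a publication reference in canonical 'PUB-...' form.

    Handles the two variations named in the tables: optional enclosing
    square brackets and an optional missing 'PUB-' prefix.
    """
    token = value.strip()
    if token.startswith("[") and token.endswith("]"):
        token = token[1:-1].strip()  # drop the enclosing brackets
    if not token.startswith("PUB-"):
        token = "PUB-" + token       # restore the missing prefix
    return token


# Made-up identifiers, for illustration only.
canonical = [normalize_pub_id(v) for v in ("[SMITH01]", "PUB-SMITH01", " [PUB-SMITH01] ")]
```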
A row is added to the Gene table for each entry in genes.dat.
BioCyc Attribute | Warehouse Semantics |
---|---|
CITATIONS[*] | Each attribute is either an evidence code or the UNIQUE-ID of a publication. See Support Table for translation of evidence codes. Each publication UNIQUE-ID may be enclosed in square brackets and may be missing the leading "PUB-". A row is added to CitationWIDOtherWID; CitationWIDOtherWID.CitationWID is the WID of the Citation associated with this unique ID; CitationWIDOtherWID.OtherWID is the WID of this Gene object. |
COMMENT[*] | CommentTable.Comm; CommentTable.OtherWID is the WID of this Gene object |
COMMON-NAME | Gene.Name |
COMPONENT-OF[*] | If the value matches the UNIQUE-ID of a previously loaded transcription unit, a row is added to TranscriptionUnitComponent: TranscriptionUnitComponent.TranscriptionUnitWID is the WID of the TranscriptionUnit object; TranscriptionUnitComponent.OtherWID is the WID of this Gene object; TranscriptionUnitComponent.Type is 'gene'. |
DBLINKS[*] | Attribute is a list of two or more elements. A row is added to CrossReference; CrossReference.OtherWID is the WID of this Gene object; CrossReference.DatabaseName is the first element of the list; CrossReference.XID is the second element of the list; the rest of the list is ignored. |
INTERRUPTED | Gene.INTERRUPTED |
LEFT-END-POSITION | Gene.CodingRegionStart or Gene.CodingRegionEnd, depending on TRANSCRIPTION-DIRECTION |
PRODUCT[*] | Value should match the UNIQUE-ID of a protein. If so, a row is added to GeneWIDProteinWID. |
RIGHT-END-POSITION | Gene.CodingRegionEnd or Gene.CodingRegionStart, depending on TRANSCRIPTION-DIRECTION |
PI | Gene.PICalc |
SYNONYMS[*] | SynonymTable.Syn; SynonymTable.OtherWID is the WID of this Gene object |
TRANSCRIPTION-DIRECTION | A value of "+" indicates that LEFT-END-POSITION is stored as Gene.CodingRegionStart and that RIGHT-END-POSITION is stored as Gene.CodingRegionEnd. A value of "-" indicates that LEFT-END-POSITION is stored as Gene.CodingRegionEnd and that RIGHT-END-POSITION is stored as Gene.CodingRegionStart. |
TYPES[*] | Ignored unless the MetaCyc Ontology dataset has been previously loaded; each type is a Term.Name from the Multifun subontology of that dataset. A row is added to RelatedTerm: RelatedTerm.TermWID references a term that is this type; RelatedTerm.OtherWID is the WID of this Gene object; RelatedTerm.Relationship is 'superclass'. |
UNIQUE-ID | Gene.GenomeID and DBID.XID; DBID.OtherWID is the WID of this Gene object |
A row is added to GeneWIDProteinWID for each PRODUCT attribute, associating the gene with the gene product.
A row is added to BioSourceWIDGeneWID for each gene entry. BioSourceWIDGeneWID.BioSourceWID is the WID from the one row of BioSource created by the loader for the species being loaded.
A row is added to TranscriptionUnitComponent when a COMPONENT-OF attribute matches the UNIQUE-ID of a previously loaded transcription unit.
If the MetaCyc Ontology dataset has been loaded, a row is added to RelatedTerm for each TYPES attribute that is a term in the Multifun ontology that is part of the MetaCyc Ontology.
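The effect of TRANSCRIPTION-DIRECTION on the gene coordinates, as described in the gene attribute table, comes down to a conditional swap. The sketch below shows it in isolation; the function name `coding_region` and the sample coordinates are hypothetical.

```python
def coding_region(left_end, right_end, direction):
    """Map LEFT-/RIGHT-END-POSITION to CodingRegionStart/End by strand."""
    if direction == "+":
        return {"CodingRegionStart": left_end, "CodingRegionEnd": right_end}
    if direction == "-":
        # On the reverse strand the coding region starts at the right end.
        return {"CodingRegionStart": right_end, "CodingRegionEnd": left_end}
    raise ValueError(f"unexpected TRANSCRIPTION-DIRECTION: {direction!r}")


# Made-up coordinates, for illustration only.
forward = coding_region(190, 255, "+")
reverse = coding_region(190, 255, "-")
```

A consequence worth keeping in mind when querying: for genes on the "-" strand, Gene.CodingRegionStart is numerically larger than Gene.CodingRegionEnd.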
A row is added to the Feature table for each entry in this file:
- Feature.Type is NULL.
- Feature.Class is 'promoter'.
- Feature.SequenceType is 'N'.
- Feature.SequenceWID is NULL.
- Feature.RegionOrPoint is 'point'.
- Feature.PointType is 'center'.
BioCyc Attribute | Warehouse Semantics |
---|---|
ABSOLUTE-PLUS-1-POSITION | Feature.StartPosition and Feature.EndPosition. This is the position at which transcription starts. |
CITATIONS[*] | Each attribute is either an evidence code or the UNIQUE-ID of a publication. See Support Table for translation of evidence codes. Each publication UNIQUE-ID may be enclosed in square brackets and may be missing the leading "PUB-". A row is added to CitationWIDOtherWID; CitationWIDOtherWID.CitationWID is the WID of the Citation associated with this unique ID; CitationWIDOtherWID.OtherWID is the WID of this Feature object. |
COMMENT[*] | CommentTable.Comm; CommentTable.OtherWID is the WID of this Feature object |
COMMON-NAME | Feature.Description |
COMPONENT-OF[*] | If the value matches the UNIQUE-ID of a previously loaded transcription unit, a row is added to TranscriptionUnitComponent: TranscriptionUnitComponent.TranscriptionUnitWID is the WID of the TranscriptionUnit object; TranscriptionUnitComponent.OtherWID is the WID of this Feature object; TranscriptionUnitComponent.Type is 'promoter'. |
DBLINKS[*] | Attribute is a list of two or more elements. A row is added to CrossReference; CrossReference.OtherWID is the WID of this Feature object; CrossReference.DatabaseName is the first element of the list; CrossReference.XID is the second element of the list; the rest of the list is ignored. |
PROMOTER-EVIDENCE | Ignored. There is typically an associated CITATIONS attribute for this evidence code. |
REGULATED-BY | Ignored. This is the converse of the REGULATED-ENTITY attribute for a transcriptional regulator. |
SYNONYMS[*] | SynonymTable.Syn; SynonymTable.OtherWID is the WID of this Feature object |
UNIQUE-ID | DBID.XID; DBID.OtherWID is the WID of this Feature object |
A row is added to TranscriptionUnitComponent for each COMPONENT-OF attribute that references a previously loaded transcription unit.
A row is added to the Feature table for each entry in this file:
- Feature.Type is NULL.
- Feature.Class is 'terminator'.
- Feature.SequenceType is 'N'.
- Feature.SequenceWID is NULL.
- Feature.RegionOrPoint is 'region'.
BioCyc Attribute | Warehouse Semantics |
---|---|
CITATIONS[*] | Each attribute is either an evidence code or the UNIQUE-ID of a publication. See Support Table for translation of evidence codes. Each publication UNIQUE-ID may be enclosed in square brackets and may be missing the leading "PUB-". A row is added to CitationWIDOtherWID; CitationWIDOtherWID.CitationWID is the WID of the Citation associated with this unique ID; CitationWIDOtherWID.OtherWID is the WID of this Feature object. |
COMMENT[*] | CommentTable.Comm; CommentTable.OtherWID is the WID of this Feature object |
COMMON-NAME | Feature.Description |
COMPONENT-OF[*] | If the value matches the UNIQUE-ID of a previously loaded transcription unit, a row is added to TranscriptionUnitComponent: TranscriptionUnitComponent.TranscriptionUnitWID is the WID of the TranscriptionUnit object; TranscriptionUnitComponent.OtherWID is the WID of this Feature object; TranscriptionUnitComponent.Type is 'terminator'. |
DBLINKS[*] | Attribute is a list of two or more elements. A row is added to CrossReference; CrossReference.OtherWID is the WID of this Feature object; CrossReference.DatabaseName is the first element of the list; CrossReference.XID is the second element of the list; the rest of the list is ignored. |
LEFT-END-POSITION | Feature.StartPosition |
RIGHT-END-POSITION | Feature.EndPosition |
SYNONYMS[*] | SynonymTable.Syn; SynonymTable.OtherWID is the WID of this Feature object |
UNIQUE-ID | DBID.XID; DBID.OtherWID is the WID of this Feature object |
A row is added to TranscriptionUnitComponent for each COMPONENT-OF attribute that references a previously loaded transcription unit.
A row is added to the Feature table for each entry in this file:
- Feature.Type is NULL.
- Feature.Class is always 'binding site'.
- Feature.SequenceType is 'N'.
- Feature.SequenceWID is NULL.
- Feature.RegionOrPoint is 'point'.
- Feature.PointType is 'center'.

The TRANSCRIPTION-DIRECTION attribute is ignored for transcription units. In particular, neither it nor the TRANSCRIPTION-DIRECTION of the associated gene is used in computing Feature.StartPosition or Feature.EndPosition for DNA binding sites. This could lead to incorrect values for these columns.
BioCyc Attribute | Warehouse Semantics |
---|---|
CITATIONS[*] | Each attribute is either an evidence code or the UNIQUE-ID of a publication. See Support Table for translation of evidence codes. Each publication UNIQUE-ID may be enclosed in square brackets and may be missing the leading "PUB-". A row is added to CitationWIDOtherWID; CitationWIDOtherWID.CitationWID is the WID of the Citation associated with this unique ID; CitationWIDOtherWID.OtherWID is the WID of this Feature object. |
COMMENT[*] | CommentTable.Comm; CommentTable.OtherWID is the WID of this Feature object |
COMMON-NAME | Feature.Description |
COMPONENT-OF[*] | If the value matches the UNIQUE-ID of a previously loaded transcription unit, a row is added to TranscriptionUnitComponent: TranscriptionUnitComponent.TranscriptionUnitWID is the WID of the TranscriptionUnit object; TranscriptionUnitComponent.OtherWID is the WID of this Feature object; TranscriptionUnitComponent.Type is 'binding site'. |
DBLINKS[*] | Attribute is a list of two or more elements. A row is added to CrossReference; CrossReference.OtherWID is the WID of this Feature object; CrossReference.DatabaseName is the first element of the list; CrossReference.XID is the second element of the list; the rest of the list is ignored. |
REGULATED-PROMOTER | References the UNIQUE-ID of a promoter. Assuming that promoter has been previously loaded, its ABSOLUTE-PLUS-1-POSITION is used to convert this binding site's RELATIVE-CENTER-POSITION from a relative to an absolute position. |
RELATIVE-CENTER-POSITION | This numeric value designates either an integral position or a position halfway between two integral positions. First, if the attribute is positive, one is subtracted from it. It is then added to the Feature.StartPosition of the promoter named in the REGULATED-PROMOTER attribute. If the sum is integral, it is stored in both Feature.StartPosition and Feature.EndPosition. If it is nonintegral, the next-lowest integer is stored in Feature.StartPosition and the next-highest integer is stored in Feature.EndPosition. |
SYNONYMS[*] | SynonymTable.Syn; SynonymTable.OtherWID is the WID of this Feature object |
UNIQUE-ID | DBID.XID; DBID.OtherWID is the WID of this Feature object |
A row is added to TranscriptionUnitComponent for each COMPONENT-OF attribute that references a previously loaded transcription unit.
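The RELATIVE-CENTER-POSITION rule in the binding-site table is easiest to follow as a small computation. The sketch below implements the steps as the table states them; the function name `binding_site_span` and the sample positions are hypothetical. Exact rational arithmetic (`fractions.Fraction`) is used here so that half-positions such as -41.5 are handled without floating-point rounding; that choice is this sketch's, not necessarily the loader's.

```python
import math
from fractions import Fraction


def binding_site_span(relative_center, promoter_start):
    """Return (Feature.StartPosition, Feature.EndPosition) for a binding site.

    relative_center: RELATIVE-CENTER-POSITION, an integer or a half-integer.
    promoter_start:  Feature.StartPosition of the REGULATED-PROMOTER.
    """
    rel = Fraction(str(relative_center))
    if rel > 0:
        rel -= 1                     # positive offsets are reduced by one
    total = promoter_start + rel     # convert relative to absolute
    if total.denominator == 1:       # integral: a single position
        position = int(total)
        return position, position
    # nonintegral: the site is centered between two adjacent positions
    return math.floor(total), math.ceil(total)


# Made-up positions, for illustration only.
upstream_half = binding_site_span("-41.5", 1000)
downstream = binding_site_span(10, 1000)
```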
A row is added to the Reaction table for each entry in reactions.dat.
BioCyc Attribute | Warehouse Semantics |
---|---|
BALANCE-STATE | Ignored |
CITATIONS[*] | Each attribute is either an evidence code or the UNIQUE-ID of a publication. See Support Table for translation of evidence codes. Each publication UNIQUE-ID may be enclosed in square brackets and may be missing the leading "PUB-". A row is added to CitationWIDOtherWID; CitationWIDOtherWID.CitationWID is the WID of the Citation associated with this unique ID; CitationWIDOtherWID.OtherWID is the WID of this Reaction object. |
^COEFFICIENT[*] | Reactant.Coefficient or Product.Coefficient for the immediately preceding LEFT or RIGHT attribute. |
COMMENT[*] | CommentTable.Comm; CommentTable.OtherWID is the WID of this Reaction object |
COMMON-NAME | SynonymTable.Syn; SynonymTable.OtherWID is the WID of this Reaction object |
DBLINKS[*] | Attribute is a list of two or more elements. A row is added to CrossReference; CrossReference.OtherWID is the WID of this Reaction object; CrossReference.DatabaseName is the first element of the list; CrossReference.XID is the second element of the list; the rest of the list is ignored. |
DELTAG0 | Reaction.DeltaG |
EC-NUMBER | Reaction.ECNumber or Reaction.ECNumberProposed, depending on OFFICIAL-EC? |
OFFICIAL-EC? | Determines whether EC-NUMBER is stored as Reaction.ECNumber or Reaction.ECNumberProposed |
ORPHAN? | Ignored |
LEFT[*] | A row is added to Reactant; its ReactionWID is the WID of this reaction. The attribute designates a substrate or a class of substrates involved in the reaction. It is translated as discussed below (see Additional Tables); the WID for the substrate is stored in Reactant.OtherWID. If a ^COEFFICIENT attribute follows immediately, it is stored as Reactant.Coefficient. Otherwise the value 1 is stored. |
RIGHT[*] | A row is added to Product; its ReactionWID is the WID of this reaction. The attribute designates a substrate or a class of substrates involved in the reaction. It is translated as discussed below (see Additional Tables); the WID for the substrate is stored in Product.OtherWID. If a ^COEFFICIENT attribute follows immediately, it is stored as Product.Coefficient. Otherwise the value 1 is stored. |
SPONTANEOUS? | Reaction.Spontaneous |
SYNONYMS[*] | SynonymTable.Syn; SynonymTable.OtherWID is the WID of this Reaction object |
UNIQUE-ID | DBID.XID; DBID.OtherWID is the WID of this Reaction object |
A row is added to Reactant for each LEFT attribute and to Product for each RIGHT attribute.
If the attribute matches the UNIQUE-ID, the COMMON-NAME, or a SYNONYM of a Chemical or Protein, the WID of that object is stored as Reactant.OtherWID or Product.OtherWID.
Otherwise it is assumed to specify a class of chemicals; an entry in Chemical is created for it, such that Chemical.Name is the attribute and Chemical.Class is 'T'.
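The substrate lookup just described (match against known names first, otherwise create a class-level Chemical) can be sketched with an in-memory name index standing in for the warehouse. Everything named here is hypothetical: `make_resolver`, the `known_names` dictionary, and the WID values. The sketch also assumes that a class name seen a second time reuses the row created the first time, which this document does not state.

```python
import itertools


def make_resolver(known_names, first_new_wid):
    """Return (resolve, new_classes) for translating LEFT/RIGHT values.

    known_names maps every UNIQUE-ID, COMMON-NAME, and synonym of an
    already loaded Chemical or Protein to its WID.
    """
    wids = itertools.count(first_new_wid)
    new_classes = []  # rows that would be added to Chemical

    def resolve(name):
        if name in known_names:
            return known_names[name]  # existing Chemical or Protein
        wid = next(wids)
        new_classes.append({"WID": wid, "Name": name, "Class": "T"})
        known_names[name] = wid  # assumption: reuse the row on later references
        return wid

    return resolve, new_classes


# Made-up names and WIDs, for illustration only.
resolve, new_classes = make_resolver({"WATER": 10, "H2O": 10}, first_new_wid=500)
by_common_name = resolve("H2O")
first_class_ref = resolve("Alcohols")
second_class_ref = resolve("Alcohols")
```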
A row is added to the EnzymaticReaction table for each entry in enzrxns.dat.
BioCyc Attribute | Warehouse Semantics |
---|---|
ALTERNATIVE-COFACTORS[*], ALTERNATIVE-SUBSTRATES[*] | The attribute consists of a list of names (PRIMARY ALT1 ALT2 ... ALTn). For each name, the Chemical table is queried for a UNIQUE-ID or a COMMON-NAME of a compound in the database being loaded. If none is found, a row is added to Chemical; the name is stored as Chemical.Name; its Chemical.WID is used as described below. N rows are added to EnzReactionAltCompound, one for each ALTi: EnzReactionAltCompound.EnzymaticReactionWID is the WID of this enzymatic reaction; EnzReactionAltCompound.PrimaryWID is the WID associated with the primary compound; EnzReactionAltCompound.AlternativeWID is the WID associated with compound ALTi; EnzReactionAltCompound.Cofactor is 'T' for ALTERNATIVE-COFACTORS, 'F' for ALTERNATIVE-SUBSTRATES. |
CITATIONS[*] | Each attribute is either an evidence code or the UNIQUE-ID of a publication. See Support Table for translation of evidence codes. Each publication UNIQUE-ID may be enclosed in square brackets and may be missing the leading "PUB-". A row is added to CitationWIDOtherWID; CitationWIDOtherWID.CitationWID is the WID of the Citation associated with this unique ID; CitationWIDOtherWID.OtherWID is the WID of this EnzymaticReaction object. |
COFACTORS[*], PROSTHETIC-GROUPS[*], COFACTORS-OR-PROSTHETIC-GROUPS[*] | The Chemical and Protein tables (in that order) are queried for a UNIQUE-ID or a COMMON-NAME of a compound or a protein in the database being loaded that matches the value of this attribute. If none is found, a row is added to Chemical; the attribute value is stored as Chemical.Name; its Chemical.WID is used as described below. A row is added to EnzReactionCofactor; EnzReactionCofactor.EnzymaticReactionWID is the WID of this enzymatic reaction; EnzReactionCofactor.CompoundWID is the WID associated with the compound or protein; EnzReactionCofactor.Prosthetic is 'F' for COFACTORS, 'T' for PROSTHETIC-GROUPS, and NULL for COFACTORS-OR-PROSTHETIC-GROUPS. |
COMMENT[*] | CommentTable.Comm; CommentTable.OtherWID is the WID of this EnzymaticReaction object |
COMMON-NAME | Ignored |
REQUIRED-PROTEIN-COMPLEX | Value should match the UNIQUE-ID of a protein. If so, the WID of the protein is stored as EnzymaticReaction.ComplexWID. |
DBLINKS[*] | Attribute is a list of two or more elements. A row is added to CrossReference; CrossReference.OtherWID is the WID of this EnzymaticReaction object; CrossReference.DatabaseName is the first element of the list; CrossReference.XID is the second element of the list; the rest of the list is ignored. |
ENZYME | Required. Value should match the UNIQUE-ID of a protein. If so, the WID of the protein is stored as EnzymaticReaction.ProteinWID. |
REACTION | Required. Value should match the UNIQUE-ID of a reaction. If so, the WID of the reaction is stored as EnzymaticReaction.ReactionWID. |
REACTION-DIRECTION | EnzymaticReaction.ReactionDirection. |
REGULATED-BY | Ignored. This is the converse of the REGULATED-ENTITY attribute for an enzymatic regulator. |
SYNONYMS[*] | SynonymTable.Syn; SynonymTable.OtherWID is the WID of this EnzymaticReaction object |
UNIQUE-ID | DBID.XID; DBID.OtherWID is the WID of this EnzymaticReaction object |
Rows are added to EnzReactionAltCompound, EnzReactionCofactor, and EnzReactionInhibitorActivator as noted above.
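The expansion of one ALTERNATIVE-COFACTORS or ALTERNATIVE-SUBSTRATES list into EnzReactionAltCompound rows, as described in the enzymatic-reaction table, can be sketched as follows. The function name `alt_compound_rows`, the `wid_of` lookup, and the sample names and WIDs are hypothetical; name resolution and creation of missing Chemical rows are assumed to have happened already.

```python
def alt_compound_rows(enzrxn_wid, names, wid_of, is_cofactor):
    """One row per alternative in a (PRIMARY ALT1 ... ALTn) name list."""
    primary, *alternatives = names
    flag = "T" if is_cofactor else "F"  # 'T' for cofactors, 'F' for substrates
    return [
        {
            "EnzymaticReactionWID": enzrxn_wid,
            "PrimaryWID": wid_of[primary],
            "AlternativeWID": wid_of[alt],
            "Cofactor": flag,
        }
        for alt in alternatives
    ]


# Made-up compound names and WIDs, for illustration only.
wid_of = {"PRIMARY-X": 20, "ALT-Y": 21, "ALT-Z": 22}
rows = alt_compound_rows(300, ["PRIMARY-X", "ALT-Y", "ALT-Z"], wid_of, is_cofactor=True)
```

The primary compound never gets a row of its own; it appears only as PrimaryWID on each alternative's row, so a list with n alternatives yields exactly n rows.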
In transcriptional regulation, the regulator is a protein that is a transcription factor;
the regulated entity is a promoter that is a component of one or more transcription units.
In enzymatic regulation, the regulator is a chemical; the regulated entity is an enzymatic reaction.
Comments, citations, synonyms, and crossreferences of each entry are linked to the regulator.
Regulator proteins and compounds will have multiple DBID
s -- their UNIQUE-ID
from this entry as well as the UNIQUE-ID from the protein
or compound entry.
All regulation is characterized by a mode, indicating whether the process is inhibited or activated. In addition, enzymatic regulation is characterized by a regulation mechanism, as well as a flag indicating physiological relevance.
Note that for transcriptional regulation, no rows are added to the BioWarehouse, because it has no representation of transcription factors. However, the naming conventions of BioCyc databases may be exploited to find all transcription factors: they are the proteins that have a DBID.XID starting with 'REG'.
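The lookup described above can be sketched as a query joining Protein to DBID. This is a minimal illustration, not the loader's own code: it uses an in-memory SQLite database in place of a real warehouse instance, assumes only the Protein.WID, DBID.OtherWID, and DBID.XID columns, and the sample identifiers are made up. The helper names `find_transcription_factor_wids` and `build_toy_warehouse` are hypothetical.

```python
import sqlite3


def find_transcription_factor_wids(conn):
    """Return the WIDs of proteins that have a DBID.XID starting with 'REG'."""
    query = """
        SELECT DISTINCT p.WID
        FROM Protein p
        JOIN DBID d ON d.OtherWID = p.WID
        WHERE substr(d.XID, 1, 3) = 'REG'
        ORDER BY p.WID
    """
    return [wid for (wid,) in conn.execute(query)]


def build_toy_warehouse():
    """Create a tiny stand-in warehouse (assumed columns, invented data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Protein (WID INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE DBID (OtherWID INTEGER, XID TEXT);
        INSERT INTO Protein VALUES (1, 'toy regulator'), (2, 'toy enzyme');
        INSERT INTO DBID VALUES (1, 'REG-TOY-1'), (1, 'TOY-PROTEIN-1'),
                                (2, 'TOY-PROTEIN-2');
    """)
    return conn
```

The prefix test uses `substr` rather than `LIKE 'REG%'` because `LIKE` is case-insensitive for ASCII text in SQLite by default; on another database engine the equivalent prefix comparison may be spelled differently.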
BioCyc Attribute | Warehouse Semantics | |
---|---|---|
ASSOCIATED-BINDING-SITE | Ignored. | |
CITATIONS[*] | Each attribute is either an evidence code or the UNIQUE-ID of a publication. See Support Table for translation of evidence codes. Each publication UNIQUE-ID may be enclosed in square brackets and may be missing the leading "PUB-". A row is added to CitationWIDOtherWID; CitationWIDOtherWID.CitationWID is the WID of the Citation associated with this unique ID; CitationWIDOtherWID.OtherWID is the WID of the Protein referenced by the REGULATOR attribute.
|
|
COMMENT[*] | CommentTable.Comm ; CommentTable.OtherWID is the WID of the
Protein referenced by the REGULATOR attribute. |
|
DBLINKS[*] | Attribute is a list of two or more elements. A row is added to CrossReference; CrossReference.OtherWID is the WID of the Protein referenced by the REGULATOR attribute; CrossReference.DatabaseName is the first element of the list; CrossReference.XID is the second element of the list; the rest of the list is ignored. |
|
MODE | Ignored. | |
REGULATED-ENTITY | References the UNIQUE-ID of a promoter Feature
that is a component of a transcription unit. |
|
REGULATOR | References the UNIQUE-ID of a protein that is a transcription factor. | |
SYNONYMS[*] | SynonymTable.Syn ; SynonymTable.OtherWID is the WID of the Protein referenced by the REGULATOR attribute. |
|
TYPES | 'Regulation-of-Transcription-Initiation'. Determines whether this entry is translated as transcriptional or enzymatic regulation. | |
UNIQUE-ID | DBID.XID ; DBID.OtherWID is the WID of the Protein referenced by the REGULATOR attribute. |
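The handling of CITATIONS values described in the table above (an evidence code or a publication UNIQUE-ID, possibly bracketed, possibly lacking "PUB-") can be sketched as a small normalization step. This is an illustration under stated assumptions: the function name `classify_citation` is hypothetical, and treating any value that starts with "EV-" as an evidence code is an assumption inferred from the form of codes such as 'EV-EXP-IMP-POLAR-MUTATION', not something this document specifies.

```python
def classify_citation(value):
    """Classify one CITATIONS value as an evidence code or a publication ID.

    Returns ("evidence", code) or ("publication", unique_id), where the
    publication unique_id always carries the leading "PUB-".
    """
    text = value.strip()
    # A publication UNIQUE-ID may be enclosed in square brackets.
    if text.startswith("[") and text.endswith("]"):
        text = text[1:-1].strip()
    # Assumption: evidence codes are recognizable by an "EV-" prefix.
    if text.startswith("EV-"):
        return ("evidence", text)
    # A publication UNIQUE-ID may be missing the leading "PUB-".
    if not text.startswith("PUB-"):
        text = "PUB-" + text
    return ("publication", text)
```

With this normalization, the bracketed, unprefixed, and fully qualified spellings of one publication all resolve to the same identifier before the Citation lookup.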
For each entry that describes enzymatic regulation, a row is added to EnzReactionInhibitorActivator
.
Entry attributes determine the column values as described in the table below.
BioCyc Attribute | Warehouse Semantics | |
---|---|---|
CITATIONS[*] | Each attribute is either an evidence code or the UNIQUE-ID of a publication. See Support Table for translation of evidence codes. Each publication UNIQUE-ID may be enclosed in square brackets and may be missing the leading "PUB-". A row is added to CitationWIDOtherWID; CitationWIDOtherWID.CitationWID is the WID of the Citation associated with this unique ID; CitationWIDOtherWID.OtherWID is the WID of the Chemical or Protein referenced by the REGULATOR attribute.
|
|
COMMENT[*] | CommentTable.Comm ; CommentTable.OtherWID is the WID of the
Chemical referenced by the REGULATOR attribute. |
|
DBLINKS[*] | Attribute is a list of two or more elements. A row is added to CrossReference; CrossReference.OtherWID is the WID of the Chemical or Protein referenced by the REGULATOR attribute; CrossReference.DatabaseName is the first element of the list; CrossReference.XID is the second element of the list; the rest of the list is ignored. |
|
MECHANISM |
EnzReactionInhibitorActivator.Mechanism is
|
|
MODE |
EnzReactionInhibitorActivator.InhibitOrActivate is
|
|
PHYSIOLOGICALLY-RELEVANT? |
EnzReactionInhibitorActivator.PhysioRelevant is
|
|
REGULATED-ENTITY | References the UNIQUE-ID of an enzymatic reaction. Its WID is EnzReactionInhibitorActivator.EnzymaticReactionWID .
|
|
REGULATOR | The Chemical table is queried for a UNIQUE_ID or a COMMON_NAME of a compound in the database being loaded that matches the value of this attribute. If none is found, a row is added to Chemical and the attribute value is stored as Chemical.Name. In either case, the Chemical.WID of the matching or newly added row is EnzReactionInhibitorActivator.CompoundWID. |
|
SYNONYMS[*] | SynonymTable.Syn ; SynonymTable.OtherWID is the WID of the Chemical or Protein referenced by the REGULATOR attribute. |
|
TYPES | 'Regulation-of-Enzyme-Activity'. Determines whether this entry is translated as transcriptional or enzymatic regulation. | |
UNIQUE-ID | DBID.XID ; DBID.OtherWID is the WID of the Chemical or Protein referenced by the REGULATOR attribute. |
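The REGULATOR lookup in the table above follows a find-or-create pattern: reuse the compound that matches by UNIQUE-ID or common name, otherwise add a new Chemical row named after the attribute value. The sketch below models this with in-memory dictionaries rather than a database; the class `ChemicalIndex`, its methods, and the sequential WID allocation are all hypothetical stand-ins for whatever the loader does internally.

```python
class ChemicalIndex:
    """In-memory stand-in for the Chemical table and its lookup keys."""

    def __init__(self, first_wid=1):
        self._next_wid = first_wid
        self._wid_by_key = {}   # UNIQUE-ID or common name -> WID
        self.rows = {}          # WID -> Chemical.Name

    def add(self, name, unique_id=None):
        """Add a Chemical row and index it by name and, if given, UNIQUE-ID."""
        wid = self._next_wid
        self._next_wid += 1
        self.rows[wid] = name
        self._wid_by_key[name] = wid
        if unique_id is not None:
            self._wid_by_key[unique_id] = wid
        return wid

    def resolve_regulator(self, value):
        """Return the WID to store as CompoundWID for a REGULATOR value."""
        wid = self._wid_by_key.get(value)
        if wid is None:
            # No compound matches: add a row whose Name is the attribute value.
            wid = self.add(value)
        return wid
```

Because a newly added row is indexed immediately, a second regulation entry naming the same unknown regulator resolves to the same WID instead of creating a duplicate row.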
A row is added to the Pathway table for each entry in pathways.dat. The Pathway.Type column value of each row is set to 'O' to signify the pathway is from a real organism. Pathway.BioSourceWID is the WID from the one row of BioSource created by the loader for the species being loaded.
Pathway entries can reference other pathways
(using their UNIQUE-ID), and there is no guarantee that a
pathway entry will be defined before a reference to it occurs. The
loader adds a row to Pathway
and assigns it a WID upon
the first reference to a pathway, and performs a SQL UPDATE of the row
when its entry is fully defined.
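The forward-reference handling described above (insert a placeholder row on first reference, then UPDATE it once the entry is parsed) can be sketched as follows. This is a minimal illustration against an in-memory SQLite database, not the loader's implementation: the class `PathwayLoader`, its method names, the reduced three-column Pathway table, and the sequential WID counter are assumptions made for the example.

```python
import sqlite3


class PathwayLoader:
    """Allocates Pathway rows on first reference and fills them in later."""

    def __init__(self, conn, first_wid=1):
        self.conn = conn
        self.next_wid = first_wid
        self.wid_by_unique_id = {}
        conn.execute(
            "CREATE TABLE Pathway (WID INTEGER PRIMARY KEY, Name TEXT, Type TEXT)"
        )

    def wid_for(self, unique_id):
        """Return the pathway's WID, inserting a placeholder row if needed."""
        wid = self.wid_by_unique_id.get(unique_id)
        if wid is None:
            wid = self.next_wid
            self.next_wid += 1
            self.wid_by_unique_id[unique_id] = wid
            self.conn.execute(
                "INSERT INTO Pathway (WID, Name, Type) VALUES (?, NULL, 'O')",
                (wid,),
            )
        return wid

    def define(self, unique_id, common_name):
        """Record the fully defined entry by updating its existing row."""
        wid = self.wid_for(unique_id)
        self.conn.execute(
            "UPDATE Pathway SET Name = ? WHERE WID = ?", (common_name, wid)
        )
        return wid
```

Routing both references and definitions through `wid_for` means the order in which entries appear in the input file does not affect the WIDs that other rows point to.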
BioCyc Attribute | Warehouse Semantics | |
---|---|---|
CITATIONS[*] | Each attribute is either an evidence code or the UNIQUE-ID of a publication. See Support Table for translation of evidence codes. Each publication UNIQUE-ID may be enclosed in square brackets and may be missing the leading "PUB-". A row is added to CitationWIDOtherWID; CitationWIDOtherWID.CitationWID is the WID of the Citation associated with this unique ID; CitationWIDOtherWID.OtherWID is the WID of this Pathway object.
|
|
COMMENT[*] | CommentTable.Comm ; CommentTable.OtherWID is the WID of this Pathway
object |
|
COMMON-NAME | Pathway.Name |
|
DBLINKS[*] | Attribute is a list of two or more elements. A row is added to CrossReference; CrossReference.OtherWID is the WID of this Pathway object; CrossReference.DatabaseName is the first element of the list; CrossReference.XID is the second element of the list; the rest of the list is ignored. |
|
HYPOTHETICAL-REACTIONS[*] | Value should match the UNIQUE_ID of a reaction. If so, PathwayReaction.Hypothetical is 'T' for the row added to PathwayReaction for the reaction. |
|
NET-REACTION-EQUATION | CommentTable.Comm ; CommentTable.OtherWID is the WID of this Pathway
object |
|
PATHWAY-INTERACTIONS | CommentTable.Comm ; CommentTable.OtherWID is the WID of this Pathway object |
|
PATHWAY-LINKS[*] | Indicates pathways that are linked via a common substrate. If the value is of the form (Unique-ID), it is probably a pathway reference and is ignored. Otherwise the value is of the form (Compound PathwaySpec1 ... PathwaySpecN), where each PathwaySpec is either a descriptor or (descriptor . direction). The direction is ignored. The descriptor may be either a quoted string or the UNIQUE-ID of some BioCyc object (not necessarily a pathway). If the descriptor is anything other than the UNIQUE-ID of a previously defined pathway, a row is added to Pathway, with Pathway.Name set to the descriptor. A row is added to PathwayLink for each PathwaySpec: PathwayLink.ChemicalWID is the WID of the Compound; PathwayLink.Pathway1WID is the WID of this Pathway object; PathwayLink.Pathway2WID is the WID of the linked Pathway. During postprocessing, any Pathway rows added this way that turn out not to be pathways (i.e., no pathway entry was later encountered for that descriptor) are deleted from Pathway, along with any linked PathwayLink rows.
|
|
PREDECESSORS[*] | Collectively, these specify the graph of reactions that form
the pathway. Each value is of one of two forms:
A row is added to PathwayReaction for each Predecessor.
For each row: PathwayReaction.PathwayWID is the WID of this Pathway object; PathwayReaction.ReactionWID is the Reaction WID of Successor; PathwayReaction.Hypothetical is 'F', unless successor is named as a HYPOTHETICAL-REACTIONS attribute; PathwayReaction.PriorReactionWID is the Reaction WID of Predecessor, or NULL if there are none. For case 2: Each such attribute should also occur as a SUB-PATHWAYS attribute. If it does not, it is ignored. |
|
REACTION-LIST[*] | Each attribute is a UNIQUE-ID of a reaction or pathway.
Pathways occurring here are ignored.
For each reaction occurring here, but not occurring as a Successor in
a PREDECESSORS attribute, a row is added to PathwayReaction : PathwayReaction.PathwayWID is the WID of this Pathway object; PathwayReaction.ReactionWID is the Reaction
WID of the attribute; PathwayReaction.Hypothetical is 'F', unless
the attribute is named as a HYPOTHETICAL-REACTIONS attribute; PathwayReaction.PriorReactionWID is NULL.
|
|
SUB-PATHWAYS[*] |
This pathway inherits the reaction graph of the pathway whose
UNIQUE-ID equals the attribute.
That is, a PathwayReaction row is added for this pathway for each
PathwayReaction row of the pathway designated by the attribute.
The columns of each PathwayReaction row are identical,
except that PathwayWID is changed from the
attribute's pathway WID to the WID of this pathway. Note: sub/superpathway information is loaded via SUPER-PATHWAYS. |
|
SUPER-PATHWAYS[*] | Value should match the UNIQUE_ID of a pathway. If so, a row is added to SuperPathway : SuperPathway.SuperPathwayWID is the WID associated with this UNIQUE_ID; SuperPathway.PathwayWID is the WID of this Pathway object. |
|
SYNONYMS[*] | SynonymTable.Syn ; SynonymTable.OtherWID is the WID of this Pathway
object |
|
UNIQUE-ID | DBID.XID ; DBID.OtherWID is the WID of this Pathway
object |
Rows are also added to SuperPathway, PathwayReaction, and PathwayLink as specified above.
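The way PREDECESSORS, REACTION-LIST, and HYPOTHETICAL-REACTIONS combine into PathwayReaction rows can be sketched as below. This is an illustration under assumptions, not the loader's code: the function name `pathway_reaction_rows` is hypothetical, each PREDECESSORS value is assumed to be already parsed into a `(successor, [predecessors])` pair, pathway references are assumed to be already filtered out of both inputs, and `reaction_wid` is an assumed mapping from reaction UNIQUE-ID to WID.

```python
def pathway_reaction_rows(pathway_wid, predecessors, reaction_list,
                          hypothetical, reaction_wid):
    """Build PathwayReaction rows (as dicts) for one pathway entry."""

    def make_row(reaction_id, prior_id):
        return {
            "PathwayWID": pathway_wid,
            "ReactionWID": reaction_wid[reaction_id],
            "PriorReactionWID": (
                None if prior_id is None else reaction_wid[prior_id]
            ),
            "Hypothetical": "T" if reaction_id in hypothetical else "F",
        }

    rows = []
    successors = set()
    for successor, priors in predecessors:
        successors.add(successor)
        # One row per predecessor; a single NULL-prior row if there are none.
        for prior in (priors or [None]):
            rows.append(make_row(successor, prior))
    # Reactions that never occur as a successor still get one row each.
    for reaction_id in reaction_list:
        if reaction_id not in successors:
            rows.append(make_row(reaction_id, None))
    return rows
```

A reaction with several predecessors therefore yields several rows that differ only in PriorReactionWID, which is how the reaction graph is encoded in a flat table.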
For each attribute of this form, a row in the Support
table is created as follows:
Column | Value assigned by BioCyc loader | |
---|---|---|
WID |
The BioWarehouse ID allocated for this Support. | |
OtherWID |
The WID of the entry this supporting evidence applies to. | |
Type |
The evidence code (e.g., 'EV-EXP-IMP-POLAR-MUTATION'). Note: this is not consistent with the schema documentation, which states this column is either 'computational' or 'experimental'. |
|
Confidence |
NULL | |
DatasetWID |
The value Dataset.WID assigned to the dataset being loaded. |
CitationWIDOtherWID to associate the Support row with the Citation of the publication.
For each entry, a row in the Entry table is created as follows:
Column | Value assigned by BioCyc loader | |
---|---|---|
OtherWID |
The WID of the entry described by this row.
Entry may be in Chemical , Reaction , Protein ,
Gene , EnzymaticReaction , or Pathway .
|
|
InsertDate |
The time/date the loader was run. | |
CreationDate |
NULL | |
ModifiedDate |
NULL | |
LineNumber |
The line number from the input file on which this entry began. | |
LoadError |
'T' if a parse error is detected, 'F' otherwise. | |
DatasetWID |
The value Dataset.WID assigned
to the dataset being loaded. |