Introduction
(C) 2004-2009 SRI International. All Rights Reserved. See BioWarehouse Overview for license details.
This document describes version 4.6 of the KEGG Loader. It is one of several database loaders comprising the BioWarehouse.
KEGG (the Kyoto Encyclopedia of Genes and Genomes) is a collection of databases curated by the Bioinformatics Center at the Institute for Chemical Research at Kyoto University. KEGG is available online at http://www.genome.ad.jp/kegg/. KEGG contains five types of data:
KEGG contains three major components:
LIGAND was originally started by Takaaki Nishioka, and is now maintained in collaboration with the KEGG project. LIGAND itself is composed of three databases:
This document describes the semantic mapping between the KEGG database components PATHWAY, GENES and LIGAND to a representation in the BioWarehouse. A chapter is dedicated to each of the KEGG components, defining the mapping to the BioWarehouse schema.
Constant tables specify scientific data such as information from the Periodic Table of Elements, as well as constants used as column values in various warehouse tables.
Object tables describe a type of entity in a source database, such as compounds and proteins. Each column of an object table specifies a parameter that characterizes the object. In addition to the parameters defined by the source database, the loader assigns a unique warehouse ID (WID) to each object, which is used by other tables to reference the object.
A special type of warehouse object is the dataset. A dataset object is created for each dataset loaded into the warehouse; e.g., the SWISS-PROT loader adds one row to the DataSet table when it is run. Its WID is referred to as the dataset WID and is a column in each object table, specifying the source database of the object.
Linking tables describe relationships among objects. They contain the WIDs of the associated objects, and any additional columns needed to characterize the relationship. In general, many-to-many relationships are supported. Special tables exist to capture reference and crossreference information and to facilitate lookup of objects.
Full schema information, including source files and browseable documentation, is available with this distribution.
The latest supported data version for the KEGG loader is listed in the loader summary table. The loader may not be compatible with future versions of KEGG. KEGG does not seem to include a current version number in its download, nor is one displayed prominently on its website, but some version and release information can be found.
The loader does not load any data from the PATHWAY component of KEGG.
The loader ignores the MASS keyword on compounds, though it could load this into Chemical.MolecularWeightCalc.
All data loaded by the loader are loaded as a single dataset in the warehouse. References from one part of KEGG to another (e.g. chemicals used in a reaction) are resolved to the WID within the dataset. References to KEGG data that are not loaded use the CrossReference table to associate the data that is loaded with the data that is not loaded.
OtherWID | The WID assigned to the loaded data. |
XID | KEGG accession for the data that is not loaded |
DatabaseName | Abbreviated name of the KEGG component (e.g., 'KO' for KEGG Orthology). |
CrossWID | NULL. |
Each loaded version of KEGG will be assigned a new row in the DataSet
table as follows:
WID | The next available WID in the warehouse. | |
Name | "KEGG" | |
Version | The version number assigned by KEGG to this release, e.g. "34". | |
ReleaseDate | The date that this version of KEGG was released. | |
LoadDate | The time/date the loader was run (SQL SYSDATE). | |
ReleaseDate | NULL. | |
ChangeDate | The date and time the loader completed, NULL if the loader did not complete successfully. | |
LoadedBy | The value of the system environment variable USER for the account running the loader. | |
Application | 'KEGG Loader' | |
ApplicationVersion | 4.6 | |
HomeURL | http://www.genome.ad.jp/. | |
QueryURL | NULL |
All entities that are assigned a WID (other than the DataSet above) are also given an Entry row:
OtherWID | The WID assigned to the entity. |
InsertDate | The time/date the loader was run. |
CreationDate | NULL. |
ModifiedDate | NULL. |
LoadError | "T" if a parse error is detected, "F" otherwise. |
DatasetWID | The WID assigned to the DataSet (see above). |
The LoadError field is set to true if any error occurred in loading the record from the source database. The granularity is based on the source record; i.e., if there was an error on one line of the record, all warehouse entries derived from that record will have the LoadError flag set to true.
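The record-level granularity described above can be illustrated with a minimal Python sketch. The names `Entry` and `entries_for_record` are hypothetical and exist only for this illustration; they are not taken from the loader's source code.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical stand-in for one row of the Entry table."""
    other_wid: int
    load_error: str = "F"


def entries_for_record(wids, line_errors):
    """Build Entry rows for every WID derived from one source record.

    line_errors holds one boolean per line of the source record. If any
    line failed to parse, every entry derived from the record is flagged.
    """
    flag = "T" if any(line_errors) else "F"
    return [Entry(other_wid=w, load_error=flag) for w in wids]


# One bad line in the record flags all three derived entries.
rows = entries_for_record([101, 102, 103], [False, True, False])
print([r.load_error for r in rows])  # ['T', 'T', 'T']
```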
The KEGG PATHWAY component is a graphical structure combining the other parts of KEGG. The distributed data contains images and HTML image maps to allow for a convenient visual interaction with the data from LIGAND and GENES. The information it provides beyond what is in LIGAND and GENES is which reactions occur in which organisms.
PATHWAY is not currently loaded into the warehouse.
The Genome database contains descriptions of the organisms whose genomes are present in the GENES database. Information represented includes the organism name, the abbreviation used in KEGG, the categorization of the organism, high-level information about its genome and citations of the source of the information.
The present loading of this file ignores statistical and map/catalog information. The ignored fields are:
In addition, several fields are loaded strictly as comments, without any semantic interpretation of their contents:
Each entry in GENOME begins with an ENTRY field, giving the abbreviation used in KEGG for the organism, e.g. 'hin' for 'H.influenzae'. This is stored in the DBID table:
OtherWID | The BioSource.WID assigned to this organism (see GENOME Name below). |
XID | The three-character organism abbreviation. |
The name entry gives the scientific name for the organism. This is used to populate the BioSource table:
WID | A new WID assigned to this object. |
Name | The organism name. |
DataSetWID | The WID assigned to the DataSet (see DataSet table above). |
all other columns | NULL. |
The chromosome entry optionally gives the circularity ('Circular' or 'Linear'), for initial population of the NucleicAcid table, and optionally a chromosome name (in organisms with multiple chromosomes).
WID | A new WID assigned to this object. |
Name | See SEQUENCE below. |
Type | "DNA". |
Class | "chromosome". |
Topology | "circular" if Circular, "linear" if Linear, NULL if not specified. |
MoleculeLength | See LENGTH below. |
GeneticCodeWID | The GeneticCode.WID associated with the genetic code (see SEQUENCE below). |
BioSourceWID | The BioSource.WID assigned to this organism (see GENOME Name above). |
DataSetWID | The WID assigned to the DataSet (see DataSet table above). |
The plasmid entry gives the name of the plasmid and (optionally) whether it is circular. This is used for initial population of the NucleicAcid table:
WID | A new WID assigned to this object. |
Name | See SEQUENCE below. |
Type | "DNA". |
Class | "plasmid". |
Topology | "circular" if Circular, "linear" if Linear, NULL if not specified. |
MoleculeLength | See LENGTH below. |
GeneticCodeWID | The GeneticCode.WID associated with the genetic code (see SEQUENCE below). |
BioSourceWID | The BioSource.WID assigned to this organism (see GENOME Name above). |
DataSetWID | The WID assigned to the DataSet (see DataSet table above). |
The sequence item gives the Genbank accession number for the chromosome or plasmid, and (optionally) the genetic code number used.
The replicon name is constructed from the accession number and the replicon type (e.g. "Chromosome GB:L77117" for the chromosome of Methanococcus jannaschii DSM2661) and is stored in NucleicAcid.Name.
If the NCBI Taxonomy Loader has been run, the loader will use the genetic code number to find the associated entry in the GeneticCode table, and store it in NucleicAcid.GeneticCodeWID for this replicon.
NOTE: As of approximately version 27.0 of KEGG, genetic codes do not seem to be provided in the data, so this column will not be populated.
The Genbank accession number is also stored in the CrossReference table:
OtherWID | The WID assigned to this NucleicAcid (see above). |
XID | The Genbank accession number. |
DatasetWID | NULL. |
DatabaseName | "GENBANK" |
The length entry gives the number of nucleotides in the replicon, and populates NucleicAcid.MoleculeLength.
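The way the CHROMOSOME or PLASMID, SEQUENCE, and LENGTH items combine into one NucleicAcid row can be sketched in Python as follows. This is an illustration only: the function name `nucleic_acid_row` is hypothetical, the length value is a placeholder, and the sketch assumes the SEQUENCE text already carries its "GB:" prefix as in the example name given above.

```python
def nucleic_acid_row(wid, replicon_class, topology, sequence_id,
                     length, biosource_wid, dataset_wid):
    """Assemble one NucleicAcid row for a chromosome or plasmid."""
    # Topology is "circular", "linear", or NULL (None) when not specified.
    topo = topology.lower() if topology in ("Circular", "Linear") else None
    return {
        "WID": wid,
        # Replicon type followed by the accession, e.g. "Chromosome GB:L77117".
        "Name": f"{replicon_class.capitalize()} {sequence_id}",
        "Type": "DNA",
        "Class": replicon_class.lower(),
        "Topology": topo,
        "MoleculeLength": length,
        # Left NULL here; populated only when a genetic code can be resolved.
        "GeneticCodeWID": None,
        "BioSourceWID": biosource_wid,
        "DataSetWID": dataset_wid,
    }


# WIDs and length are placeholder values chosen for the illustration.
row = nucleic_acid_row(10, "chromosome", "Circular", "GB:L77117", 1000, 5, 1)
print(row["Name"])      # Chromosome GB:L77117
print(row["Topology"])  # circular
```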
Each entry in the genome file gives one or more citations to the literature, contained in the fields REFERENCE (giving the Pubmed ID), AUTHORS, TITLE, and JOURNAL. These are used to populate the Citation table:
WID | A new WID assigned to this citation. |
Citation | The concatenation of the AUTHORS, TITLE, and JOURNAL entries. |
PMID | The Pubmed ID from the REFERENCE entry. |
DataSetWID | The WID assigned to the DataSet (see DataSet table above). |
Two entries are made in CitationWIDOtherWID to relate the citation back to the BioSource and to the NucleicAcid:
OtherWID | The BioSource.WID assigned to this organism (see GENOME Name above). |
CitationWID | The WID of the citation |
OtherWID | The NucleicAcid.WID of the replicon (see CHROMOSOME or PLASMID above). |
CitationWID | The WID of the citation |
The GENES database contains information on the genome of particular organisms, one organism per file. The information includes the name(s) of the gene, its position, the codon usage, amino acid sequence and nucleotide sequence.
An entry in GENES contains up to nine fields.
The ENTRY line gives the gene id and the organism name. The gene id is used in the Gene table (see NAME below). The organism name is used to look up the previously loaded organism from GENOME. If the organism is found, a row is created in the BioSourceWIDGeneWID table:
BioSourceWID | The WID of the organism (see GENOME NAME above). |
GeneWID | The WID assigned to this gene. |
The first name given is assumed to be the primary name, and other names are synonyms. The name starts populating the Gene table:
WID | A new WID assigned to this object. |
Name | The primary name of the gene. |
GenomeID | The gene id (from ENTRY above). |
CodingRegionStart | See POSITION below. |
CodingRegionEnd | See POSITION below. |
Interrupted | See POSITION below. |
NucleicAcidWID | NucleicAcid.WID of the replicon this gene resides on (see GENOME Chromosome and GENOME Plasmid above). |
DataSetWID | The WID assigned to the DataSet (see DataSet table above). |
Alternate names are stored in SynonymTable:
OtherWID | The WID assigned to this gene. |
Syn | The alternative name |
The definition is stored in CommentTable:
OtherWID | The WID assigned to this gene. |
Comm | The definition text. |
The position of the gene can be a simple numerical range, a join (patching together a number of regions), a complement, or a range relative to other genes. It can also indicate on which replicon the gene resides.
We presently ignore the non-numerical range information. Joins are considered to range from the start of the low range to the end of the high range, and the Interrupted flag is set to 'T'.
The Gene entry from above is thereby extended with:
WID | ... |
Name | ... |
GenomeID | ... |
CodingRegionStart | The low end of the numerical range(s). |
CodingRegionEnd | The high end of the numerical range(s). |
Direction | 'F' for forward, 'R' for complement. |
Interrupted | 'T' if a join was present, 'F' otherwise. |
DataSetWID | ... |
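The POSITION rules above can be sketched in Python. This is a minimal illustration, not the loader's parser: the function name `parse_position` is hypothetical, and the sample strings assume a GenBank-like syntax (`complement(...)`, `join(...)`, `start..end`, and an optional replicon prefix before a colon).

```python
import re


def parse_position(position):
    """Return (start, end, direction, interrupted) for a POSITION string.

    Only numerical ranges of the form start..end are interpreted; any
    replicon prefix or relative-position text is ignored by this sketch.
    """
    direction = "R" if "complement" in position else "F"
    interrupted = "T" if "join" in position else "F"
    ranges = re.findall(r"(\d+)\.\.(\d+)", position)
    if not ranges:
        return None, None, direction, interrupted
    bounds = [int(n) for pair in ranges for n in pair]
    # A join spans from the lowest start to the highest end.
    return min(bounds), max(bounds), direction, interrupted


print(parse_position("complement(join(100..200,300..450))"))
# (100, 450, 'R', 'T')
print(parse_position("1:500..900"))
# (500, 900, 'F', 'F')
```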
References to the replicon on which the gene resides are represented in the GeneWIDRepliconWID table:
GeneWID | The WID of the Gene. |
RepliconWID | The WID of the Replicon. |
CrossReference table:
OtherWID | The WID assigned to this compound (see above). |
XID | The external database identifier. |
DatasetWID | NULL. |
DatabaseName | The external database name. |
The AASEQ item gives the amino acid sequence for the protein generated by this gene. This is used to complete the AASequence column of the relevant protein (see NAME below).
WID | ... |
Name | ... |
AASequence | The given sequence. |
Charge | ... |
Fragment | ... |
MolecularWeightCalc | ... |
MolecularWeightExp | ... |
PlCalc | ... |
PlExp | ... |
DataSetWID | ... |
The COMPOUND section of LIGAND is a collection of metabolic compounds including substrates, products and inhibitors. Each of the chemicals referenced in the ENZYME and KEGG PATHWAY components is represented in this component. Information represented includes the naming, chemical formula, structural information, metabolic pathways, related enzymes, related protein structures, prosthetic groups and the CAS registry number.
In our semantic mapping, we ignore the information representing the structural information, as there is no current table for this in the BioWarehouse schema.
This section describes how each of the fields in a COMPOUND entry is mapped into the BioWarehouse schema.
Each data item begins with an ENTRY field, giving the compound accession number for the LIGAND database. The accession number is stored in the DBID table:
OtherWID | The WID assigned to this chemical (see below). |
XID | The accession number |
The name item contains the recommended name for the compound, and optionally some alternatives. The recommended name is always first, and is mandatory. This item starts populating the Chemical table:
WID | A new WID assigned to this object. |
Name | The recommended name. |
BeilsteinName | NULL. |
SystematicName | NULL. |
CAS | NULL. |
Charge | NULL. |
EmpiricalFormula | See below. |
MolecularWeightCalc | NULL. |
MolecularWeightExp | NULL. |
OctH20PartitionCoeff | NULL. |
PKA1 | NULL. |
PKA2 | NULL. |
PKA3 | NULL. |
WaterSolubility | NULL. |
Smiles | NULL. |
DataSetWID | The WID assigned to the DataSet (see DataSet table above). |
Alternative names are each stored in SynonymTable:
OtherWID | The WID assigned to this chemical. |
Syn | The alternative name |
The formula item is an ASCII representation of the chemical formula of this compound, e.g. H2O, C10H16N5O13P3. This is used to populate Chemical.EmpiricalFormula.
The pathway item is a cross-link to the KEGG PATHWAY data, and consists of the pathway map accession number, followed by the description. This is used to populate the CrossReference table:
OtherWID | The WID assigned to this chemical (see above). |
XID | The pathway accession number. |
DatasetWID | NULL. |
DatabaseName | "KEGG PATHWAY" |
The enzyme item is a cross-link to the KEGG ENZYME data, and consists of the EC number, followed by a type indicating how the compound is related to the enzyme. Valid types are R for reactant, I for inhibitor, C for cofactor and E for effector.
Rather than load this data from COMPOUND, this information is loaded from ENZYME, where it is redundantly replicated in KEGG.
The structures item is a cross-link to PDB (the Protein Data Bank), which stores the three-dimensional structure information for proteins. This is used to populate the CrossReference table:
OtherWID | The WID assigned to this compound (see above). |
XID | The PDB ID. |
DatasetWID | NULL. |
DatabaseName | "PDB" |
The dblinks item contains cross-link information to other databases. This is used to populate the CrossReference table:
OtherWID | The WID assigned to this compound (see above). |
XID | The external database identifier. |
DatasetWID | NULL. |
DatabaseName | The external database name. |
This section is ignored.
A row is added to the CommentTable table for each comment:
OtherWID | The Chemical WID assigned to this compound. |
Comm | The comment string. |
If the remark item begins with 'Same as: ', the remainder of the item is assumed to be an accession number. It is associated internally with this compound, and subsequent references to this accession number in REACTION are treated as if they were references to this compound.
Also, a row is added to the SynonymTable table for each remark:
OtherWID | The WID assigned to this compound. |
Syn | The text that occurs after 'Same as: ' |
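The 'Same as: ' handling can be sketched in Python as an alias map that later REACTION lookups consult. The names `register_remark` and `resolve`, the WID values, and the accession strings are hypothetical placeholders for illustration; the sketch also assumes that only remarks with the 'Same as: ' prefix produce a synonym row.

```python
SAME_AS = "Same as: "


def register_remark(remark, compound_wid, alias_to_wid, synonym_rows):
    """Record a 'Same as: ' remark as an alias and as a synonym row."""
    if remark.startswith(SAME_AS):
        accession = remark[len(SAME_AS):].strip()
        # Later references to this accession resolve to the same compound.
        alias_to_wid[accession] = compound_wid
        synonym_rows.append({"OtherWID": compound_wid, "Syn": accession})


def resolve(accession, accession_to_wid, alias_to_wid):
    """Look up a compound WID, falling back to the alias map."""
    if accession in accession_to_wid:
        return accession_to_wid[accession]
    return alias_to_wid.get(accession)


accession_to_wid = {"C00001": 42}   # placeholder accession and WID
aliases, synonyms = {}, []
register_remark("Same as: G10000", 42, aliases, synonyms)
print(resolve("G10000", accession_to_wid, aliases))  # 42
```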
The REACTION section of LIGAND defines a collection of chemical reactions.
These reactions are referenced in the ENZYME component of KEGG by their accession number. The great majority of reactions in ENZYME are defined in REACTION, but some are not. Furthermore, REACTION can define non-enzymatic reactions which are not referenced there.
This section describes how each of the items in a REACTION entry is mapped into the BioWarehouse schema. The principal mapping is to the Reaction table; one or more rows are added to it based on this entry.
Most reactions have an ENZYME item, which specifies one or more Enzyme Commission (EC) numbers. For these entries, a Reaction row is created for each EC number. Partial EC numbers (those containing a dash) such as 1.2.3.- or 6.-.-.- are treated only as synonyms, and never populate one of the EC number columns.
If the EC number is non-partial, it is stored in Reaction.ECNumberProposed. But when the ENZYME section is loaded, if this reaction's accession number is mentioned in an ENZYME entry's REACTION item, that EC number becomes Reaction.ECNumber, and ECNumberProposed becomes NULL, effectively making the EC number official.
WID | A new WID assigned to this reaction. |
Name | The first name in the NAME section; NULL if absent. |
DeltaG | NULL. |
ECNumber | NULL. This may be updated when an ENZYME referencing this reaction is loaded. |
ECNumberProposed | The EC number from the ENZYME item if it is present, else NULL. Multiple EC numbers in ENZYME cause multiple Reaction rows to be loaded. This may be updated to NULL during the load of ENZYME. |
Spontaneous | NULL. |
DataSetWID | The WID assigned to the DataSet (see DataSet table above). |
If multiple reactions are added, all links defined below are replicated for each Reaction row added.
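The treatment of full and partial EC numbers, and the later promotion from ECNumberProposed to ECNumber, can be sketched in Python. The function names are hypothetical, and the sketch assumes one Reaction row per EC number listed (including partial ones, whose EC columns stay NULL), which is one reading of the description above.

```python
def is_partial_ec(ec_number):
    """A partial EC number contains a dash, e.g. 1.2.3.- or 6.-.-.-."""
    return "-" in ec_number


def reaction_rows(name, ec_numbers):
    """Create one Reaction row per EC number, or a single row if none."""
    if not ec_numbers:
        return [{"Name": name, "ECNumber": None, "ECNumberProposed": None}]
    return [
        {
            "Name": name,
            "ECNumber": None,
            # Partial EC numbers never populate an EC number column.
            "ECNumberProposed": None if is_partial_ec(ec) else ec,
        }
        for ec in ec_numbers
    ]


def make_official(row, enzyme_ec):
    """Promote the EC number when an ENZYME entry references the reaction."""
    if not is_partial_ec(enzyme_ec):
        row["ECNumber"] = enzyme_ec
        row["ECNumberProposed"] = None


rows = reaction_rows("example reaction", ["1.1.1.1", "1.2.3.-"])
make_official(rows[0], "1.1.1.1")
print(rows[0]["ECNumber"], rows[0]["ECNumberProposed"])  # 1.1.1.1 None
```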
Each data item begins with an ENTRY field, giving the reaction accession number for the LIGAND database. If the ENZYME field is present and refers to multiple EC numbers, there is no unique identifier for this Reaction, and no DBID row is loaded. Otherwise the accession number is stored in the DBID table:
OtherWID | The WID assigned to this reaction (see below). |
XID | The accession number |
The name item contains an optional list of names for the reaction. The recommended name, if present, is always first, followed by alternatives. This is stored in Reaction.Name.
Alternative names are each stored in SynonymTable:
OtherWID | The WID assigned to this reaction. |
Syn | The alternative name |
A row is added to the Description table for this item. It is the textual depiction of the reaction. It is used in the processing of the ENZYME section.
OtherWID | The WID assigned to this reaction (see above). |
Comm | The definition. |
Table | 'Reaction' |
This item, when present, specifies one or more EC numbers associated with the reaction. As described above, a row is added to Reaction for each EC number mentioned in this item. If this item is absent, a single row is added with both ECNumber and ECNumberProposed as NULL.
Each EC number, whether full or partial, is stored in the SynonymTable:
OtherWID | The WID assigned to this reaction. |
Syn | The EC number. |
If the coefficient is absent, it is assumed to be 1. An explicit coefficient is either an integer or an expression such as (n), (m-1) or (n+m). The loader recognizes most parenthesized expressions, as this is the standard syntax for KEGG. It also recognizes some unparenthesized expressions such as n-1, but not all of them.
The loader loads the compounds on the left side of the equation into the Reactant table.
The loader loads the compounds on the right side of the equation into the Product table.
Both tables have identical column definitions:
ReactionWID | The WID of the reaction |
OtherWID | The Chemical.WID assigned to the compound. |
Coefficient | Coefficient of this substrate, or 0 if the coefficient is an expression. |
If multiple reactions are created, links to the reactants and products are added to Reactant and Product for each Reaction row.
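The coefficient rules can be sketched in Python. This is a simplified illustration under stated assumptions: the function names are hypothetical, the equation is assumed to use ' <=> ' between sides and ' + ' between terms, the compound identifiers are placeholders, and only parenthesized expressions are recognized (the unparenthesized forms mentioned above are not handled).

```python
def parse_term(term):
    """Split one equation term into (coefficient, compound).

    An absent coefficient is 1, an integer is used as given, and a
    parenthesized expression such as (n) or (m-1) is stored as 0.
    """
    parts = term.strip().split(None, 1)
    if len(parts) == 2:
        head, rest = parts
        if head.isdigit():
            return int(head), rest
        if head.startswith("(") and head.endswith(")"):
            return 0, rest
    return 1, term.strip()


def parse_equation(equation):
    """Return (reactant terms, product terms) for 'left <=> right'."""
    left, right = equation.split(" <=> ")
    reactants = [parse_term(t) for t in left.split(" + ")]
    products = [parse_term(t) for t in right.split(" + ")]
    return reactants, products


reactants, products = parse_equation("C00001 + 2 C00002 <=> (n) C00003")
print(reactants)  # [(1, 'C00001'), (2, 'C00002')]
print(products)   # [(0, 'C00003')]
```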
The ORTHOLOGY item contains cross-link information to KEGG Orthology, which is not loaded by the loader. This is used to populate the CrossReference table:
OtherWID | The WID assigned to this reaction (see above). |
XID | The KEGG accession number. It generally starts with 'KO'. |
DatasetWID | NULL. |
DatabaseName | The external database name, generally 'KO'. |
The RPAIR item contains cross-link information to KEGG Rpair, which is not loaded by the loader. This is used to populate the CrossReference table:
OtherWID | The WID assigned to this reaction (see above). |
XID | The KEGG accession number. It generally starts with 'RP'. |
DatasetWID | NULL. |
DatabaseName | The external database name, generally 'RP'. |
The PATHWAY item contains cross-link information to KEGG Pathway, which is not loaded by the loader. This is used to populate the CrossReference table:
OtherWID | The WID assigned to this reaction (see above). |
XID | The KEGG accession number. |
DatasetWID | NULL. |
DatabaseName | The external database name, generally 'PATH'. |
A row is added to the CommentTable table for each comment:
OtherWID | The WID assigned to this reaction (see above). |
Comm | The comment string. |
A row is added to the CommentTable table for each remark:
OtherWID | The WID assigned to this reaction (see above). |
Comm | The remark string. |
The ENZYME section of LIGAND is a collection of all known enzymatic reactions classified according to the nomenclature of the International Union of Biochemistry and Molecular Biology (IUBMB). Some of the entries in this data are taken from the ExPASy ENZYME database (http://expasy.hcuge.ch/sprot/enzyme.html) from the Swiss Institute of Bioinformatics.
Each entry is identified by the EC number, and contains information on naming, chemical reactions, metabolic compounds, metabolic pathways, genes encoding the enzyme (for several organisms), genetic diseases, and links to other databases.
Each data item begins with a mandatory ENTRY field, giving the EC number for the enzyme. An EC number may be either a full or a partial EC number. If an EC number is full, it is stored in the Reaction table row of each associated reaction (see below). Partial EC numbers are not stored.
Note that if the EC number is partial, it will typically have no EnzymaticReaction, Protein, or Pathway rows associated with it, as partial EC enzymes typically do not specify any genes in their GENES item. Moreover, entries for partial EC numbers typically cause no data to be loaded into the BioWarehouse.
The name item contains the recommended name for the enzyme, and optionally some alternatives. All names are assumed to refer to proteins, not ribozymes. The recommended name is always first, and is mandatory. This item is stored in the Protein table:
WID | A new WID assigned to this object. |
Name | The recommended name. |
AASequence | NULL. |
Charge | NULL. |
Fragment | NULL. |
MolecularWeightCalc | NULL. |
MolecularWeightExp | NULL. |
PlCalc | NULL. |
PlExp | NULL. |
DataSetWID | The WID assigned to the DataSet (see DataSet table above). |
One copy of the protein is made for each gene which can generate it (see below). The amino acid sequence is completed when loading the gene (see above).
Alternative names are each stored in SynonymTable:
OtherWID | The WID assigned to this Protein. |
Syn | The alternative name. |
The class item contains the meaning of the EC number, and is mandatory for all entries. There are three elements: the class, subclass and sub-subclass of the enzyme.
The class entry is not currently loaded.
The sysname item contains the systematic name given by the Enzyme Commission, representing the nature of the chemical reaction. This is stored as a synonym of the reaction name, in SynonymTable:
OtherWID | WID of the reaction (see below). |
Syn | The Systematic Name. |
The reaction item specifies one or more chemical reaction subitems, each of which specifies one or more Reaction rows to be used in translating various items of this entry.
Each subitem is in the form of an equation or a text description, followed optionally by a list of accession numbers referring to reactions defined in REACTION. If multiple subitems are specified, each is preceded by a parenthesized number such as (1).
If the REACTION item is absent, a single reaction is created from the SUBSTRATE and PRODUCT items (see below).
A reaction item is assumed to be a textual description if it contains no blank-delimited equals sign = or double arrow <=>. If a reaction is given in text, the SUBSTRATE and PRODUCT items are used to define the reaction in preference to the REACTION item, which is left uninterpreted and stored as a comment:
OtherWID | The WID assigned to this reaction. |
Comm | The reaction string. |
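The test that separates an equation from a textual description can be sketched in a few lines of Python. The function name `is_equation` is hypothetical and the sample strings are made up for illustration; the logic follows the rule stated above, that an equation contains a blank-delimited `=` or `<=>`.

```python
def is_equation(reaction_item):
    """True if the item contains a blank-delimited '=' or '<=>' token."""
    tokens = reaction_item.split()
    return "=" in tokens or "<=>" in tokens


print(is_equation("ATP + H2O = ADP + phosphate"))        # True
print(is_equation("Hydrolysis of terminal residues"))    # False
```

Because the check works on whitespace-separated tokens, an equals sign embedded in a name without surrounding blanks does not count as an equation separator in this sketch.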
If the list of reaction accession numbers is absent from a reaction, each side of the interpreted equation is stored as per the substrate and product items (see below). The reaction is stored in the Reaction table:
WID | A new WID assigned to this object. |
DeltaG | NULL. |
ECNumber | NULL. |
ECNumberProposed | The EC Number specified in the ENTRY item, or NULL if it is a partial EC number. |
Spontaneous | NULL. |
DataSetWID | The WID assigned to the DataSet (see DataSet table above). |
If the list of reaction accession numbers is present (the typical case), no new Reaction rows are created. Instead, the Reaction objects created during translation of the REACTION section specified by these accession numbers are used in translating this entry. Also, the presence of an accession number serves to define the EC number specified in the ENTRY item as an official EC number. The Reaction.ECNumber associated with each accession is updated to be the EC number specified in the ENTRY item, and the Reaction.ECNumberProposed associated with it is updated to NULL.
An EnzymaticReaction entry is also created for every Reaction specified, with one copy for each Protein generated:
WID | A new WID assigned to this object. |
ReactionWID | The WID of the Reaction assigned above. |
ProteinWID | The WID of the Enzyme (see NAME above). |
ComplexWID | NULL. |
ReactionDirectionWID | NULL. |
DataSetWID | The WID assigned to the DataSet (see DataSet table above). |
The substrate item contains the chemical compounds that appear on the left side of the reaction. If the REACTION item is specified and gave an interpretable reaction, the substrate is ignored. Otherwise it is used to construct a reaction as follows.
Each substrate chemical is assigned an entry in the Chemical table. If two chemicals that occur within KEGG are textually identical, they are considered the same entity. For new chemicals (not previously loaded from LIGAND COMPOUND), the fields are completed as follows:
WID | A new WID assigned to this object. |
Name | The name of the substrate chemical. |
BeilsteinName | NULL. |
CAS | NULL. |
Charge | NULL. |
EmpiricalFormula | NULL. |
MolecularWeightCalc | NULL. |
MolecularWeightExp | NULL. |
OctH20PartitionCoeff | NULL. |
SystematicName | NULL. |
WaterSolubility | NULL. |
Smiles | NULL. |
DataSetWID | The WID assigned to the DataSet (see DataSet table above). |
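The rule that textually identical chemical names denote the same entity amounts to a get-or-create lookup keyed by name. A minimal Python sketch follows; the class name `ChemicalRegistry` and the starting WID are hypothetical, and the sketch assumes "textually identical" means an exact, case-sensitive string match.

```python
class ChemicalRegistry:
    """Hands out one Chemical WID per distinct chemical name."""

    def __init__(self, first_wid=1):
        self._next_wid = first_wid
        self._wid_by_name = {}

    def wid_for(self, name):
        """Return the existing WID for this name, or assign a new one."""
        if name not in self._wid_by_name:
            self._wid_by_name[name] = self._next_wid
            self._next_wid += 1
        return self._wid_by_name[name]


registry = ChemicalRegistry(first_wid=100)
a = registry.wid_for("pyruvate")
b = registry.wid_for("ATP")
c = registry.wid_for("pyruvate")   # same text, so same chemical
print(a, b, c)  # 100 101 100
```

Chemicals already loaded from LIGAND COMPOUND would be seeded into such a map before ENZYME is processed, so that only genuinely new names receive new WIDs.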
Each of the substrate chemicals is linked to the reaction with a Reactant table entry, including the coefficient when specified. If the coefficient is not given, it is assumed to be 1:
ReactionWID | The WID of the reaction |
OtherWID | The Chemical.WID assigned to the substrate |
Coefficient | Coefficient of this substrate. |
The product item contains the chemical compounds that appear on the right side of the reaction. If the REACTION item is specified and gave an interpretable reaction, the product is ignored. Otherwise it is used to construct a reaction as follows.
Each product chemical is assigned an entry in the Chemical table. If two chemicals that occur within LIGAND ENZYME are textually identical, they are considered the same entity. For new chemicals (not previously loaded from LIGAND COMPOUND), the fields are completed as follows:
WID | A new WID assigned to this object. |
Name | The name of the product chemical. |
BeilsteinName | NULL. |
CAS | NULL. |
Charge | NULL. |
EmpiricalFormula | NULL. |
MolecularWeightCalc | NULL. |
MolecularWeightExp | NULL. |
OctH20PartitionCoeff | NULL. |
SystematicName | NULL. |
WaterSolubility | NULL. |
Smiles | NULL. |
DataSetWID | The WID assigned to the DataSet (see DataSet table above). |
Each of the product chemicals is linked to the reaction with a Product table entry, including the coefficient when specified. If the coefficient is not given, it is assumed to be 1:
ReactionWID | The WID of the reaction |
OtherWID | The Chemical.WID assigned to the product chemical |
Coefficient | Coefficient of this product. |
The inhibitor item names compounds that inhibit the reaction from taking place. Each compound is given an entry in the Chemical table (subject to the same treatment of textually identical names as in substrate/product):
WID | A new WID assigned to this object. |
Name | The name of the inhibitor compound. |
BeilsteinName | NULL. |
CAS | NULL. |
Charge | NULL. |
EmpiricalFormula | NULL. |
MolecularWeightCalc | NULL. |
MolecularWeightExp | NULL. |
OctH20PartitionCoeff | NULL. |
SystematicName | NULL. |
WaterSolubility | NULL. |
Smiles | NULL. |
DataSetWID | The WID assigned to the DataSet (see DataSet table above). |
Each of the inhibitors is linked to each of the enzymatic reactions by the EnzReactionWIDChemicalWID table:
EnzymaticReactionWID | The WID of the Enzymatic Reaction (see above). |
ChemicalWID | The WID assigned to the chemical |
InhibitOrActivate | 'I' |
Mechanism | NULL. |
PhysioRelevant | NULL. |
NOTE: As of approximately version 27 of KEGG, cofactor information appears to be missing from the data files. In this case, no cofactor information is loaded.
The cofactor item names compounds that do not appear in the reaction equation, but are described in the comment item as operating as cofactors in the reaction. Each compound is given an entry in the Chemical table (subject to the same treatment of textually identical names as in substrate/product):
WID | A new WID assigned to this object. |
Name | The name of the cofactor compound. |
BeilsteinName | NULL. |
CAS | NULL. |
Charge | NULL. |
EmpiricalFormula | NULL. |
MolecularWeightCalc | NULL. |
MolecularWeightExp | NULL. |
OctH20PartitionCoeff | NULL. |
SystematicName | NULL. |
WaterSolubility | NULL. |
Smiles | NULL. |
DataSetWID | The WID assigned to the DataSet (see DataSet table above). |
Each of the cofactor compounds is linked to each of the enzymatic reactions with an EnzReactionCofactor table entry:
EnzymaticReactionWID | The WID of the enzymatic reaction (see above). |
ChemicalWID | The WID assigned to the cofactor compound. |
Prosthetic | NULL. |
The effector item names compounds that activate the reaction. Each compound is given an entry in the Chemical table (subject to the same treatment of textually identical names as in substrate/product):
WID | A new WID assigned to this object. |
Name | The name of the effector compound. |
BeilsteinName | NULL. |
CAS | NULL. |
Charge | NULL. |
EmpiricalFormula | NULL. |
MolecularWeightCalc | NULL. |
MolecularWeightExp | NULL. |
OctH20PartitionCoeff | NULL. |
SystematicName | NULL. |
WaterSolubility | NULL. |
Smiles | NULL. |
DataSetWID | The WID assigned to the DataSet (see DataSet table above). |
Each of the effectors is linked to each of the enzymatic reactions by the EnzReactionWIDChemicalWID table:
EnzymaticReactionWID | The WID of the Enzymatic Reaction (see above). |
ChemicalWID | The WID assigned to the chemical |
InhibitOrActivate | 'A' |
Mechanism | NULL. |
PhysioRelevant | NULL. |
The comment item contains free-form text information commenting on the enzyme. This item populates the CommentTable:
OtherWID | The WID assigned to this enzyme (see NAME above). |
Comm | The comment string. |
There may be several comments associated with each enzyme.
The pathway item is a cross-link to the KEGG PATHWAY data, and consists of the pathway map accession number, followed by the description. As that database is not parseable, this entry is used to associate reactions into pathways.
A reference (sum of organisms) pathway is created, if it does not already exist:
WID | A new WID assigned to this object. |
Name | The given descriptive name of the pathway. |
Type | 'R' (Reference). |
BioSourceWID | The BioSource.WID assigned to this organism (see GENOME Name above). |
DataSetWID | The WID assigned to the DataSet (see DataSet table above). |
The pathway map accession number is stored in the DBID table:
OtherWID | The WID assigned to this pathway. |
XID | The accession number. |
Reactions are considered distinct if they have different text depictions specified in their DEFINITION item in REACTION. Each distinct reaction is then linked to the pathway by adding a row to the PathwayReaction table:
PathwayWID | The WID assigned to this pathway. |
ReactionWID | The reaction WID. |
PriorReactionWID | NULL. |
Hypothetical | 'U' (Unknown). |
The genes item is a cross-link to the KEGG gene catalogs, showing the genes in various organisms that encode this enzyme. This is used to create organism specific pathways, and to indicate the number of proteins to generate in loading: one is generated for each gene, as they may have different amino acid sequences.
For each organism with the necessary gene(s) a new pathway is created (if not already present). The BioSource WID is searched from the organisms previously loaded from the Genome data.
WID | A new WID assigned to this object. |
Name | The given descriptive name of the pathway. |
Type | 'O' (Organism). |
BioSourceWID | The BioSource.WID assigned to this organism (see GENOME Name above). |
DataSetWID | The WID assigned to the DataSet (see DataSet table above). |
The pathway map accession number for this pathway is stored in the DBID table:
OtherWID | The WID assigned to this pathway. |
XID | The accession number. |
The Enzyme is also linked to the BioSource by the BioSourceWIDProteinWID table:
BioSourceWID | The BioSource.WID assigned to this organism (see GENOME Name above). |
ProteinWID | The WID assigned to the Enzyme. |
Each reaction is then linked to the new pathway by adding a row to the PathwayReaction table, in the same way as for the reference pathway above:
PathwayWID | The WID assigned to this pathway. |
ReactionWID | The reaction WID. |
PriorReactionWID | NULL. |
Hypothetical | 'U' (Unknown). |
The disease item is a cross-link to the OMIM (Online Mendelian Inheritance in Man) database. This is used to populate the CrossReference table:
OtherWID | The WID assigned to this enzyme (see NAME above). |
XID | The MIM Number. |
DatasetWID | NULL. |
DatabaseName | "MIM" |
The motif item is a cross-link to the PROSITE database. Each PROSITE identifier is used to populate the CrossReference table:
OtherWID | The WID assigned to this enzyme (see NAME above). |
XID | The PROSITE ID. |
DatasetWID | NULL. |
DatabaseName | "PS" |
The structures item is a cross-link to PDB (the Protein Data Bank), which stores the three-dimensional structure information for proteins. Each PDB identifier is used to populate the CrossReference table:
OtherWID | The WID assigned to this enzyme (see NAME above). |
XID | The PDB ID. |
DatasetWID | NULL. |
DatabaseName | "PDB" |
The dblinks item contains cross-link information to other databases, including the ENZYME Nomenclature database from the Swiss Institute of Bioinformatics. This is used to populate the CrossReference table:
OtherWID | The WID assigned to this enzyme (see NAME above). |
XID | The external database identifier. |
DatasetWID | NULL. |
DatabaseName | The external database name. |
Ignored.
Ignored, but see the translation of ORTHOLOGY in REACTION.
Ignored.