Gene Ontology Loader Manual
This documentation describes the implementation details of the Gene
Ontology (GO) loader. The translation semantics between GO Terms and the
BioWarehouse schema are described below. The GO OBO DTD is shown below
the translation semantics.
GO to BioWarehouse Schema Mappings
For each <term>, an entry is created in the BioWarehouse Term table.
- For each Term, create one entry in DBID
  - DBID.OtherWID gets the value of Term.WID
  - DBID.XID gets the value of term/id
  - DBID.Type is "Accession"
- Term.Name gets the value of term/name
- If term/def/defstr is present, it is stored in Term.Definition
- For each term/synonym, an entry is created in SynonymTable
  - SynonymTable.OtherWID gets the value of Term.WID
  - SynonymTable.Syn gets the value of term/synonym/synonym_text
- For each term/alt_id, an entry is created in SynonymTable
  - SynonymTable.OtherWID gets the value of Term.WID
  - SynonymTable.Syn gets the value of term/alt_id
- If a term/comment is present, it is stored in Description
  - Description.OtherWID gets the value of Term.WID
  - Description.TableName gets the value "Term"
  - Description.Comm gets the value of term/comment
- For each term/is_a, an entry is created in TermRelationship
  - TermRelationship.TermWID gets the value of Term.WID
  - TermRelationship.RelatedTermWID gets the value of the WID of the
    <term> identified by the GO identifier in the <is_a> element
  - TermRelationship.Relationship gets the value "is_a"
- For each term/relationship, if relationship/type equals "part_of", an
  entry is created in TermRelationship
  - TermRelationship.TermWID gets the value of Term.WID
  - TermRelationship.RelatedTermWID gets the value of the WID of the
    <term> identified by the GO identifier in the relationship/to element
  - TermRelationship.Relationship gets the value "part_of"
- For each term/xref_analog, an entry is created in CrossReference
  - CrossReference.OtherWID gets the value of Term.WID
  - CrossReference.XID gets the value of xref_analog/acc
  - CrossReference.DatabaseName gets the value of xref_analog/dbname.
    The meaning of the dbname symbols can be found on the Gene Ontology
    site at http://www.geneontology.org/doc/GO.xrf_abbs.
- For each term/def/dbxref, an entry is created in CrossReference
  - CrossReference.OtherWID gets the value of Term.WID
  - CrossReference.XID gets the value of dbxref/acc
  - CrossReference.DatabaseName gets the value of dbxref/dbname
  - Note: At this time only the following dbnames are handled: BRENDA,
    EMBL, INTERPRO, MeSH, MetaCyc, MetaCyc-Enzyme, OMIM, PDB, PFAM,
    RESID, SP, TIGR_TIGRFAMS, UM-BBD_enzymeID, UM-BBD_pathwayID, and
    UniProt. Documentation about the meaning of these symbols can be
    found on the Gene Ontology site at
    http://www.geneontology.org/doc/GO.xrf_abbs.
  - Exception: for the dbnames ISSN and ISBN, the contents of dbxref/acc
    are stored in Citation.Citation instead of CrossReference. For the
    dbname PMID, the contents of dbxref/acc are stored in Citation.PMID.
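
The mappings above can be illustrated with a short Python sketch. This is
not the loader's actual code: the function name map_term, the
wid_by_go_id lookup (GO identifier to WID), and the dictionary-per-row
output are all hypothetical, chosen only to make the rules concrete. The
sketch assumes the element paths named above, assumes that the comment
text is what ends up in Description.Comm, and assumes that dbxref entries
with an unlisted dbname are simply skipped. The sample <term> is made-up
data.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# dbnames listed in the manual as handled for term/def/dbxref.
HANDLED_DBNAMES = {
    "BRENDA", "EMBL", "INTERPRO", "MeSH", "MetaCyc", "MetaCyc-Enzyme",
    "OMIM", "PDB", "PFAM", "RESID", "SP", "TIGR_TIGRFAMS",
    "UM-BBD_enzymeID", "UM-BBD_pathwayID", "UniProt",
}


def map_term(term, wid_by_go_id):
    """Return {table_name: [row_dict, ...]} for one <term> element.

    Hypothetical helper: wid_by_go_id maps GO identifiers to WIDs that
    are assumed to have been assigned beforehand.
    """
    rows = defaultdict(list)
    go_id = term.findtext("id")
    wid = wid_by_go_id[go_id]

    # Term row, plus its accession in DBID.
    term_row = {"WID": wid, "Name": term.findtext("name")}
    defstr = term.findtext("def/defstr")
    if defstr is not None:
        term_row["Definition"] = defstr
    rows["Term"].append(term_row)
    rows["DBID"].append({"OtherWID": wid, "XID": go_id, "Type": "Accession"})

    # Synonyms and alternate IDs both go to SynonymTable.
    synonyms = [s.findtext("synonym_text") for s in term.findall("synonym")]
    alt_ids = [a.text for a in term.findall("alt_id")]
    for text in synonyms + alt_ids:
        rows["SynonymTable"].append({"OtherWID": wid, "Syn": text})

    # Optional comment (assumed to be the source of Description.Comm).
    comment = term.findtext("comment")
    if comment is not None:
        rows["Description"].append(
            {"OtherWID": wid, "TableName": "Term", "Comm": comment})

    # is_a and part_of relationships.
    for is_a in term.findall("is_a"):
        rows["TermRelationship"].append(
            {"TermWID": wid, "RelatedTermWID": wid_by_go_id[is_a.text],
             "Relationship": "is_a"})
    for rel in term.findall("relationship"):
        if rel.findtext("type") == "part_of":
            rows["TermRelationship"].append(
                {"TermWID": wid,
                 "RelatedTermWID": wid_by_go_id[rel.findtext("to")],
                 "Relationship": "part_of"})

    # xref_analog entries always become cross-references.
    for xref in term.findall("xref_analog"):
        rows["CrossReference"].append(
            {"OtherWID": wid, "XID": xref.findtext("acc"),
             "DatabaseName": xref.findtext("dbname")})

    # Definition dbxrefs: citations for ISSN/ISBN/PMID, otherwise
    # cross-references for the handled dbnames.
    for xref in term.findall("def/dbxref"):
        acc, dbname = xref.findtext("acc"), xref.findtext("dbname")
        if dbname in ("ISSN", "ISBN"):
            rows["Citation"].append({"Citation": acc})
        elif dbname == "PMID":
            rows["Citation"].append({"PMID": acc})
        elif dbname in HANDLED_DBNAMES:
            rows["CrossReference"].append(
                {"OtherWID": wid, "XID": acc, "DatabaseName": dbname})
        # Assumption: any other dbname is skipped.
    return rows


# Made-up sample term and WIDs, for illustration only.
SAMPLE = """<term>
  <id>GO:0000002</id>
  <name>example child term</name>
  <def><defstr>An illustrative definition.</defstr>
    <dbxref><acc>12345</acc><dbname>PMID</dbname></dbxref>
    <dbxref><acc>1.1.1.1</acc><dbname>BRENDA</dbname></dbxref>
  </def>
  <synonym><synonym_text>example synonym</synonym_text></synonym>
  <alt_id>GO:0000099</alt_id>
  <is_a>GO:0000001</is_a>
  <relationship><type>part_of</type><to>GO:0000003</to></relationship>
</term>"""

wids = {"GO:0000001": 1, "GO:0000002": 2, "GO:0000003": 3}
rows = map_term(ET.fromstring(SAMPLE), wids)
```

Returning plain dictionaries keyed by table name keeps the sketch
independent of any database driver; a real loader would instead write
these rows to the warehouse and would need to resolve related-term WIDs
for terms that appear later in the input file.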