Gene Ontology Loader Manual

This manual describes the implementation details of the Gene Ontology (GO) loader.  It first describes how GO terms are translated into the BioWarehouse schema, and then lists the GO OBO DTD that defines the input format.

GO to BioWarehouse Schema Mappings

For each <term>, an entry is created in the BioWarehouse Term table.
  1. For each Term, create one entry in DBID
    • DBID.OtherWID gets the value of Term.WID
    • DBID.XID gets the value of term/id
    • DBID.Type is "Accession"
  2. Term.Name gets the value of term/name
  3. If term/def/defstr is present, it is stored in Term.Definition
  4. For each term/synonym, an entry is created in SynonymTable
    • SynonymTable.OtherWID gets the value of Term.WID
    • SynonymTable.Syn gets the value of term/synonym/synonym_text
  5. For each term/alt_id, an entry is created in SynonymTable
    • SynonymTable.OtherWID gets the value of Term.WID
    • SynonymTable.Syn gets the value of term/alt_id
  6. If a term/comment is present, it is stored in Description
    • Description.OtherWID gets the value of Term.WID
    • Description.TableName gets the value "Term"
    • Description.Comm gets the value of term/comment
  7. For each term/is_a, an entry is created in TermRelationship
    • TermRelationship.TermWID gets the value of Term.WID
    • TermRelationship.RelatedTermWID gets the value of the WID of the <term> identified by the GO identifier in the <is_a> element
    • TermRelationship.Relationship gets the value "is_a"
  8. For each term/relationship whose relationship/type equals "part_of", an entry is created in TermRelationship
    • TermRelationship.TermWID gets the value of Term.WID
    • TermRelationship.RelatedTermWID gets the value of the WID of the <term> identified by the GO identifier in the relationship/to element
    • TermRelationship.Relationship gets the value "part_of"
  9. For each term/xref_analog, an entry is created in CrossReference.
    • CrossReference.OtherWID gets the value of Term.WID
    • CrossReference.XID gets the value of xref_analog/acc
    • CrossReference.DatabaseName gets the value of xref_analog/dbname.  The meaning of the dbname symbols can be found on the Gene Ontology site at http://www.geneontology.org/doc/GO.xrf_abbs.
  10. For each term/def/dbxref, an entry is created in CrossReference.
    • CrossReference.OtherWID gets the value of Term.WID
    • CrossReference.XID gets the value of dbxref/acc
    • CrossReference.DatabaseName gets the value of dbxref/dbname
    • Note: At this time we only handle the following dbnames:  BRENDA, EMBL, INTERPRO, MeSH, MetaCyc, MetaCyc-Enzyme, OMIM, PDB, PFAM, RESID, SP, TIGR_TIGRFAMS, UM-BBD_enzymeID, UM-BBD_pathwayID, and UniProt.  Documentation about the meaning of these symbols can be found on the Gene Ontology site at http://www.geneontology.org/doc/GO.xrf_abbs.
    • Exceptions: for the dbnames ISSN and ISBN, the contents of dbxref/acc are stored in Citation.Citation instead of CrossReference.  For the dbname PMID, the contents of dbxref/acc are stored in Citation.PMID.
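
The mappings in steps 1 through 9 can be sketched in Python as follows. This is only an illustration, not the loader's actual code: the function names assign_wids and map_term, the dict-based rows, the sequential WIDs, and the sample term names are all assumptions made for the example. It also assumes that Description.Comm receives the text of term/comment. WIDs are assigned in a first pass so that is_a and part_of targets can be resolved in the second.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def assign_wids(terms, first_wid=1):
    """First pass: give every <term> a WID (hypothetical sequential scheme)."""
    return {t.findtext("id"): first_wid + i for i, t in enumerate(terms)}


def map_term(term, wid_by_go_id):
    """Second pass: translate one <term> element into rows keyed by table name."""
    go_id = term.findtext("id")
    wid = wid_by_go_id[go_id]
    rows = defaultdict(list)

    # Steps 2-3: Term.Name, and Term.Definition when def/defstr is present.
    term_row = {"WID": wid, "Name": term.findtext("name")}
    defstr = term.findtext("def/defstr")
    if defstr is not None:
        term_row["Definition"] = defstr
    rows["Term"].append(term_row)

    # Step 1: one DBID entry per Term.
    rows["DBID"].append({"OtherWID": wid, "XID": go_id, "Type": "Accession"})

    # Steps 4-5: synonyms and alternate ids both land in SynonymTable.
    for syn in term.findall("synonym"):
        rows["SynonymTable"].append(
            {"OtherWID": wid, "Syn": syn.findtext("synonym_text")})
    for alt in term.findall("alt_id"):
        rows["SynonymTable"].append({"OtherWID": wid, "Syn": alt.text})

    # Step 6: comment goes to Description (assumed source: term/comment).
    comment = term.findtext("comment")
    if comment is not None:
        rows["Description"].append(
            {"OtherWID": wid, "TableName": "Term", "Comm": comment})

    # Steps 7-8: is_a links, and relationships of type part_of only.
    for is_a in term.findall("is_a"):
        rows["TermRelationship"].append(
            {"TermWID": wid, "RelatedTermWID": wid_by_go_id[is_a.text],
             "Relationship": "is_a"})
    for rel in term.findall("relationship"):
        if rel.findtext("type") == "part_of":
            rows["TermRelationship"].append(
                {"TermWID": wid,
                 "RelatedTermWID": wid_by_go_id[rel.findtext("to")],
                 "Relationship": "part_of"})

    # Step 9: xref_analog entries become CrossReference rows.
    for xref in term.findall("xref_analog"):
        rows["CrossReference"].append(
            {"OtherWID": wid, "XID": xref.findtext("acc"),
             "DatabaseName": xref.findtext("dbname")})
    return rows


# Made-up sample input; names, definition, and comment are placeholders.
SAMPLE = """<obo>
  <term><id>GO:0000001</id><name>parent term</name></term>
  <term>
    <id>GO:0000002</id><name>child term</name>
    <def><defstr>An illustrative definition.</defstr></def>
    <alt_id>GO:0000009</alt_id>
    <comment>An illustrative comment.</comment>
    <is_a>GO:0000001</is_a>
    <relationship><type>part_of</type><to>GO:0000001</to></relationship>
    <synonym scope="exact"><synonym_text>kid term</synonym_text></synonym>
  </term>
</obo>"""

terms = ET.fromstring(SAMPLE).findall("term")
wids = assign_wids(terms)
child_rows = map_term(terms[1], wids)
```

Here child_rows["TermRelationship"] holds one is_a row and one part_of row, both pointing at the WID of the parent term.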

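The handling of term/def/dbxref in step 10 amounts to routing each dbxref by its dbname. The sketch below illustrates that routing under stated assumptions: the function name route_dbxref and the (table, row) return shape are invented for the example, the accession values are placeholders, and the sketch assumes that a dbname outside the handled list is simply skipped, which the text above does not state explicitly. How a Citation row is linked back to its Term is also not covered here.

```python
import xml.etree.ElementTree as ET

# dbnames that step 10 lists as handled by the loader.
HANDLED_DBNAMES = {
    "BRENDA", "EMBL", "INTERPRO", "MeSH", "MetaCyc", "MetaCyc-Enzyme",
    "OMIM", "PDB", "PFAM", "RESID", "SP", "TIGR_TIGRFAMS",
    "UM-BBD_enzymeID", "UM-BBD_pathwayID", "UniProt",
}


def route_dbxref(dbxref, term_wid):
    """Return (table, row) for one term/def/dbxref, or None if not handled."""
    dbname = dbxref.findtext("dbname")
    acc = dbxref.findtext("acc")
    if dbname in ("ISSN", "ISBN"):
        # Exception: stored as a citation string, not a cross-reference.
        return "Citation", {"Citation": acc}
    if dbname == "PMID":
        # Exception: stored in the dedicated PMID column.
        return "Citation", {"PMID": acc}
    if dbname in HANDLED_DBNAMES:
        return "CrossReference", {
            "OtherWID": term_wid, "XID": acc, "DatabaseName": dbname}
    # Assumption: any other dbname is skipped.
    return None


# Made-up sample; every accession below is a placeholder.
SAMPLE_DEF = """<def>
  <defstr>An illustrative definition.</defstr>
  <dbxref><acc>0000000</acc><dbname>PMID</dbname></dbxref>
  <dbxref><acc>PLACEHOLDER-1</acc><dbname>MetaCyc</dbname></dbxref>
  <dbxref><acc>0-00-000000-0</acc><dbname>ISBN</dbname></dbxref>
  <dbxref><acc>anything</acc><dbname>SomeOtherDB</dbname></dbxref>
</def>"""

routed = [route_dbxref(x, term_wid=2)
          for x in ET.fromstring(SAMPLE_DEF).findall("dbxref")]
```

Checking the two Citation exceptions before the handled list keeps ISSN, ISBN, and PMID out of CrossReference even if they were later added to HANDLED_DBNAMES.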

go-obo.dtd

<!--

OBO XML

This is an XML representation of the Obo-text file format, for
modeling any GO or OBO ontology

For an explanation of the meaning of any of the XML elements, see
the obo file format documentation

-->

<!-- top-level element corresponding to the whole .obo file -->
<!ELEMENT obo (source|header|term+|typedef)+>

<!-- *** FILE METADATA *** -->

<!-- metadata concerning the source input file -->
<!ELEMENT source (source_type|source_path|source_md5|source_mtime)+>
<!ELEMENT source_type (#PCDATA)>
<!ELEMENT source_path (#PCDATA)>
<!ELEMENT source_md5 (#PCDATA)>
<!ELEMENT source_mtime (#PCDATA)>
<!ELEMENT header (format-version|date|saved-by|auto-generated-by|default-namespace|remark|subsetdef+)*>
<!ELEMENT format-version (#PCDATA)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT saved-by (#PCDATA)>
<!ELEMENT auto-generated-by (#PCDATA)>
<!ELEMENT default-namespace (#PCDATA)>
<!ELEMENT remark (#PCDATA)>
<!ELEMENT subsetdef (id|name|dbxref*)+>

<!-- *** END OF FILE METADATA *** -->

<!-- TERM -->

<!ELEMENT term (id|name|namespace|def?|is_a*|alt_id*|subset*|comment?|is_obsolete?|is_root?|xref_analog*|xref_unknown*|synonym*|relationship*|intersection_of*|union_of*)+>

<!-- TERM ELEMENTS -->

<!ELEMENT id (#PCDATA)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT namespace (#PCDATA)>
<!ELEMENT def (defstr|dbxref*)+>
<!ELEMENT defstr (#PCDATA)>

<!-- DBXREF ELEMENTS -->
<!ELEMENT xref_analog (acc|dbname|name)+>
<!ELEMENT xref_unknown (acc|dbname|name)+>
<!ELEMENT is_a (#PCDATA)>
<!ELEMENT relationship (type|to)+>
<!ELEMENT type (#PCDATA)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT alt_id (#PCDATA)>
<!ELEMENT subset (#PCDATA)>
<!ELEMENT comment (#PCDATA)>

<!-- BOOLEANS -->
<!ELEMENT is_obsolete (#PCDATA)>
<!ELEMENT is_root (#PCDATA)>

<!ELEMENT synonym (synonym_text|type|dbxref*)+>
<!ATTLIST synonym
  scope CDATA #IMPLIED
>
<!ELEMENT synonym_text (#PCDATA)>

<!ELEMENT intersection_of (type|to)+>
<!ELEMENT union_of (#PCDATA)>

<!-- TYPEDEF -->

<!ELEMENT typedef (id|name|is_a*|def?|domain?|range?|inverse_of?|is_transitive?|is_symmetric?|is_anti_symmetric?|is_reflexive?)+>

<!ELEMENT inverse_of (#PCDATA)>
<!ELEMENT domain (#PCDATA)>
<!ELEMENT range (#PCDATA)>

<!ELEMENT is_transitive (#PCDATA)>
<!ELEMENT is_symmetric (#PCDATA)>
<!ELEMENT is_anti_symmetric (#PCDATA)>
<!ELEMENT is_reflexive (#PCDATA)>

<!-- DBXREF -->

<!ELEMENT dbxref (acc|dbname|name)+>
<!ELEMENT acc (#PCDATA)>
<!ELEMENT dbname (#PCDATA)>
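
As a lightweight sanity check against the term declaration in the DTD, the child elements of a <term> can be compared with the names that declaration allows. The sketch below is not DTD validation (Python's xml.etree.ElementTree does not validate against a DTD) and it ignores cardinality. The function name unexpected_children and the sample element are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Child element names permitted by the <!ELEMENT term ...> declaration.
TERM_CHILDREN = {
    "id", "name", "namespace", "def", "is_a", "alt_id", "subset", "comment",
    "is_obsolete", "is_root", "xref_analog", "xref_unknown", "synonym",
    "relationship", "intersection_of", "union_of",
}


def unexpected_children(term):
    """Return the sorted child tag names of <term> that the DTD does not list."""
    return sorted({child.tag for child in term} - TERM_CHILDREN)


# Made-up sample: <colour> is deliberately not part of the DTD.
term = ET.fromstring(
    "<term><id>GO:0000002</id><name>child term</name>"
    "<colour>blue</colour></term>")
problems = unexpected_children(term)
```

An empty result means every child name is one the DTD declares for term. Full validation would need a validating parser.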