Introduction
(C) 2006 SRI International. All Rights Reserved. See BioWarehouse Overview for license details.
The textual representation of the NCBI Taxonomy database consists of several ASCII files, which the NCBI Taxonomy loader reads. The loader processes these files in the following order:
- division.dmp
- gencode.dmp
- names.dmp
- nodes.dmp
The NCBI installation instructions contain details for running the loader, including options and a description of its output.
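
The load order above can be illustrated with a minimal sketch. It assumes the standard NCBI taxdump layout, in which fields are separated by a tab, a vertical bar, and a tab, and each row ends with a tab and a vertical bar. The function names (`parse_dmp_line`, `read_dmp`, `load_all`) are hypothetical and are not part of the BioWarehouse loader.

```python
from pathlib import Path
from typing import Iterator, List

# Load order taken from the list above.
LOAD_ORDER = ["division.dmp", "gencode.dmp", "names.dmp", "nodes.dmp"]


def parse_dmp_line(line: str) -> List[str]:
    """Split one taxdump line into stripped field values."""
    line = line.rstrip("\n")
    # Assumption: each row ends with a trailing "\t|" terminator.
    if line.endswith("\t|"):
        line = line[:-2]
    return [field.strip() for field in line.split("\t|\t")]


def read_dmp(path: Path) -> Iterator[List[str]]:
    """Yield the fields of every non-empty line in a .dmp file."""
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield parse_dmp_line(line)


def load_all(dump_dir: Path) -> None:
    """Visit the files in loader order."""
    for filename in LOAD_ORDER:
        for fields in read_dmp(dump_dir / filename):
            pass  # placeholder: the per-file mapping would go here
```

Processing names.dmp before nodes.dmp matters because the nodes.dmp mapping looks up Taxon WIDs that were created while names.dmp was parsed.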
Dataset
The loader adds a row to the Dataset table as follows:
Column | Value assigned by NCBI Taxonomy loader
---|---
WID | The next available WID in the warehouse. Uniquely specifies this dataset in the warehouse.
Name | 'NCBI Taxonomy'
LoadDate | The time/date the loader was run.
Version | This is by convention the download date of the data, as NCBI Taxonomy changes continually and is not versioned.
ReleaseDate | This is by convention the download date of the data, as NCBI Taxonomy changes continually and is not versioned.
ChangeDate | The date and time the loader completed, NULL if the loader did not complete successfully.
LoadedBy | The value of the system environment variable USER for the account running the loader.
Application | 'NCBI Taxonomy Loader'
ApplicationVersion | 4.6
HomeURL | http://ncbi.nlm.nih.gov/entrez/query.fcgi?db=Taxonomy
Division
Genetic Codes
Taxon
Entry Table
Division
The loader adds one row to the Division table for each entry in division.dmp. Each row has a unique WID for the Division and the dataset WID, which is unique to the NCBI database.
NCBI Taxonomy Database Attribute | Warehouse Semantics
---|---
Division Id | DBID.XID. DBID.OtherWID is the WID of this Division object.
Division Cde | Division.Code. The value specifies the Division code, which is a three-letter abbreviation for the Division. For example: BCT for Bacteria.
Division Name | Division.Name. The name of this Division.
Comments | CommentTable.Comm. CommentTable.OtherWID is the WID of this Division object.
GeneticCode
The loader adds one row to the GeneticCode table for each entry in gencode.dmp. Each row has a unique WID for the genetic code and the dataset WID, which is unique to the NCBI database.
NCBI Taxonomy Database Attribute | Warehouse Semantics
---|---
Genetic code id | GeneticCode.NCBIID and DBID.XID; DBID.OtherWID is the WID of this GeneticCode object.
Abbreviation | Ignored
Name | GeneticCode.Name. If there are multiple names in the name field (separated by semicolons), they are stored in the SynonymTable. SynonymTable.Syn stores the synonym and SynonymTable.OtherWID is the WID of this GeneticCode object.
Cde | GeneticCode.TranslationTable
Starts | GeneticCode.StartCodon
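
The handling of a multi-valued name field might look like the sketch below. The text does not say which name goes into GeneticCode.Name, so the sketch assumes the first name is kept there and the remaining names become SynonymTable.Syn values. The function name `split_genetic_code_names` is hypothetical.

```python
from typing import List, Tuple


def split_genetic_code_names(name_field: str) -> Tuple[str, List[str]]:
    """Split a semicolon-separated name field into (primary name, synonyms).

    Assumption: the first name is the primary one; the source text does not
    specify which name is stored in GeneticCode.Name.
    """
    names = [part.strip() for part in name_field.split(";") if part.strip()]
    if not names:
        raise ValueError("genetic code name field is empty")
    return names[0], names[1:]
```

For a field holding a single name, the synonym list is simply empty and no SynonymTable rows would be needed.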
Taxon
For each tax id in names.dmp, a row is added to the Taxon table, and one name (either name_txt or unique name) is stored in the Taxon table. All other names for a given tax id are added to the SynonymTable.
Priority rules, applied in order of decreasing priority, decide which name is stored in the Taxon table. The row in the Taxon table is created with the preferred name, and an entry is created in the Entry table for this Taxon.
The attributes in names.dmp are mapped as follows:

NCBI Taxonomy Database Attribute | Warehouse Semantics
---|---
tax_id | DBID.XID. DBID.OtherWID is the WID of this Taxon object.
name_txt | Taxon.Name or SynonymTable.Syn, depending on the priority of this name compared to the other names in the table. If the name is enclosed in single quotes, the quotes are removed. If a name other than 'environmental sample' ends with a qualifying name in angle brackets (e.g., 'Bacteria <Bacteria>'), the text in the angle brackets and any preceding spaces are removed.
unique name | Taxon.Name or SynonymTable.Syn, depending on the priority of this name compared to the other names in the table. If the name is enclosed in single quotes, the quotes are removed. If a name other than 'environmental sample' ends with a qualifying name in angle brackets (e.g., 'Bacteria <Bacteria>'), the text in the angle brackets and any preceding spaces are removed.
name class | Helps to determine the priority of this name. Depending on this priority and on the other name classes available for the same tax id, the name is placed in either Taxon.Name or SynonymTable.Syn.
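
The name clean-up described for name_txt and unique name can be sketched as follows. This is an illustration rather than the loader's actual code: the function name `clean_taxon_name` is hypothetical, and the sketch assumes the 'environmental sample' exception applies to any name that begins with that text.

```python
import re

# A qualifier in angle brackets at the end of the name, plus preceding spaces.
_TRAILING_QUALIFIER = re.compile(r"\s*<[^<>]*>$")


def clean_taxon_name(name: str) -> str:
    """Strip enclosing single quotes and a trailing <qualifier> from a name."""
    # Remove one pair of enclosing single quotes.
    if len(name) >= 2 and name.startswith("'") and name.endswith("'"):
        name = name[1:-1]
    # Assumption: names starting with 'environmental sample' keep their qualifier.
    if not name.startswith("environmental sample"):
        name = _TRAILING_QUALIFIER.sub("", name)
    return name


print(clean_taxon_name("Bacteria <Bacteria>"))  # prints "Bacteria"
```

Keeping the qualifier on environmental sample names preserves the only part of those names that distinguishes one from another.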
The attributes in nodes.dmp are mapped as follows:

NCBI Taxonomy Database Attribute | Warehouse Semantics
---|---
tax_id | Used to look up the corresponding WID for the Taxon and update the Taxon entry that was added while parsing names.dmp. If this is null, an error is raised. This is based on the assumption that names.dmp contains all tax ids, so a WID for any Taxon should already have been created.
parent tax_id | Taxon.Parent_WID. If this is null, an error is raised.
rank | Taxon.Rank. If this rank does not exist in the Enumeration table, an error is raised.
embl code | Ignored
division id | Taxon.Division_WID. The corresponding WID for the Division is found from the division id and stored. If the division id is not found, an error is raised.
inherited div flag | Taxon.Inherited_Div_Flag. 'T' or 'F' based on whether the inherited div flag is 1 or 0.
genetic code id | Taxon.Gencode_WID. The corresponding WID for the Genetic Code is found from the genetic code id and stored. If the genetic code id is not found, an error is raised.
inherited GC flag | Taxon.Inherited_GC_Flag. 'T' or 'F' based on whether the inherited GC flag is 1 or 0.
mitochondrial genetic code id | Taxon.MC_Gencode_WID. The corresponding WID for the Genetic Code is found from the mitochondrial genetic code id and stored. If the genetic code id is not found, an error is raised.
inherited MGC flag | Taxon.Inherited_MCGC_Flag. 'T' or 'F' based on whether the inherited MGC flag is 1 or 0.
GenBank hidden flag | Ignored
hidden subtree root flag | Ignored
comments | CommentTable.Comm, with CommentTable.OtherWID equal to the WID of this object.
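
The lookups and error conditions in the nodes.dmp mapping can be sketched as below. The names `LoadError`, `lookup_wid`, `flag`, and `map_node` are hypothetical, as are the in-memory dictionaries standing in for warehouse lookups. The sketch also assumes that Taxon.Parent_WID is resolved by looking up the parent tax id among the Taxon WIDs, and that any flag value other than 1 maps to 'F'.

```python
from typing import Dict, Optional, Set


class LoadError(Exception):
    """Raised when a nodes.dmp row cannot be mapped (hypothetical name)."""


def lookup_wid(wids: Dict[str, int], key: Optional[str], what: str) -> int:
    """Return the WID for key; raise LoadError if key is null or unknown."""
    if not key:
        raise LoadError(f"{what} is null")
    if key not in wids:
        raise LoadError(f"{what} {key!r} not found")
    return wids[key]


def flag(value: Optional[str]) -> str:
    """Map a 1/0 flag to 'T'/'F' (assumption: anything but '1' is 'F')."""
    return "T" if value == "1" else "F"


def map_node(
    node: Dict[str, str],
    taxon_wids: Dict[str, int],
    division_wids: Dict[str, int],
    gencode_wids: Dict[str, int],
    known_ranks: Set[str],
) -> Dict[str, object]:
    """Build the Taxon column values for one nodes.dmp row."""
    wid = lookup_wid(taxon_wids, node.get("tax_id"), "tax_id")
    rank = node.get("rank")
    if rank not in known_ranks:
        raise LoadError(f"rank {rank!r} not in Enumeration table")
    return {
        "WID": wid,
        # Assumption: the parent tax id is resolved to the parent's WID.
        "Parent_WID": lookup_wid(taxon_wids, node.get("parent tax_id"), "parent tax_id"),
        "Rank": rank,
        "Division_WID": lookup_wid(division_wids, node.get("division id"), "division id"),
        "Inherited_Div_Flag": flag(node.get("inherited div flag")),
        "Gencode_WID": lookup_wid(gencode_wids, node.get("genetic code id"), "genetic code id"),
        "Inherited_GC_Flag": flag(node.get("inherited GC flag")),
        "MC_Gencode_WID": lookup_wid(
            gencode_wids, node.get("mitochondrial genetic code id"), "mitochondrial genetic code id"
        ),
        "Inherited_MCGC_Flag": flag(node.get("inherited MGC flag")),
    }
```

The ignored attributes (embl code, GenBank hidden flag, hidden subtree root flag) are simply never read from the row.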
Entry
A row in the Entry table is created as follows:
Column | Value assigned by NCBI loader
---|---
OtherWID | The WID of the entry described by this row. The entry may be in Division, GeneticCode, or Taxon.
InsertDate | The time/date the loader was run.
LoadError | 'T' if a parse error is detected, 'F' otherwise.
LineNumber | The line number at which the error is noticed.
DatasetWID | The value Dataset.WID assigned to the dataset being loaded.
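
As an illustration of the Entry columns above, the sketch below builds one row as a plain dictionary. The function name `make_entry_row` is hypothetical, and leaving LineNumber empty when no error occurred is an assumption, since the table only describes the error case.

```python
from datetime import datetime
from typing import Dict, Optional


def make_entry_row(
    other_wid: int,
    dataset_wid: int,
    load_time: datetime,
    error_line: Optional[int] = None,
) -> Dict[str, object]:
    """Build the Entry column values for one Division, GeneticCode, or Taxon row."""
    return {
        "OtherWID": other_wid,
        "InsertDate": load_time,
        # 'T' if a parse error was detected for this entry, 'F' otherwise.
        "LoadError": "T" if error_line is not None else "F",
        # Assumption: no line number is recorded when there was no error.
        "LineNumber": error_line,
        "DatasetWID": dataset_wid,
    }
```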