
Gene


Defines a notion of gene that is limited to prokaryotic aspects of
genes. Later versions of the warehouse will expand this definition to include
eukaryotic aspects of genes. Separate tables define associations between a gene and
(1) its biological source(s),
(2) its protein product(s), if any, and
(3) its RNA product(s), if any.

Columns

Column MySQL Type Oracle Type Nullable Description
WID BIGINT NUMBER No Warehouse identifier of this gene.
Name VARCHAR VARCHAR2(255) Yes Name of the gene.
NucleicAcidWID BIGINT NUMBER Yes Reference to the NucleicAcid molecule (replicon) this gene resides upon.
SubsequenceWID BIGINT NUMBER Yes Reference to the Subsequence containing the nucleotide sequence of this Gene.
Type VARCHAR VARCHAR2(100) Yes Describes the type of molecule that is known to be *ultimately* produced by this gene,
as one of the enumerated values (polypeptide, pre-mRNA, rRNA, tRNA, etc.).

Enumerated Values:
unknown - used when transcriptional status of gene is unknown
pre-RNA - used when there is no evidence that a mature RNA is ultimately produced by the gene
mRNA - used when there is evidence that a mature mRNA is ultimately produced by the gene
rRNA - used when there is evidence that a mature rRNA is ultimately produced by the gene
tRNA - used when there is evidence that a mature tRNA is ultimately produced by the gene
snRNA - used when there is evidence that an snRNA is ultimately produced by the gene
scRNA - ????
polypeptide - used when there is evidence that the ultimate product of this gene is proteinaceous
snoRNA - ???
other - catchall ?
GenomeID VARCHAR VARCHAR2(35) Yes Unique ID assigned to this gene, such as by a genome project
CodingRegionStart INT NUMBER Yes Base position of start of coding region. Start is always less than End,
except for genes that wrap around the origin of a circular chromosome.
CodingRegionEnd INT NUMBER Yes Base position of end of coding region.
Indexes the stop codon that terminates the gene.
CodingRegionStartApproximate VARCHAR VARCHAR2(10) Yes Indicates that the Start position of the coding region is an approximate value.
Allowed values are 'gt' (greater than), 'lt' (less than), and 'ne' (not equal).
This is a controlled vocabulary.
CodingRegionEndApproximate VARCHAR VARCHAR2(10) Yes Indicates that the End position of the coding region is an approximate value.
Allowed values are 'gt' (greater than), 'lt' (less than), and 'ne' (not equal).
This is a controlled vocabulary.
Direction VARCHAR VARCHAR2(25) Yes Direction of transcription as defined in the enumeration table

Enumerated Values:
unknown - the strand being transcribed is unknown
forward - the plus strand is being transcribed
reverse - the minus strand is being transcribed
forward_and_reverse - both strands are being transcribed
undefined_value - UNDEFINED; the NCBI documentation does not define this value
Interrupted CHAR(1) CHAR(1) Yes 'T' if the gene is interrupted, else 'F'.
DataSetWID BIGINT NUMBER No Reference to the data set from which the entity came.
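
The column rules above (controlled vocabularies, the non-nullable columns, and the wrap-around case for CodingRegionStart and CodingRegionEnd) can be illustrated with a small Python sketch. It is not part of the warehouse software: the function names are hypothetical, and the length calculation assumes 1-based, inclusive base positions, which this page does not state.

```python
from typing import Optional

# Controlled vocabularies copied from the column descriptions above.
GENE_TYPES = {"unknown", "pre-RNA", "mRNA", "rRNA", "tRNA", "snRNA",
              "scRNA", "polypeptide", "snoRNA", "other"}
DIRECTIONS = {"unknown", "forward", "reverse", "forward_and_reverse",
              "undefined_value"}
APPROXIMATE = {"gt", "lt", "ne"}


def wraps_origin(start: int, end: int) -> bool:
    """Start exceeds End only for genes that wrap around the origin."""
    return start > end


def coding_region_length(start: int, end: int, replicon_length: int) -> int:
    """Length in bases, ASSUMING 1-based inclusive coordinates."""
    if wraps_origin(start, end):
        # Bases from start to the end of the replicon, plus bases 1..end.
        return (replicon_length - start + 1) + end
    return end - start + 1


def check_gene_row(row: dict) -> list:
    """Return a list of problems found in one Gene row (hypothetical helper)."""
    problems = []

    def check(column: str, allowed: set) -> None:
        value: Optional[str] = row.get(column)
        # Nullable columns may be absent; only non-NULL values are checked.
        if value is not None and value not in allowed:
            problems.append(f"{column}: unexpected value {value!r}")

    check("Type", GENE_TYPES)
    check("Direction", DIRECTIONS)
    check("CodingRegionStartApproximate", APPROXIMATE)
    check("CodingRegionEndApproximate", APPROXIMATE)
    check("Interrupted", {"T", "F"})

    # WID and DataSetWID are the only columns marked as not nullable.
    for column in ("WID", "DataSetWID"):
        if row.get(column) is None:
            problems.append(f"{column}: must not be NULL")
    return problems
```

For example, under the stated coordinate assumption a gene with Start 990 and End 10 on a 1000-base circular replicon would span 21 bases.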

Referenced By

Table Column
BioSourceWIDGeneWID GeneWID
GeneWIDNucleicAcidWID GeneWID
GeneWIDProteinWID GeneWID
Entry OtherWID
TranscriptionUnitComponent OtherWID
Support OtherWID
RelatedTerm OtherWID
CitationWIDOtherWID OtherWID
CommentTable OtherWID
CrossReference OtherWID
CrossReference CrossWID
Description OtherWID
DBID OtherWID
SynonymTable OtherWID
ToolAdvice OtherWID
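
As an illustration of how the linking tables listed above might be used, the following Python sketch looks up the protein products of a gene through GeneWIDProteinWID. It uses an in-memory SQLite database as a stand-in for MySQL or Oracle. The ProteinWID column name is an assumption inferred from the table name (this page lists only the GeneWID column), and the sample rows and the function name are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Reduced Gene table: only the columns this example needs.
    CREATE TABLE Gene (
        WID        INTEGER NOT NULL,
        Name       TEXT,
        DataSetWID INTEGER NOT NULL
    );
    -- Linking table; ProteinWID is an ASSUMED column name.
    CREATE TABLE GeneWIDProteinWID (
        GeneWID    INTEGER NOT NULL,
        ProteinWID INTEGER NOT NULL
    );
""")

# Invented sample data.
conn.execute("INSERT INTO Gene (WID, Name, DataSetWID) VALUES (1, 'exampleGene', 10)")
conn.executemany("INSERT INTO GeneWIDProteinWID VALUES (?, ?)",
                 [(1, 501), (1, 502)])


def protein_wids_for_gene(connection, gene_wid):
    """Return the ProteinWID values linked to one gene, in ascending order."""
    rows = connection.execute(
        "SELECT ProteinWID FROM GeneWIDProteinWID "
        "WHERE GeneWID = ? ORDER BY ProteinWID",
        (gene_wid,),
    ).fetchall()
    return [row[0] for row in rows]
```

The same pattern would apply to the tables that reference a gene through OtherWID, with the column name changed accordingly.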

Other Constraints

None.

Indexes

Name Columns
GENE_DATASETWID DATASETWID
GENE_Name Name
GENE_Type Type
GENE_GenomeID GenomeID
GENE_START_END_POSITION CODINGREGIONSTART, CODINGREGIONEND
GENE_ENDPOSITION CODINGREGIONEND
GENE_NAWID NucleicAcidWID
GENE_SubsequenceWID SubsequenceWID
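
A typical query these indexes could serve is finding the genes on one replicon that overlap a base range. The Python sketch below runs against an in-memory SQLite database as a stand-in for MySQL or Oracle. The sample rows and function names are invented, only a subset of the Gene columns is created, and genes that wrap around the origin (Start greater than End) are deliberately ignored to keep the query simple.

```python
import sqlite3


def build_demo_db():
    """Create a reduced Gene table with three of the indexes listed above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Gene (
            WID               INTEGER NOT NULL,
            Name              TEXT,
            NucleicAcidWID    INTEGER,
            CodingRegionStart INTEGER,
            CodingRegionEnd   INTEGER,
            DataSetWID        INTEGER NOT NULL
        );
        CREATE INDEX GENE_NAWID ON Gene (NucleicAcidWID);
        CREATE INDEX GENE_START_END_POSITION
            ON Gene (CodingRegionStart, CodingRegionEnd);
        CREATE INDEX GENE_ENDPOSITION ON Gene (CodingRegionEnd);
    """)
    # Invented sample rows: two genes on replicon 100, one on replicon 200.
    conn.executemany("INSERT INTO Gene VALUES (?, ?, ?, ?, ?, ?)", [
        (1, "geneA", 100, 50, 300, 10),
        (2, "geneB", 100, 400, 900, 10),
        (3, "geneC", 200, 60, 120, 10),
    ])
    return conn


def genes_overlapping(conn, nucleic_acid_wid, lo, hi):
    """Names of genes on one replicon whose coding region overlaps [lo, hi].

    Simplification: wrap-around genes (Start > End) are not handled.
    """
    rows = conn.execute(
        "SELECT Name FROM Gene "
        "WHERE NucleicAcidWID = ? "
        "AND CodingRegionStart <= ? AND CodingRegionEnd >= ? "
        "ORDER BY CodingRegionStart",
        (nucleic_acid_wid, hi, lo),
    ).fetchall()
    return [row[0] for row in rows]
```

Whether a given database engine actually chooses these indexes for such a query depends on its optimizer and on the data distribution, so the query plan should be checked on the target system.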