SchemaDoc: Gene

Columns

Column	MySQL Type	Oracle Type	Nullable	Description
WID	BIGINT	NUMBER	No	Warehouse identifier of this gene.
Name	VARCHAR	VARCHAR2(255)	Yes	Name of the gene.
NucleicAcidWID	BIGINT	NUMBER	Yes	Reference to the NucleicAcid molecule (replicon) this gene resides upon.
SubsequenceWID	BIGINT	NUMBER	Yes	Reference to the Subsequence containing the nucleotide sequence of this Gene.
Type	VARCHAR	VARCHAR2(100)	Yes	Type of molecule known to be *ultimately* produced by this gene. *Enumerated Values:* unknown - the transcriptional status of the gene is unknown; pre-RNA - there is no evidence that a mature RNA is ultimately produced by the gene; mRNA - there is evidence that a mature mRNA is ultimately produced by the gene; rRNA - there is evidence that a mature rRNA is ultimately produced by the gene; tRNA - there is evidence that a mature tRNA is ultimately produced by the gene; snRNA - there is evidence that an snRNA is ultimately produced by the gene; scRNA - there is evidence that an scRNA (small cytoplasmic RNA) is ultimately produced by the gene; polypeptide - there is evidence that the ultimate product of this gene is proteinaceous; snoRNA - there is evidence that an snoRNA (small nucleolar RNA) is ultimately produced by the gene; other - catch-all for products not covered by the values above.
GenomeID	VARCHAR	VARCHAR2(35)	Yes	Unique ID assigned to this gene, for example by a genome project.
CodingRegionStart	INT	NUMBER	Yes	Base position of the start of the coding region. Start is always less than End, except for genes that wrap around the origin of a circular chromosome.
CodingRegionEnd	INT	NUMBER	Yes	Base position of the end of the coding region. This position indexes the stop codon that terminates the gene.
CodingRegionStartApproximate	VARCHAR	VARCHAR2(10)	Yes	Indicates that the Start position of the coding region is an approximate value. Controlled vocabulary: 'gt' (greater than), 'lt' (less than), or 'ne' (not equal).
CodingRegionEndApproximate	VARCHAR	VARCHAR2(10)	Yes	Indicates that the End position of the coding region is an approximate value. Controlled vocabulary: 'gt' (greater than), 'lt' (less than), or 'ne' (not equal).
Direction	VARCHAR	VARCHAR2(25)	Yes	Direction of transcription, as defined in the enumeration table. *Enumerated Values:* unknown - the strand being transcribed is unknown; forward - the plus strand is transcribed; reverse - the minus strand is transcribed; forward_and_reverse - both strands are transcribed; undefined_value - UNDEFINED (the NCBI documentation does not define this value).
Interrupted	CHAR(1)	CHAR(1)	Yes	'T' if the gene is interrupted, else 'F'.
DataSetWID	BIGINT	NUMBER	No	Reference to the data set from which this entity came.
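The Type, Direction, Interrupted and *Approximate columns store small controlled vocabularies as plain strings. The following is a minimal Python sketch of decoding one Gene row into typed values. It assumes the row arrives as a dict keyed by column name with NULL mapped to None, and that stored strings match the documented spellings exactly (case-sensitive). The names GeneType, Direction, Gene, gene_from_row, parse_flag and parse_qualifier are hypothetical and not part of any published API for this schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class GeneType(Enum):
    """Enumerated values documented for Gene.Type."""
    UNKNOWN = "unknown"
    PRE_RNA = "pre-RNA"
    MRNA = "mRNA"
    RRNA = "rRNA"
    TRNA = "tRNA"
    SNRNA = "snRNA"
    SCRNA = "scRNA"
    POLYPEPTIDE = "polypeptide"
    SNORNA = "snoRNA"
    OTHER = "other"


class Direction(Enum):
    """Enumerated values documented for Gene.Direction."""
    UNKNOWN = "unknown"
    FORWARD = "forward"
    REVERSE = "reverse"
    FORWARD_AND_REVERSE = "forward_and_reverse"
    UNDEFINED_VALUE = "undefined_value"


# Controlled vocabulary for CodingRegionStartApproximate / CodingRegionEndApproximate.
APPROXIMATE_QUALIFIERS = {"gt", "lt", "ne"}


@dataclass
class Gene:
    wid: int
    data_set_wid: int
    name: Optional[str] = None
    gene_type: Optional[GeneType] = None
    direction: Optional[Direction] = None
    coding_region_start: Optional[int] = None
    coding_region_end: Optional[int] = None
    start_approximate: Optional[str] = None
    end_approximate: Optional[str] = None
    interrupted: Optional[bool] = None


def parse_flag(value: Optional[str]) -> Optional[bool]:
    """Map the CHAR(1) 'T'/'F' convention to a bool; NULL stays None."""
    if value is None:
        return None
    if value not in ("T", "F"):
        raise ValueError(f"expected 'T' or 'F', got {value!r}")
    return value == "T"


def parse_qualifier(value: Optional[str]) -> Optional[str]:
    """Reject anything outside the documented 'gt'/'lt'/'ne' vocabulary."""
    if value is not None and value not in APPROXIMATE_QUALIFIERS:
        raise ValueError(f"unknown approximate qualifier {value!r}")
    return value


def gene_from_row(row: dict) -> Gene:
    """Build a Gene from a dict keyed by column name (NULLs as None)."""
    raw_type = row.get("Type")
    raw_direction = row.get("Direction")
    return Gene(
        wid=row["WID"],
        data_set_wid=row["DataSetWID"],
        name=row.get("Name"),
        # Enum lookup by value raises ValueError for undocumented strings.
        gene_type=GeneType(raw_type) if raw_type is not None else None,
        direction=Direction(raw_direction) if raw_direction is not None else None,
        coding_region_start=row.get("CodingRegionStart"),
        coding_region_end=row.get("CodingRegionEnd"),
        start_approximate=parse_qualifier(row.get("CodingRegionStartApproximate")),
        end_approximate=parse_qualifier(row.get("CodingRegionEndApproximate")),
        interrupted=parse_flag(row.get("Interrupted")),
    )
```

For example, gene_from_row({"WID": 42, "DataSetWID": 7, "Type": "polypeptide", "Interrupted": "F"}) yields gene_type GeneType.POLYPEPTIDE and interrupted False. Raising on unrecognized strings is a deliberate choice: it surfaces vocabulary drift at load time instead of silently carrying unknown values forward.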

Referenced By

Table	Column
BioSourceWIDGeneWID	GeneWID
GeneWIDNucleicAcidWID	GeneWID
GeneWIDProteinWID	GeneWID
Entry	OtherWID
TranscriptionUnitComponent	OtherWID
Support	OtherWID
RelatedTerm	OtherWID
CitationWIDOtherWID	OtherWID
CommentTable	OtherWID
CrossReference	OtherWID
CrossReference	CrossWID
Description	OtherWID
DBID	OtherWID
SynonymTable	OtherWID
ToolAdvice	OtherWID
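Three of the tables above reference Gene through a dedicated GeneWID column; the rest use a generic OtherWID column (and CrossReference also uses CrossWID), presumably shared by other entity types. The following is a minimal Python sketch, using the standard-library sqlite3 module, of checking which of the listed tables still point at a given gene, for example before deleting it. The function names and the reduced one-column demo tables are hypothetical, and matching on OtherWID alone assumes WIDs are unique across the warehouse rather than per table.

```python
import sqlite3
from typing import Dict, List, Tuple

# (table, column) pairs copied from the "Referenced By" list.
GENE_REFERENCES: List[Tuple[str, str]] = [
    ("BioSourceWIDGeneWID", "GeneWID"),
    ("GeneWIDNucleicAcidWID", "GeneWID"),
    ("GeneWIDProteinWID", "GeneWID"),
    ("Entry", "OtherWID"),
    ("TranscriptionUnitComponent", "OtherWID"),
    ("Support", "OtherWID"),
    ("RelatedTerm", "OtherWID"),
    ("CitationWIDOtherWID", "OtherWID"),
    ("CommentTable", "OtherWID"),
    ("CrossReference", "OtherWID"),
    ("CrossReference", "CrossWID"),
    ("Description", "OtherWID"),
    ("DBID", "OtherWID"),
    ("SynonymTable", "OtherWID"),
    ("ToolAdvice", "OtherWID"),
]


def count_gene_references(conn: sqlite3.Connection, gene_wid: int) -> Dict[Tuple[str, str], int]:
    """Return {(table, column): row count} for every reference to gene_wid.

    Table and column names come from the fixed list above, never from user
    input, so interpolating them into the SQL text is safe here; the WID
    itself is passed as a bound parameter.
    """
    counts: Dict[Tuple[str, str], int] = {}
    for table, column in GENE_REFERENCES:
        sql = f'SELECT COUNT(*) FROM "{table}" WHERE "{column}" = ?'
        (n,) = conn.execute(sql, (gene_wid,)).fetchone()
        if n:
            counts[(table, column)] = n
    return counts


def create_demo_tables(conn: sqlite3.Connection) -> None:
    """Demo only: each referencing table reduced to its referencing column(s)."""
    columns_by_table: Dict[str, List[str]] = {}
    for table, column in GENE_REFERENCES:
        columns_by_table.setdefault(table, []).append(column)
    for table, columns in columns_by_table.items():
        cols = ", ".join(f'"{c}" INTEGER' for c in columns)
        conn.execute(f'CREATE TABLE "{table}" ({cols})')


# Usage with made-up rows: gene 42 has one synonym and is the target of one cross-reference.
demo = sqlite3.connect(":memory:")
create_demo_tables(demo)
demo.execute('INSERT INTO "SynonymTable" ("OtherWID") VALUES (42)')
demo.execute('INSERT INTO "CrossReference" ("OtherWID", "CrossWID") VALUES (99, 42)')
print(count_gene_references(demo, 42))
# {('CrossReference', 'CrossWID'): 1, ('SynonymTable', 'OtherWID'): 1}
```

Counting per (table, column) pair rather than per table keeps the two CrossReference roles apart, so a gene that is only the target of a cross-reference is distinguished from one that owns it.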

Indexes

Name	Columns
GENE_DATASETWID	DATASETWID
GENE_Name	Name
GENE_Type	Type
GENE_GenomeID	GenomeID
GENE_START_END_POSITION	CODINGREGIONSTART, CODINGREGIONEND
GENE_ENDPOSITION	CODINGREGIONEND
GENE_NAWID	NucleicAcidWID
GENE_SubsequenceWID	SubsequenceWID
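The GENE_NAWID and composite GENE_START_END_POSITION indexes suit positional lookups on a replicon. The following is a minimal sketch of an overlap query that also handles genes wrapping the origin of a circular chromosome (Start greater than End, as described for CodingRegionStart), run against an in-memory SQLite database with a reduced Gene table. The 1-based inclusive coordinates, the requirement that the query window itself does not wrap, the names genes_overlapping and coding_region_length, and the demo rows are all assumptions for illustration. Whether a given optimizer actually uses these indexes for the OR branches depends on the database engine.

```python
import sqlite3
from typing import List, Tuple

# Reduced Gene table: only the columns the query touches.
# Index names and columns mirror the Indexes list.
SCHEMA = """
CREATE TABLE Gene (
    WID               INTEGER PRIMARY KEY,
    Name              TEXT,
    NucleicAcidWID    INTEGER,
    CodingRegionStart INTEGER,
    CodingRegionEnd   INTEGER,
    DataSetWID        INTEGER NOT NULL
);
CREATE INDEX GENE_START_END_POSITION ON Gene (CodingRegionStart, CodingRegionEnd);
CREATE INDEX GENE_NAWID ON Gene (NucleicAcidWID);
"""

OVERLAP_SQL = """
SELECT WID, Name FROM Gene
WHERE NucleicAcidWID = :na
  AND (
        -- ordinary gene: [start, end] overlaps [lo, hi]
        (CodingRegionStart <= CodingRegionEnd
         AND CodingRegionStart <= :hi AND CodingRegionEnd >= :lo)
        -- gene wrapping the origin: covers [start, replicon end] and [1, end]
     OR (CodingRegionStart > CodingRegionEnd
         AND (CodingRegionStart <= :hi OR CodingRegionEnd >= :lo))
      )
ORDER BY WID
"""


def genes_overlapping(conn: sqlite3.Connection, nucleic_acid_wid: int,
                      lo: int, hi: int) -> List[Tuple[int, str]]:
    """Genes on one replicon whose coding region overlaps the window [lo, hi].

    Assumes 1 <= lo <= hi <= replicon length. Rows with NULL coordinates
    compare as NULL in SQL and are therefore excluded.
    """
    cur = conn.execute(OVERLAP_SQL, {"na": nucleic_acid_wid, "lo": lo, "hi": hi})
    return cur.fetchall()


def coding_region_length(start: int, end: int, replicon_length: int) -> int:
    """Coding region length in bases, assuming 1-based inclusive coordinates."""
    if start <= end:
        return end - start + 1
    # Wraps the origin: from start to the last base, then from base 1 to end.
    return (replicon_length - start + 1) + end


# Usage with made-up rows on a hypothetical 5000-base circular replicon (WID 10).
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO Gene VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "geneA", 10, 100, 400, 1),
    (2, "geneB", 10, 4900, 150, 1),   # wraps the origin
    (3, "geneC", 10, 2000, 2600, 1),
    (4, "geneD", 11, 100, 400, 1),    # different replicon
])
print(genes_overlapping(conn, 10, 120, 300))  # [(1, 'geneA'), (2, 'geneB')]
print(coding_region_length(4900, 150, 5000))  # 251
```

Splitting the predicate on Start <= End keeps the ordinary case a plain interval test, and the replicon length is only needed for the length calculation, not for the overlap test.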
