SchemaDoc: Feature

Columns

Column	MySQL Type	Oracle Type	Nullable	Description
WID	BIGINT	NUMBER	No	Warehouse identifier of this feature.
Description	VARCHAR	VARCHAR2(1300)	Yes	Textual description of this feature.
Type	VARCHAR	VARCHAR2(50)	Yes	Type of feature. These type values come from the source dataset and are not necessarily enumerated. Example: "Promoter CYZ".
Class	VARCHAR	VARCHAR2(50)	Yes	Class of feature. Assigns our typology for types of features, or qualifiers associated with features. These are our own enumerated type values, which let us classify features without losing the original (author-provided) type values stored in Type. Example: Class=promoter. *Enumerated Values:* binding site - Identifies the presence of a DNA binding site; promoter - Identifies the presence of a promoter; terminator - Identifies the presence of a terminator; pseudogene - Identifies a pseudogene, whether non-transcribed or processed; ORF - Identifies a truly unknown open reading frame according to warehouse definition (no strong evidence that a product is produced); partial - Qualifier: states that the feature is not complete; unknown product - Identifies that an unspecified product is produced from this genomic location, as stated in the dataset; notable - Qualifier: characterizes the feature value as notable (same as 'Exceptional' in GB dataset); by similarity - Qualifier: states that the feature was derived by similarity analysis; potential - Qualifier: states that the feature may be incorrect; probably - Qualifier: states that the feature is probably correct.
SequenceType	CHAR(1)	CHAR(1)	No	Enumeration (see also SequenceWID) that indicates whether the sequence is a protein or a nucleic acid and how the sequence (if available) is represented. If 'P', the feature resides on a protein. If 'S' or 'N', the feature resides on a nucleic acid. *Enumerated Values:* P - Feature resides on a protein. Implies SequenceWID (if nonNULL) references a Protein; S - Feature resides on a nucleic acid. Implies SequenceWID is nonNULL and references a Subsequence; N - Feature resides on a nucleic acid. Implies SequenceWID (if nonNULL) references a NucleicAcid.
SequenceWID	BIGINT	NUMBER	Yes	References the Protein, Subsequence, or NucleicAcid containing the sequence on which the feature is defined. SequenceType 'S' implies SequenceWID is nonNULL and references a Subsequence; the sequence is Subsequence.Sequence (i.e., it is stored explicitly). SequenceType 'N' implies SequenceWID (if nonNULL) references a NucleicAcid; the sequence is the substring Subsequence.Sequence[StartPosition : EndPosition], where Subsequence is the full Subsequence of the nucleic acid. SequenceType 'P' implies SequenceWID (if nonNULL) references a Protein; the sequence is the substring Protein.AASequence[StartPosition : EndPosition].
Variant	LONGTEXT	CLOB	Yes	Amino-acid sequence for this protein, if available
RegionOrPoint	VARCHAR	VARCHAR2(10)	Yes	Specifies whether this feature is given by starting and ending coordinates or by a single coordinate. *Enumerated Values:* region - Feature is specified by a start point and an end point on the sequence; point - Feature is specified by a single point on the sequence.
PointType	VARCHAR	VARCHAR2(10)	Yes	Only defined if RegionOrPoint='point'. Specifies where the feature lies relative to its location as encoded in StartPosition and EndPosition. *Enumerated Values:* center - Feature is centered at location; left - Feature extends to the left (decreasing position) of location; right - Feature extends to the right (increasing position) of location.
StartPosition	INT	NUMBER	Yes	Start position of the feature within the NucleicAcid or Protein sequence. If Feature.RegionOrPoint is 'point', StartPosition and EndPosition will either be equal (location is exactly at a nucleotide or amino acid) or will differ by 1 (location is centered between two adjacent nucleotides or amino acids).
EndPosition	INT	NUMBER	Yes	End position of the feature within the NucleicAcid or Protein sequence. If Feature.RegionOrPoint is 'point', StartPosition and EndPosition will either be equal (location is exactly at a nucleotide or amino acid) or will differ by 1 (location is between two adjacent nucleotides or amino acids).
StartPositionApproximate	VARCHAR	VARCHAR2(10)	Yes	Indicates that the start position of the coding region is an approximate value. It can be 'gt' for greater than, 'lt' for less than, or 'ne' for not equal. This is a controlled vocabulary. *Enumerated Values:* gt - The start position of the feature is greater than the actual position specified; lt - The start position of the feature is less than the actual position specified; ne - The start position of the feature is less than or greater than the actual position. All we know is that it's not the exact position.
EndPositionApproximate	VARCHAR	VARCHAR2(10)	Yes	Indicates that the end position of the coding region is an approximate value. It can be 'gt' for greater than, 'lt' for less than, or 'ne' for not equal. This is a controlled vocabulary. *Enumerated Values:* gt - The end position of the feature is greater than the actual position specified; lt - The end position of the feature is less than the actual position specified; ne - The end position of the feature is less than or greater than the actual position. All we know is that it's not the exact position.
ExperimentalSupport	CHAR(1)	CHAR(1)	Yes	'T' if the feature is supported by experimental evidence, else 'F'
ComputationalSupport	CHAR(1)	CHAR(1)	Yes	'T' if the feature is supported by computational evidence, else 'F'
DataSetWID	BIGINT	NUMBER	No	Reference to the data set from which the entity came.
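
The interplay of RegionOrPoint, PointType, StartPosition/EndPosition, and the two *Approximate columns can be illustrated with a small Python sketch. It is only an illustration: the function names, the display symbols for 'gt', 'lt', and 'ne', and the output strings are hypothetical and are not part of the warehouse. A row is assumed to be a plain dict keyed by column name.

```python
# Hypothetical display symbols for the controlled vocabulary (an assumption).
APPROX_SYMBOL = {"gt": ">", "lt": "<", "ne": "~"}


def format_position(position, approximate):
    """Render a position, prefixed by a symbol when it is approximate."""
    if position is None:
        return "?"
    if approximate is None:
        return str(position)
    return APPROX_SYMBOL[approximate] + str(position)


def point_location(start, end):
    """Return the coordinate of a feature with RegionOrPoint='point'.

    Equal positions mean the point sits exactly on a residue; positions
    differing by 1 mean it lies between two adjacent residues.
    """
    if start == end:
        return float(start)
    if abs(end - start) == 1:
        return (start + end) / 2
    raise ValueError("point features must have positions equal or differing by 1")


def describe(feature):
    """Build a short human-readable location string for a Feature row."""
    start = feature.get("StartPosition")
    end = feature.get("EndPosition")
    if feature.get("RegionOrPoint") == "point":
        location = point_location(start, end)
        point_type = feature.get("PointType")
        suffix = f" ({point_type})" if point_type else ""
        return f"point at {location:g}{suffix}"
    left = format_position(start, feature.get("StartPositionApproximate"))
    right = format_position(end, feature.get("EndPositionApproximate"))
    return f"region {left}..{right}"
```

For example, a point feature with StartPosition 10 and EndPosition 11 is reported at 10.5, midway between the two residues.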


Column	References
SequenceWID	Subsequence : WID
SequenceWID	Protein : WID
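
Because SequenceWID can point into more than one table, a consumer has to consult SequenceType before following it. The Python sketch below shows one way to do that, following the rules given in the SequenceWID description. It makes several assumptions that the schema page does not state: rows are plain dicts, the three lookup mappings (hypothetical names) have already been loaded from Subsequence, Protein, and the full Subsequence of each NucleicAcid, and positions are 1-based and inclusive.

```python
def feature_sequence(feature, subsequence_seq, protein_seq, nucleic_acid_seq):
    """Return the residues covered by a Feature row, or None if unavailable.

    subsequence_seq:  Subsequence WID -> Subsequence.Sequence
    protein_seq:      Protein WID     -> Protein.AASequence
    nucleic_acid_seq: NucleicAcid WID -> Sequence of its full Subsequence
    """
    seq_type = feature["SequenceType"]
    wid = feature.get("SequenceWID")

    if seq_type == "S":
        # 'S': the sequence is stored explicitly and the reference is mandatory.
        if wid is None:
            raise ValueError("SequenceType 'S' requires a nonNULL SequenceWID")
        return subsequence_seq[wid]

    if seq_type not in ("N", "P"):
        raise ValueError(f"unknown SequenceType: {seq_type!r}")

    # 'N' and 'P': the reference is optional.
    if wid is None:
        return None
    full = nucleic_acid_seq[wid] if seq_type == "N" else protein_seq[wid]

    start = feature.get("StartPosition")
    end = feature.get("EndPosition")
    if start is None or end is None:
        return None
    # ASSUMPTION: positions are 1-based and inclusive.
    return full[start - 1:end]
```

If the warehouse actually stores 0-based or half-open coordinates, only the final slice needs to change.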

Referenced By

Table	Column
SeqFeatureLocation	SeqFeature_Regions
Entry	OtherWID
TranscriptionUnitComponent	OtherWID
Support	OtherWID
RelatedTerm	OtherWID
CitationWIDOtherWID	OtherWID
CommentTable	OtherWID
CrossReference	OtherWID
CrossReference	CrossWID
Description	OtherWID
DBID	OtherWID
SynonymTable	OtherWID
ToolAdvice	OtherWID

Indexes

Name	Columns
FEATURE_Description
FEATURE_TYPE	Type
FEATURE_Class	Class
FEATURE_SequenceWID	SequenceWID
FEATURE_START_ENDPOS	STARTPOSITION,ENDPOSITION
FEATURE_ENDPOSITION	ENDPOSITION
FEATURE_DATASETWID	DATASETWID
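
The indexes on SequenceWID and on StartPosition, EndPosition suggest lookups of features by sequence and coordinate range. The sketch below shows such an overlap query in Python. It uses an in-memory SQLite database purely as a stand-in for MySQL or Oracle, creates a reduced Feature table with only four of the columns, and the helper name overlapping_features is hypothetical.

```python
import sqlite3


def overlapping_features(conn, sequence_wid, lo, hi):
    """Return WIDs of features on one sequence that overlap [lo, hi]."""
    rows = conn.execute(
        "SELECT WID FROM Feature "
        "WHERE SequenceWID = ? AND StartPosition <= ? AND EndPosition >= ? "
        "ORDER BY StartPosition",
        (sequence_wid, hi, lo),
    ).fetchall()
    return [row[0] for row in rows]


# Reduced stand-in schema; the real table has many more columns.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Feature (
        WID INTEGER PRIMARY KEY,
        SequenceWID INTEGER,
        StartPosition INTEGER,
        EndPosition INTEGER
    );
    CREATE INDEX FEATURE_SequenceWID ON Feature (SequenceWID);
    CREATE INDEX FEATURE_START_ENDPOS ON Feature (StartPosition, EndPosition);
    """
)
conn.executemany(
    "INSERT INTO Feature VALUES (?, ?, ?, ?)",
    [(1, 7, 10, 50), (2, 7, 60, 90), (3, 7, 45, 65), (4, 8, 10, 50)],
)
```

Two intervals overlap when each one starts no later than the other ends, which is what the two inequality conditions express.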
