Schema Version: 1.3 Module: chado

Columns

Table Type Column Type Size Nulls Default Comments
feature_cvterm_dbxref Table feature_cvterm_dbxref_id bigserial 19 nextval('sequence.feature_cvterm_dbxref_feature_cvterm_dbxref_id_seq'::regclass)
featureloc Table phase int4 10 null

Phase of translation with respect to srcfeature_id. Values are 0, 1, or 2. It may not be possible to manifest this column for some features, such as exons, because the phase is dependent on the splice form (the same exon can appear in multiple splice forms). This column is mostly useful for predicted exons and CDSs.
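
A minimal Python sketch of how per-segment phase values could be computed for the CDS segments of one splice form. It assumes the GFF3 meaning of phase (the number of bases to skip at the start of a segment to reach the first complete codon) and that segment lengths are supplied in transcript order. The function name `cds_phases` is hypothetical and is not part of Chado.

```python
from typing import List


def cds_phases(segment_lengths: List[int], first_phase: int = 0) -> List[int]:
    """Return a phase (0, 1 or 2) for each CDS segment, in transcript order.

    Assumes the GFF3 meaning of phase: the number of bases to skip at the
    start of a segment to reach the first base of a complete codon.
    """
    if first_phase not in (0, 1, 2):
        raise ValueError("phase must be 0, 1 or 2")
    phases: List[int] = []
    # Bases skipped at the start of the first segment are not part of any codon.
    coding = -first_phase
    for i, length in enumerate(segment_lengths):
        # Later segments must first finish the codon left open by earlier ones.
        phases.append(first_phase if i == 0 else (3 - coding % 3) % 3)
        coding += length
    return phases


print(cds_phases([10, 8, 9]))  # [0, 2, 0]
```

Because the result depends on which segments precede a given exon, an exon shared by two splice forms can receive two different phases, which is why the column may be left unmanifested for such exons.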

feature Table type_id int8 19 null

A required reference to a table:cvterm giving the feature type. This will typically be a Sequence Ontology identifier. This column is thus used to subclass the feature table.

feature_dbxref Table dbxref_id int8 19 null
feature_cvtermprop Table feature_cvterm_id int8 19 null
feature_cvtermprop Table rank int4 10 0

Property-Value ordering. Any feature_cvterm can have multiple values for any particular property type - these are ordered in a list using rank, counting from zero. For properties that are single-valued rather than multi-valued, the default 0 value should be used.

feature_relationshipprop_pub Table pub_id int8 19 null
featureloc Table fmax int8 19 null

The rightmost/maximal boundary in the linear range represented by the featureloc. Some tools (e.g. BioPerl) call this "end", which is confusing because it does not necessarily represent the 3-prime coordinate. Important: these are space-based (interbase) coordinates, counting from zero. No conversion is required to go from fmax to the rightmost coordinate in a base-oriented system that counts from 1 (e.g. GFF, BioPerl).

synonym Table type_id int8 19 null

For now, the types are symbol and fullname.

synonym Table name varchar 255 null

The synonym itself. It should be human-readable, machine-searchable ASCII text.

feature_relationship Table rank int4 10 0

The ordering of subject features with respect to the object feature may be important (for example, exon ordering on a transcript, which is not always derivable once trans-spliced genes are taken into consideration). Rank is used to order these, starting from zero.
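
A minimal Python sketch of ordering the subfeatures of one object feature (e.g. the exons of a transcript) by rank. It works on hypothetical in-memory rows rather than a live database, where the equivalent would be an `ORDER BY rank` clause. The class and function names are illustrative only, and filtering on the relationship type_id is omitted for brevity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FeatureRelationship:
    subject_id: int  # e.g. an exon
    object_id: int   # e.g. the transcript it is part of
    rank: int = 0    # Chado default


def ordered_subjects(rels: List[FeatureRelationship], object_id: int) -> List[int]:
    """Return the subject_ids attached to object_id, ordered by rank."""
    attached = [r for r in rels if r.object_id == object_id]
    return [r.subject_id for r in sorted(attached, key=lambda r: r.rank)]


# Hypothetical rows: three exons of transcript 100, stored out of order.
rels = [
    FeatureRelationship(12, 100, 2),
    FeatureRelationship(10, 100, 0),
    FeatureRelationship(11, 100, 1),
    FeatureRelationship(20, 200, 0),
]
print(ordered_subjects(rels, 100))  # [10, 11, 12]
```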

featureloc Table locgroup int4 10 0

This is used to manifest redundant, derivable extra locations for a feature. The default locgroup 0 is used for the DIRECT (main or primary) location of a feature, from which the others can be derived via coordinate transformations; transitively derived locations are indicated with locgroup > 0. For example, an exon may have one position on a BAC and another in global chromosome coordinates, or an ORF may be located relative to both its transcript and the genome. This column is used to differentiate these groupings of locations. Important: most Chado users may never use featurelocs WITH locgroup > 0. Redundant locations open the possibility of the database getting into inconsistent states; this schema gives us the flexibility of both warehouse instantiations with redundant locations (easier for querying) and management instantiations with no redundant locations. An example of using both locgroup and rank: imagine a feature indicating a conserved region between the chromosomes of two different species. We may want to keep redundant locations on both contigs and chromosomes. We would thus have 4 locations for the single conserved region feature: two distinct locgroups (contig level and chromosome level) and two distinct ranks (for the two species).
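
The conserved-region example can be sketched in Python as four featureloc-like rows. All IDs are hypothetical, and treating the contig-level locations as the direct ones (locgroup 0) is an assumption made for illustration; the comment above does not say which level is primary. The sketch checks that the (feature_id, locgroup, rank) combination distinguishes every row and shows how the direct locations can be selected.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Loc:
    feature_id: int
    srcfeature_id: int
    locgroup: int  # 0 = direct location; >0 = redundant, derived location
    rank: int      # distinguishes locations within one locgroup


def keys_are_unique(locs: List[Loc]) -> bool:
    """True if no two rows share (feature_id, locgroup, rank)."""
    keys = [(loc.feature_id, loc.locgroup, loc.rank) for loc in locs]
    return len(keys) == len(set(keys))


def direct_locations(locs: List[Loc], feature_id: int) -> List[Loc]:
    """Locations in the default locgroup 0, i.e. the non-derived ones."""
    return [loc for loc in locs if loc.feature_id == feature_id and loc.locgroup == 0]


# Hypothetical IDs: conserved region 1; contigs 10 (species A) and 11 (species B);
# chromosomes 20 (species A) and 21 (species B). Contig level assumed to be direct.
conserved = [
    Loc(1, 10, locgroup=0, rank=0),  # contig level, species A
    Loc(1, 11, locgroup=0, rank=1),  # contig level, species B
    Loc(1, 20, locgroup=1, rank=0),  # chromosome level, species A
    Loc(1, 21, locgroup=1, rank=1),  # chromosome level, species B
]
```

A management instantiation would keep only the locgroup 0 rows and derive the others on demand.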

feature Table name varchar 255 null

The optional human-readable common name for a feature, for display purposes.

feature_cvterm Table pub_id int8 19 null

Provenance for the annotation. Each annotation should have a single primary publication (which may be of the appropriate type for computational analyses) where more details can be found. Additional provenance dbxrefs can be attached using feature_cvterm_dbxref.

feature Table md5checksum bpchar 32 null

The 32-character checksum of the sequence, calculated using the MD5 algorithm. It is practically guaranteed to be unique for any distinct sequence, so this column acts as a unique identifier for the sequence itself; features with identical residues share the same checksum.
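
A minimal Python sketch of computing the checksum with the standard `hashlib` module. It assumes the digest is taken over the residues exactly as stored (no case folding or whitespace stripping); whether a given loader normalizes the string first is not stated here. The helper name `residues_md5` is hypothetical. PostgreSQL's built-in `md5()` function also returns a 32-character hex string.

```python
import hashlib


def residues_md5(residues: str) -> str:
    """32-character hex MD5 digest of a residues string, hashed as stored."""
    return hashlib.md5(residues.encode("ascii")).hexdigest()


print(len(residues_md5("ACGT")))                    # 32
print(residues_md5("ACGT") == residues_md5("ACGT"))  # True: same sequence, same checksum
```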

featureloc Table srcfeature_id int8 19 null

The source feature which this location is relative to. Every location is relative to another feature (however, this column is nullable, because the srcfeature may not be known). All locations are "proper", that is, nothing should be located relative to itself. No cycles are allowed in the featureloc graph.
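
The "no cycles" rule can be checked with a depth-first search over (feature_id, srcfeature_id) edges. This is a minimal Python sketch over hypothetical in-memory pairs, not a Chado-provided check; the function name is illustrative, and the recursive search is fine for small graphs but would need an iterative version for very deep location chains.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished


def featureloc_has_cycle(edges: Iterable[Tuple[int, Optional[int]]]) -> bool:
    """edges: (feature_id, srcfeature_id) pairs, one per featureloc row.

    Returns True if any feature is located, directly or transitively,
    relative to itself. Rows with an unknown (None) srcfeature add no edge.
    """
    graph: Dict[int, List[int]] = defaultdict(list)
    for feature_id, src in edges:
        if src is not None:
            graph[feature_id].append(src)

    color: Dict[int, int] = defaultdict(int)  # every node starts WHITE

    def visit(node: int) -> bool:
        color[node] = GREY
        for nxt in graph[node]:
            if color[nxt] == GREY:  # back edge: we returned to the current path
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    # Snapshot the keys: visit() may add entries to the defaultdict.
    return any(color[n] == WHITE and visit(n) for n in list(graph))
```

A self-location such as (5, 5) is reported as a cycle of length one, which covers the "proper" requirement as well.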

feature_pubprop Table value text 2147483647 null
feature_relationshipprop Table feature_relationshipprop_id bigserial 19 nextval('sequence.feature_relationshipprop_feature_relationshipprop_id_seq'::regclass)
feature_cvterm Table is_not bool 1 false

If this is set to true, then this annotation is interpreted as a NEGATIVE annotation, i.e. the feature does NOT have the specified function, process, component, part, etc. See the GO documentation for more details.

featureprop_pub Table featureprop_id int8 19 null
feature_cvterm_dbxref Table dbxref_id int8 19 null
featureloc Table strand int2 5 null

The orientation/directionality of the location. Should be 0, -1 or +1.

feature_synonym Table pub_id int8 19 null

The pub_id link is for relating the usage of a given synonym to the publication in which it was used.

feature_relationshipprop Table value text 2147483647 null

The value of the property, represented as text. Numeric values are converted to their text representation. This is less efficient than using native database types, but is easier to query.

feature Table seqlen int8 19 null

The length of the residue feature. See column:residues. This column is partially redundant with the residues column, and also with featureloc. This column is required because the location may be unknown and the residue sequence may not be manifested, yet it may be desirable to store and query the length of the feature. The seqlen should always be manifested where the length of the sequence is known.
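
A minimal Python sketch of a consistency check between seqlen and the two columns it partially duplicates. The function and its messages are hypothetical. The location check assumes a single contiguous featureloc, where the interbase length is fmax - fmin; spliced features such as transcripts would legitimately fail it, so a location should only be passed for contiguous features.

```python
from typing import List, Optional


def seqlen_problems(
    seqlen: Optional[int],
    residues: Optional[str] = None,
    fmin: Optional[int] = None,
    fmax: Optional[int] = None,
) -> List[str]:
    """List inconsistencies between seqlen, residues and one contiguous featureloc."""
    problems: List[str] = []
    if seqlen is None:
        if residues is not None:
            problems.append("seqlen missing although residues are known")
        return problems
    if residues is not None and len(residues) != seqlen:
        problems.append("seqlen != len(residues)")
    # Only meaningful for a single contiguous location; interbase length = fmax - fmin.
    if fmin is not None and fmax is not None and fmax - fmin != seqlen:
        problems.append("seqlen != fmax - fmin")
    return problems
```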

feature_synonym Table is_current bool 1 false

The is_current boolean indicates whether the linked synonym is the current official symbol for the linked feature.

feature Table feature_id bigserial 19 nextval('sequence.feature_feature_id_seq'::regclass)
feature_synonym Table synonym_id int8 19 null
feature_synonym Table is_internal bool 1 false

Typically a synonym exists so that somebody querying the database with an obsolete name can find the object they're looking for (under its current name). If the synonym has been used publicly and deliberately (e.g. in a paper), it may also be listed in reports as a synonym. If the synonym was not used deliberately (e.g. there was a typo which went public), then the is_internal boolean may be set to true so that it is known that the synonym is "internal" and should be queryable but should not be listed in reports as a valid synonym.
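
A minimal Python sketch of how application code might apply the is_current and is_internal flags: every synonym is searchable, internal ones are hidden from reports, and the current one is the official symbol. The class, the function names, and the example gene names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LinkedSynonym:
    name: str                  # synonym.name
    is_current: bool = False   # feature_synonym.is_current
    is_internal: bool = False  # feature_synonym.is_internal


def searchable_names(syns: List[LinkedSynonym]) -> List[str]:
    """Every synonym, internal or not, should still find the feature."""
    return [s.name for s in syns]


def reportable_synonyms(syns: List[LinkedSynonym]) -> List[str]:
    """Internal synonyms (e.g. published typos) are hidden from reports."""
    return [s.name for s in syns if not s.is_internal]


def current_symbol(syns: List[LinkedSynonym]) -> Optional[str]:
    """The synonym flagged as the current official symbol, if any."""
    return next((s.name for s in syns if s.is_current), None)


# Hypothetical synonyms for one gene.
syns = [
    LinkedSynonym("abc-1", is_current=True),
    LinkedSynonym("abcl"),
    LinkedSynonym("acb-1", is_internal=True),  # a typo that went public
]
```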

featureloc_pub Table featureloc_id int8 19 null
feature_cvterm_pub Table pub_id int8 19 null
featureloc Table rank int4 10 0

Used when a feature has >1 location, otherwise the default rank 0 is used. Some features (e.g. blast hits and HSPs) have two locations - one on the query and one on the subject. Rank is used to differentiate these. Rank=0 is always used for the query, Rank=1 for the subject. For multiple alignments, assignment of rank is arbitrary. Rank is also used for sequence_variant features, such as SNPs. Rank=0 indicates the wildtype (or baseline) feature, Rank=1 indicates the mutant (or compared) feature.
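
A minimal Python sketch of separating the two locations of a pairwise hit by rank (0 = query, 1 = subject). The same helper would split a sequence_variant's rank 0 (wildtype) and rank 1 (mutant) locations. The tuple layout and function name are hypothetical; a missing rank raises KeyError.

```python
from typing import Dict, List, Tuple

Loc = Tuple[int, int, int]  # (srcfeature_id, fmin, fmax)


def query_and_subject(rows: List[Tuple[int, int, int, int]]) -> Tuple[Loc, Loc]:
    """rows: (rank, srcfeature_id, fmin, fmax) for the featurelocs of one HSP.

    Rank 0 is the query location and rank 1 the subject location.
    """
    by_rank: Dict[int, Loc] = {}
    for rank, src, fmin, fmax in rows:
        if rank in by_rank:
            raise ValueError(f"duplicate rank {rank}")
        by_rank[rank] = (src, fmin, fmax)
    return by_rank[0], by_rank[1]


query, subject = query_and_subject([(1, 200, 5, 55), (0, 100, 1000, 1050)])
print(query, subject)  # (100, 1000, 1050) (200, 5, 55)
```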

feature_pubprop Table feature_pub_id int8 19 null
featureprop Table rank int4 10 0

Property-Value ordering. Any feature can have multiple values for any particular property type - these are ordered in a list using rank, counting from zero. For properties that are single-valued rather than multi-valued, the default 0 value should be used.
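
A minimal Python sketch of turning featureprop-style rows into one ordered value list per (feature_id, type_id), using rank for the order. The row layout and function name are hypothetical; a single-valued property simply yields a one-element list from its rank 0 row.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def property_lists(
    rows: Iterable[Tuple[int, int, int, str]],
) -> Dict[Tuple[int, int], List[str]]:
    """rows: (feature_id, type_id, rank, value) tuples from featureprop.

    Returns one list of values per (feature_id, type_id), ordered by rank.
    """
    grouped: Dict[Tuple[int, int], List[str]] = defaultdict(list)
    for feature_id, type_id, _rank, value in sorted(rows, key=lambda r: r[:3]):
        grouped[(feature_id, type_id)].append(value)
    return dict(grouped)


print(property_lists([(1, 7, 1, "b"), (1, 7, 0, "a"), (1, 8, 0, "x")]))
# {(1, 7): ['a', 'b'], (1, 8): ['x']}
```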

feature_pub Table feature_id int8 19 null
feature_synonym Table feature_synonym_id bigserial 19 nextval('sequence.feature_synonym_feature_synonym_id_seq'::regclass)
feature_relationship Table feature_relationship_id bigserial 19 nextval('sequence.feature_relationship_feature_relationship_id_seq'::regclass)
feature_contact Table feature_contact_id bigserial 19 nextval('sequence.feature_contact_feature_contact_id_seq'::regclass)
feature_relationshipprop_pub Table feature_relationshipprop_id int8 19 null
featureprop_pub Table pub_id int8 19 null
feature_contact Table contact_id int8 19 null
feature_relationshipprop Table feature_relationship_id int8 19 null
featureprop Table featureprop_id bigserial 19 nextval('sequence.featureprop_featureprop_id_seq'::regclass)
feature Table uniquename text 2147483647 null

The unique name for a feature; it is not necessarily human-readable, although this is preferred. This name must be unique for this type of feature within this organism.
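
A minimal Python sketch of finding violations of the rule above before loading data: uniqueness is scoped to the (organism_id, type_id, uniquename) combination, so the same uniquename may reappear for a different type or organism. The tuple layout and function name are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Key = Tuple[int, int, str]  # (organism_id, type_id, uniquename)


def duplicate_keys(features: Iterable[Key]) -> List[Key]:
    """Return the (organism_id, type_id, uniquename) keys used more than once."""
    counts = Counter(features)
    return sorted(key for key, n in counts.items() if n > 1)


# "g1" is reused across types and organisms (allowed) and once illegally.
feats = [(1, 5, "g1"), (1, 6, "g1"), (2, 5, "g1"), (1, 5, "g1")]
print(duplicate_keys(feats))  # [(1, 5, 'g1')]
```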

feature_relationshipprop_pub Table feature_relationshipprop_pub_id bigserial 19 nextval('sequence.feature_relationshipprop_pub_feature_relationshipprop_pub_i_seq'::regclass)
feature_dbxref Table is_current bool 1 true

True if this secondary dbxref is the most up-to-date accession in the corresponding db. Retired accessions should set this field to false.

feature_dbxref Table feature_dbxref_id bigserial 19 nextval('sequence.feature_dbxref_feature_dbxref_id_seq'::regclass)
featureprop Table feature_id int8 19 null
feature Table timeaccessioned timestamp 29 now()

For handling object accession or modification timestamps (as opposed to database auditing data, handled elsewhere). The expectation is that these fields would be available to software interacting with chado.

feature_pubprop Table feature_pubprop_id bigserial 19 nextval('sequence.feature_pubprop_feature_pubprop_id_seq'::regclass)
feature_relationship Table value text 2147483647 null

Additional notes or comments.

feature_cvterm_pub Table feature_cvterm_pub_id bigserial 19 nextval('sequence.feature_cvterm_pub_feature_cvterm_pub_id_seq'::regclass)
feature_relationshipprop Table type_id int8 19 null

The name of the property/slot is a cvterm. The meaning of the property is defined in that cvterm. Currently there is no standard ontology for feature_relationship property types.

featureprop_pub Table featureprop_pub_id bigserial 19 nextval('sequence.featureprop_pub_featureprop_pub_id_seq'::regclass)
feature_cvterm_dbxref Table feature_cvterm_id int8 19 null
featureloc Table is_fmax_partial bool 1 false

This is typically false, but may be true if the value for column:fmax is inaccurate or the rightmost part of the range is unknown/unbounded.

feature_cvterm_pub Table feature_cvterm_id int8 19 null
feature_contact Table feature_id int8 19 null
synonym Table synonym_id bigserial 19 nextval('sequence.synonym_synonym_id_seq'::regclass)
feature Table timelastmodified timestamp 29 now()

For handling object accession or modification timestamps (as opposed to database auditing data, handled elsewhere). The expectation is that these fields would be available to software interacting with chado.

feature_cvterm Table feature_cvterm_id bigserial 19 nextval('sequence.feature_cvterm_feature_cvterm_id_seq'::regclass)
feature_cvterm Table feature_id int8 19 null
feature Table is_analysis bool 1 false

Boolean indicating whether this feature is the result of an automated analysis (true) or an annotation (false). Analysis results also use the companalysis module. Note that the dividing line between analysis and annotation may be fuzzy; it should be determined on a per-project basis in a consistent manner. One requirement is that there should be only one non-analysis version of each wild-type gene feature in a genome, whereas the same gene feature can be predicted multiple times in different analyses.

feature_pub Table feature_pub_id bigserial 19 nextval('sequence.feature_pub_feature_pub_id_seq'::regclass)
feature_pubprop Table type_id int8 19 null
featureloc Table feature_id int8 19 null

The feature that is being located. Any feature can have zero or more featurelocs.

feature_cvterm Table cvterm_id int8 19 null
feature_cvtermprop Table type_id int8 19 null

The name of the property/slot is a cvterm. The meaning of the property is defined in that cvterm. cvterms may come from the OBO evidence code cv.

featureprop Table type_id int8 19 null

The name of the property/slot is a cvterm. The meaning of the property is defined in that cvterm. Certain property types will only apply to certain feature types (e.g. the anticodon property will only apply to tRNA features); the types here come from the sequence feature property ontology.

featureloc Table fmin int8 19 null

The leftmost/minimal boundary in the linear range represented by the featureloc. Some tools (e.g. BioPerl) call this "start", which is confusing because it does not necessarily represent the 5-prime coordinate. Important: these are space-based (interbase) coordinates, counting from zero. To convert this to the leftmost position in a base-oriented system (e.g. GFF, BioPerl), add 1 to fmin.
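
The conversion rules for fmin and fmax can be captured in two small Python helpers (hypothetical names). Only fmin shifts by one; fmax is unchanged. In interbase coordinates the length of a range is fmax - fmin, which equals end - start + 1 in the 1-based, fully closed convention.

```python
from typing import Tuple


def interbase_to_one_based(fmin: int, fmax: int) -> Tuple[int, int]:
    """Chado (fmin, fmax) -> 1-based, fully closed (start, end), as in GFF."""
    return fmin + 1, fmax


def one_based_to_interbase(start: int, end: int) -> Tuple[int, int]:
    """1-based, fully closed (start, end) -> Chado (fmin, fmax)."""
    return start - 1, end


# The first three bases of a sequence: interbase (0, 3), GFF start=1, end=3.
print(interbase_to_one_based(0, 3))  # (1, 3)
```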

featureloc Table residue_info text 2147483647 null

Alternative residues, when these differ from feature.residues. For instance, a SNP feature located on a wild-type and a mutant protein would have different alternative residues. For alignment/similarity features, the alternative residues are used to represent the alignment string (CIGAR format). Note on variation features: even if we do not want to instantiate a mutant chromosome/contig feature, we can still represent a SNP etc. with two locations: one (rank 0) on the genome, and the other (rank 1) with most fields null except for the alternative residues.

feature_relationship_pub Table pub_id int8 19 null
synonym Table synonym_sgml varchar 255 null

The fully specified synonym, with any non-ASCII characters encoded in SGML.

featureloc Table featureloc_id bigserial 19 nextval('sequence.featureloc_featureloc_id_seq'::regclass)
feature_cvtermprop Table value text 2147483647 null

The value of the property, represented as text. Numeric values are converted to their text representation. This is less efficient than using native database types, but is easier to query.

feature_dbxref Table feature_id int8 19 null
feature_pub Table pub_id int8 19 null
feature_pubprop Table rank int4 10 0
feature_relationship Table object_id int8 19 null

The object of the subj-predicate-obj sentence. This is typically the container feature.

feature_relationship Table type_id int8 19 null

Relationship type between subject and object. This is a cvterm, typically from the OBO relationship ontology, although other relationship types are allowed. The most common relationship type is OBO_REL:part_of. Valid relationship types are constrained by the Sequence Ontology.

feature_cvterm Table rank int4 10 0
feature_relationshipprop Table rank int4 10 0

Property-Value ordering. Any feature_relationship can have multiple values for any particular property type - these are ordered in a list using rank, counting from zero. For properties that are single-valued rather than multi-valued, the default 0 value should be used.

feature_relationship_pub Table feature_relationship_id int8 19 null
feature_synonym Table feature_id int8 19 null
feature_relationship_pub Table feature_relationship_pub_id bigserial 19 nextval('sequence.feature_relationship_pub_feature_relationship_pub_id_seq'::regclass)
feature_relationship Table subject_id int8 19 null

The subject of the subj-predicate-obj sentence. This is typically the subfeature.

feature Table residues text 2147483647 null

A sequence of alphabetic characters representing biological residues (nucleic acids, amino acids). This column does not need to be manifested for all features; it is optional for features such as exons, where the residues can be derived from the featureloc. It is recommended that this column be manifested for features which may have non-contiguous sublocations (e.g. transcripts), since derivation at query time is non-trivial. For expressed sequence, the DNA sequence should be used rather than the RNA sequence. The default storage method for the residues column is EXTERNAL, which stores it uncompressed to make substring operations faster.
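
A minimal Python sketch of deriving residues from featurelocs, as described above. Interbase fmin/fmax map directly onto Python slice bounds. For a spliced feature, the exon slices are joined in genomic order and the result is reverse-complemented on the minus strand. The function names are hypothetical, the complement table covers only A, C, G, T and N, and the output stays as DNA (T, not U), in line with the recommendation above.

```python
from typing import Iterable, Tuple

# Complement table for DNA; N is its own complement. Other IUPAC codes are not handled.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def located_residues(src: str, fmin: int, fmax: int, strand: int) -> str:
    """Residues of a contiguous location; interbase coordinates are Python slice bounds."""
    seq = src[fmin:fmax]
    return reverse_complement(seq) if strand == -1 else seq


def spliced_residues(src: str, exons: Iterable[Tuple[int, int]], strand: int) -> str:
    """Join exon slices in genomic order, then reverse-complement on the minus strand."""
    joined = "".join(src[fmin:fmax] for fmin, fmax in sorted(exons))
    return reverse_complement(joined) if strand == -1 else joined


print(spliced_residues("AACCGGTT", [(4, 6), (0, 2)], -1))  # CCTT
```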

featureprop Table value text 2147483647 null

The value of the property, represented as text. Numeric values are converted to their text representation. This is less efficient than using native database types, but is easier to query.

featureloc_pub Table featureloc_pub_id bigserial 19 nextval('sequence.featureloc_pub_featureloc_pub_id_seq'::regclass)
feature Table is_obsolete bool 1 false

Boolean indicating whether this feature has been obsoleted. Some chado instances may choose to simply remove the feature altogether, others may choose to keep an obsolete row in the table.

feature Table organism_id int8 19 null

The organism to which this feature belongs. This column is mandatory.

featureloc Table is_fmin_partial bool 1 false

This is typically false, but may be true if the value for column:fmin is inaccurate or the leftmost part of the range is unknown/unbounded.

feature_cvtermprop Table feature_cvtermprop_id bigserial 19 nextval('sequence.feature_cvtermprop_feature_cvtermprop_id_seq'::regclass)
feature Table dbxref_id int8 19 null

An optional primary public stable identifier for this feature. Secondary identifiers and external dbxrefs go in the table feature_dbxref.

featureloc_pub Table pub_id int8 19 null