Elements, Observations, and Features
We adopt the convention from statistical learning of referring to Observations and Features of data. Both of these data structures derive from the BaseElement class, which captures their common structure and behavior. Behavior specific to either one can be added by overriding methods in the child classes.
In the context of a biological experiment, Observations are synonymous with samples. Further, each Observation can have Features associated with it (e.g. gene expressions for 30,000 genes). One can think of Observations and Features as comprising the columns and rows of a two-dimensional matrix. Note that in our convention, due to the typical format of expression matrices, we take each column to represent an Observation and each row to represent a Feature.
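As a rough illustration of this orientation, the sketch below reads a small tab-delimited expression matrix and takes Observation identifiers from the header row and Feature identifiers from the first column. The sample and gene names are invented, and the helper split_matrix is hypothetical rather than part of WebMEV.

```python
import csv
import io

# A toy expression matrix: columns are Observations (samples),
# rows are Features (genes). All names and values are invented.
RAW = "gene\tsampleA\tsampleB\tsampleC\nGENE1\t10\t12\t0\nGENE2\t5\t7\t3\n"


def split_matrix(text):
    """Return (observation_ids, feature_ids, values) from a TSV string."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    observation_ids = header[1:]                 # column labels
    feature_ids = [row[0] for row in body]       # row labels
    values = [[float(x) for x in row[1:]] for row in body]
    return observation_ids, feature_ids, values


observation_ids, feature_ids, values = split_matrix(RAW)
print(observation_ids)  # ['sampleA', 'sampleB', 'sampleC']
print(feature_ids)      # ['GENE1', 'GENE2']
```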
We use Observations and Features to hold metadata (as key-value pairs) about data that we are manipulating in WebMEV. For instance, given a typical gene expression matrix, we have information about only the names of the Observations/samples and Features/genes. We can then specify attributes to annotate the Observations and Features, allowing users to define experimental groups or specify other information useful for visualization or filtering.
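To make the idea of defining experimental groups concrete, here is a minimal sketch that groups Observation identifiers by the value of one annotation. It works on plain dictionaries with "id" and "attributes" keys rather than on WebMEV classes; the "genotype" key, the sample names, and the helper group_by_attribute are all assumptions made for the example.

```python
from collections import defaultdict

# Observations written as plain dicts; the "genotype" key and the
# sample names are hypothetical annotations a user might supply.
observations = [
    {"id": "sampleA", "attributes": {"genotype": {"attribute_type": "String", "value": "WT"}}},
    {"id": "sampleB", "attributes": {"genotype": {"attribute_type": "String", "value": "KO"}}},
    {"id": "sampleC", "attributes": {"genotype": {"attribute_type": "String", "value": "WT"}}},
    {"id": "sampleD", "attributes": {}},  # not annotated yet
]


def group_by_attribute(elements, key):
    """Map each attribute value to the ids of the elements carrying it."""
    groups = defaultdict(list)
    for element in elements:
        attribute = element["attributes"].get(key)
        if attribute is not None:  # skip elements lacking this annotation
            groups[attribute["value"]].append(element["id"])
    return dict(groups)


groups = group_by_attribute(observations, "genotype")
print(groups)  # {'WT': ['sampleA', 'sampleC'], 'KO': ['sampleB']}
```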
These data structures have similar (if not exactly the same) behavior, but we separate them for future compatibility in case specialization of each class is needed.
data_structures.element.BaseElement(val, **kwargs)
A BaseElement is a base class from which we can derive both Observation and Feature. For the purposes of clarity and potential customization, we keep those entities separate.
As a type of attribute, an Element (using an Observation below) would look like:
{
    "id": <string identifier>,
    "attributes": {
        "keyA": <Attribute>,
        "keyB": <Attribute>
    }
}
We require that all Element instances be created with an identifier. Equality (e.g. in set operations) is checked using this identifier member. Each nested attribute is an object that describes a simple attribute by its type and value. For instance:
{
    "id": <string identifier>,
    "attributes": {
        "stage": {
            "attribute_type": "String",
            "value": "IV"
        },
        "age": {
            "attribute_type": "PositiveInteger",
            "value": 5
        }
    }
}
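The identifier-based equality described above can be sketched with a small stand-in class. SketchElement is hypothetical and is not the WebMEV BaseElement; it only illustrates how comparing and hashing on the identifier makes set operations ignore the attributes.

```python
class SketchElement:
    """Hypothetical stand-in for an Element; not the WebMEV class."""

    def __init__(self, identifier, attributes=None):
        if not identifier:
            raise ValueError("An element requires an identifier.")
        self.id = identifier
        self.attributes = dict(attributes or {})

    def __eq__(self, other):
        # Only the identifier takes part in the comparison.
        if not isinstance(other, SketchElement):
            return NotImplemented
        return self.id == other.id

    def __hash__(self):
        # Hash on the same field as __eq__ so sets and dicts stay consistent.
        return hash(self.id)


a = SketchElement("sampleA", {"stage": {"attribute_type": "String", "value": "IV"}})
b = SketchElement("sampleA")  # same id, different attributes
c = SketchElement("sampleB")

print(a == b)          # True
print(len({a, b, c}))  # 2
```

One consequence of this design is that two elements with the same identifier but different attributes collapse into a single set member, so attribute differences have to be reconciled separately.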
The nested dict attributes CAN be empty. In situations like annotation tables where certain rows may not have values (but others do), we want to be able to permit null attributes if the constructor is explicitly passed the permit_null_attributes kwarg.
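A minimal sketch of how such a flag might gate validation follows. It is an assumption-laden illustration: the function check_attributes is hypothetical, and representing a missing value as None inside the attribute's "value" field is a guess, since the way WebMEV stores null attributes is not described here.

```python
def check_attributes(attributes, permit_null_attributes=False):
    """Validate a dict of attributes, optionally tolerating null values.

    Assumes (hypothetically) that a missing value appears as None.
    """
    for key, attribute in attributes.items():
        if attribute["value"] is None and not permit_null_attributes:
            raise ValueError(f"Attribute '{key}' has a null value.")
    return attributes


# A row from an annotation table where "age" was left blank.
row = {"age": {"attribute_type": "PositiveInteger", "value": None}}

check_attributes({})                                # an empty dict passes
check_attributes(row, permit_null_attributes=True)  # null accepted on request

try:
    check_attributes(row)                           # default: null rejected
except ValueError as err:
    print(err)  # Attribute 'age' has a null value.
```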
data_structures.observation.Observation(val, **kwargs)
An Observation is the generalization of a "sample" in the typical context of biological studies. One may think of samples and observations as interchangeable concepts. However, we call it an observation so that we are not limited by this convention.
Observation instances act as metadata and can be used to filter and subset the data to which they are attached.
An Observation is structured as:
{
    "id": <string identifier>,
    "attributes": {
        "keyA": <Attribute>,
        "keyB": <Attribute>
    }
}
data_structures.feature.Feature(val, **kwargs)
A Feature can also be referred to as a covariate or variable. These are measurements one can make about an Observation. For example, in the genomics context, a sample can have 30,000+ genes which we call "features" here. In the statistical learning context, these are feature vectors.
Feature instances act as metadata and can be used to filter and subset the data to which they are attached. For example, we can imagine filtering by genes/features which have a particular value, such as those genes where the attribute "oncogene" is set to "true".
A Feature is structured as:
{
    "id": <string identifier>,
    "attributes": {
        "keyA": <Attribute>,
        "keyB": <Attribute>
    }
}
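The oncogene filter mentioned above could look roughly like the following. This is a sketch on plain dictionaries, not WebMEV code: the gene names, the toy matrix, and the helper select_ids are invented, and the "oncogene" value is stored as the string "true" only to mirror the wording of the example.

```python
# Toy matrix keyed by Feature id; each row holds one value per Observation.
matrix = {
    "GENE1": [10.0, 12.0],
    "GENE2": [5.0, 7.0],
    "GENE3": [0.0, 1.0],
}

# Feature metadata in the structure shown above (values are invented).
features = [
    {"id": "GENE1", "attributes": {"oncogene": {"attribute_type": "String", "value": "true"}}},
    {"id": "GENE2", "attributes": {"oncogene": {"attribute_type": "String", "value": "false"}}},
    {"id": "GENE3", "attributes": {"oncogene": {"attribute_type": "String", "value": "true"}}},
]


def select_ids(elements, key, wanted):
    """Return ids of elements whose attribute `key` has the value `wanted`."""
    return [
        element["id"]
        for element in elements
        if element["attributes"].get(key, {}).get("value") == wanted
    ]


oncogenes = select_ids(features, "oncogene", "true")
subset = {feature_id: matrix[feature_id] for feature_id in oncogenes}
print(subset)  # {'GENE1': [10.0, 12.0], 'GENE3': [0.0, 1.0]}
```

The same helper applies unchanged to Observations, in which case the selected identifiers would pick out columns of the matrix instead of rows.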