API for MultiDataset containers
MultiDataset comes in the following variations to accommodate different types of targets:
MultiDatasetClassify
MultiDatasetRegress
The API for the above classes is shown below:
-
class pyradigm.multiple.MultiDatasetClassify(dataset_spec=None, name='MultiDatasetClassify', subgroup=None)
Bases: pyradigm.multiple.BaseMultiDataset
Container class to manage multimodal classification datasets.
- Attributes
common_attr
Attributes common to all subjects/datasets, such as covariates, in this MultiDataset
modality_ids
List of identifiers for all modalities/datasets, sorted for reproducibility.
samplet_ids
List of samplet IDs in the multi-dataset
target_set
Set of targets/classes in this multi-dataset
target_sizes
Sizes of targets in this classification dataset.
Methods
append(self, dataset, identifier)
Adds a dataset, if compatible with the existing ones.
append_subgroup(self, dataset, identifier, …)
Custom add method
get_attr(self, ds_id, attr_name[, …])
Method to retrieve modality-/dataset-specific attributes
get_common_attr(self, names, subset[, …])
Helper to retrieve the requested attributes common to all datasets.
get_subsets(self, subset_list)
Returns the requested subsets of data while iterating over modalities
holdout(self[, train_perc, num_rep, …])
Builds a holdout generator for train and test sets for cross-validation.
set_attr(self, ds_id, attr_name, attr_value)
Method to set modality-/dataset-specific attributes
-
append(self, dataset, identifier)
Adds a dataset, if compatible with the existing ones.
- Parameters
dataset (pyradigm dataset or compatible) – Dataset to be added
identifier (hashable) – String or integer or another hashable to uniquely identify this dataset
-
property common_attr
Attributes common to all subjects/datasets, such as covariates, in this MultiDataset
-
get_attr(self, ds_id, attr_name, not_found_value='raise')
Method to retrieve modality-/dataset-specific attributes
-
get_common_attr(self, names, subset, not_found_value=None)
Helper to retrieve the requested attributes common to all datasets.
-
get_subsets(self, subset_list)
Returns the requested subsets of data while iterating over modalities.
If subset_list contains two sets of IDs, e.g. (train, test), the returned data for each modality would be this tuple:
(modality, (train_data, train_targets), (test_data, test_targets))
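The tuple shape described for get_subsets can be illustrated with a small stand-in that does not use pyradigm at all. In this sketch, the `modalities` dict, the `iter_subsets` function and the sample IDs are all hypothetical; only the shape of each yielded tuple follows the description above.

```python
# Hypothetical stand-in data: each modality maps samplet ID -> (features, target).
# Plain dicts are used here instead of pyradigm datasets.
modalities = {
    "mri": {"s1": ([0.1, 0.2], "ctrl"), "s2": ([0.3, 0.4], "dis"), "s3": ([0.5, 0.6], "ctrl")},
    "pet": {"s1": ([1.0], "ctrl"), "s2": ([2.0], "dis"), "s3": ([3.0], "ctrl")},
}


def iter_subsets(modalities, subset_list):
    """Yield (modality, (data, targets), ...) with one pair per ID subset."""
    for modality_id in sorted(modalities):  # sorted for a reproducible order
        dataset = modalities[modality_id]
        pairs = []
        for id_subset in subset_list:
            data = [dataset[sid][0] for sid in id_subset]
            targets = [dataset[sid][1] for sid in id_subset]
            pairs.append((data, targets))
        yield (modality_id, *pairs)


train_ids, test_ids = ["s1", "s2"], ["s3"]
results = list(iter_subsets(modalities, (train_ids, test_ids)))

# Each item unpacks exactly like the tuple shown in the description.
for modality, (train_data, train_targets), (test_data, test_targets) in results:
    print(modality, len(train_data), len(test_data))
```

Passing three ID sets instead of two would simply add a third (data, targets) pair to each yielded tuple in this sketch.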
-
holdout(self, train_perc=0.7, num_rep=50, stratified=True, return_ids_only=False, format='MLDataset')
Builds a holdout generator for train and test sets for cross-validation. Ensures all the classes are represented equally in the training set.
- Parameters
train_perc (float) – Fraction, in the range (0, 1), of samplets from each class to be selected for the training set. The remaining IDs from each class will be added to the test set.
num_rep (int) – Number of holdout repetitions
return_ids_only (bool) – Whether to return samplet IDs only, or the corresponding Datasets
format (str) – Format of the Dataset to be returned when return_ids_only=False. format='MLDataset' returns the full pyradigm data structure, and format='data_matrix' returns just the feature matrix X in ndarray format.
- Returns
train, test – A tuple (in order train, test) of IDs or Datasets
- Return type
tuple
- Raises
ValueError – If train_perc is < 0 or > 1, or if num_rep is not an int, is < 1, or is infinite
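To make the idea of a stratified holdout generator concrete, here is a minimal pure-Python sketch that yields (train, test) ID tuples. It is not pyradigm's implementation: `stratified_holdout`, its `seed` parameter and the sample targets are hypothetical, and it assumes that stratification means drawing the same fraction from every class.

```python
import random
from collections import defaultdict


def stratified_holdout(targets, train_perc=0.7, num_rep=50, seed=None):
    """Yield (train_ids, test_ids) num_rep times, sampling train_perc of each class.

    `targets` maps samplet ID -> class label. `seed` is a hypothetical
    convenience parameter, not part of the documented signature.
    """
    # Assumption: this sketch is stricter than the documented check and
    # rejects the endpoints 0 and 1 as well.
    if not 0 < train_perc < 1:
        raise ValueError("train_perc must be in (0, 1)")
    if not isinstance(num_rep, int) or num_rep < 1:
        raise ValueError("num_rep must be an int >= 1")

    # Group samplet IDs by class label.
    ids_by_class = defaultdict(list)
    for samplet_id, label in targets.items():
        ids_by_class[label].append(samplet_id)

    rng = random.Random(seed)
    for _ in range(num_rep):
        train_ids, test_ids = [], []
        for label in sorted(ids_by_class):
            ids = sorted(ids_by_class[label])
            rng.shuffle(ids)
            n_train = max(1, int(train_perc * len(ids)))
            train_ids.extend(ids[:n_train])
            test_ids.extend(ids[n_train:])
        yield train_ids, test_ids


# Hypothetical two-class example: 10 controls and 20 patients.
targets = {f"ctrl{i}": "ctrl" for i in range(10)}
targets.update({f"dis{i}": "dis" for i in range(20)})

splits = list(stratified_holdout(targets, train_perc=0.5, num_rep=3, seed=0))
train_ids, test_ids = splits[0]
print(len(splits), len(train_ids), len(test_ids))
```

Because this is a generator, the argument checks run when the first split is requested rather than at call time.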
-
property modality_ids
List of identifiers for all modalities/datasets, sorted for reproducibility.
-
property samplet_ids
List of samplet IDs in the multi-dataset
-
set_attr(self, ds_id, attr_name, attr_value)
Method to set modality-/dataset-specific attributes
-
property target_set
Set of targets/classes in this multi-dataset
-
property target_sizes
Sizes of targets in this classification dataset. Useful for summary and to compute chance accuracy.
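As a rough illustration of how class sizes relate to chance accuracy, the sketch below counts targets with the standard library and applies one common definition of chance: the accuracy of always predicting the largest class. The `targets` mapping is hypothetical, and whether pyradigm stores the sizes in this exact form is an assumption.

```python
from collections import Counter

# Hypothetical mapping of samplet ID -> class label.
targets = {"s1": "ctrl", "s2": "ctrl", "s3": "ctrl", "s4": "dis"}

# Number of samplets per class.
target_sizes = Counter(targets.values())

# Majority-class baseline: share of the largest class among all samplets.
total = sum(target_sizes.values())
chance_accuracy = max(target_sizes.values()) / total

print(dict(target_sizes), chance_accuracy)  # {'ctrl': 3, 'dis': 1} 0.75
```

For balanced classes this reduces to 1 divided by the number of classes.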
-
class pyradigm.multiple.MultiDatasetRegress(dataset_spec=None, name='MultiDatasetRegress')
Bases: pyradigm.multiple.BaseMultiDataset
Container class to manage multimodal regression datasets.
- Attributes
common_attr
Attributes common to all subjects/datasets, such as covariates, in this MultiDataset
modality_ids
List of identifiers for all modalities/datasets, sorted for reproducibility.
samplet_ids
List of samplet IDs in the multi-dataset
Methods
append(self, dataset, identifier)
Adds a dataset, if compatible with the existing ones.
get_attr(self, ds_id, attr_name[, …])
Method to retrieve modality-/dataset-specific attributes
get_common_attr(self, names, subset[, …])
Helper to retrieve the requested attributes common to all datasets.
get_subsets(self, subset_list)
Returns the requested subsets of data while iterating over modalities
holdout(self[, train_perc, num_rep, …])
Builds a holdout generator for train and test sets for cross-validation.
set_attr(self, ds_id, attr_name, attr_value)
Method to set modality-/dataset-specific attributes
-
append(self, dataset, identifier)
Adds a dataset, if compatible with the existing ones.
- Parameters
dataset (pyradigm dataset or compatible) – Dataset to be added
identifier (hashable) – String or integer or another hashable to uniquely identify this dataset
-
property common_attr
Attributes common to all subjects/datasets, such as covariates, in this MultiDataset
-
get_attr(self, ds_id, attr_name, not_found_value='raise')
Method to retrieve modality-/dataset-specific attributes
-
get_common_attr(self, names, subset, not_found_value=None)
Helper to retrieve the requested attributes common to all datasets.
-
get_subsets(self, subset_list)
Returns the requested subsets of data while iterating over modalities.
If subset_list contains two sets of IDs, e.g. (train, test), the returned data for each modality would be this tuple:
(modality, (train_data, train_targets), (test_data, test_targets))
-
holdout(self, train_perc=0.7, num_rep=50, return_ids_only=False, format='MLDataset')
Builds a holdout generator for train and test sets for cross-validation.
- Parameters
train_perc (float) – Fraction, in the range (0, 1), of samplets to be selected for the training set. The remaining samplets will be added to the test set.
num_rep (int) – Number of holdout repetitions
return_ids_only (bool) – Whether to return samplet IDs only, or the corresponding Datasets
format (str) – Format of the Dataset to be returned when return_ids_only=False. format='MLDataset' returns the full pyradigm data structure, and format='data_matrix' returns just the feature matrix X in ndarray format.
- Returns
train, test – A tuple (in order train, test) of IDs or Datasets
- Return type
tuple
- Raises
ValueError – If train_perc is < 0 or > 1, or if num_rep is not an int, is < 1, or is infinite
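For regression targets there are no classes to balance, so a holdout split can be drawn from the pooled samplet IDs. The sketch below shows that idea in plain Python; `holdout_ids`, its `seed` parameter and the `ages` data are hypothetical and are not taken from pyradigm.

```python
import random


def holdout_ids(samplet_ids, train_perc=0.7, num_rep=50, seed=None):
    """Yield (train_ids, test_ids) tuples from repeated random splits.

    Illustrative only: `holdout_ids` and `seed` are hypothetical names.
    """
    # Assumption: this sketch rejects the endpoints 0 and 1 as well.
    if not 0 < train_perc < 1:
        raise ValueError("train_perc must be in (0, 1)")
    if not isinstance(num_rep, int) or num_rep < 1:
        raise ValueError("num_rep must be an int >= 1")

    rng = random.Random(seed)
    ids = sorted(samplet_ids)
    n_train = max(1, int(train_perc * len(ids)))
    for _ in range(num_rep):
        shuffled = rng.sample(ids, len(ids))  # shuffled copy; ids stays sorted
        yield shuffled[:n_train], shuffled[n_train:]


# Hypothetical continuous targets keyed by samplet ID.
ages = {f"sub{i:02d}": 20.0 + i for i in range(10)}

splits = list(holdout_ids(ages, train_perc=0.5, num_rep=4, seed=1))
for train_ids, test_ids in splits:
    train_targets = [ages[sid] for sid in train_ids]
    print(len(train_ids), len(test_ids), min(train_targets))
```

Returning IDs rather than data keeps the split independent of any one modality, so the same IDs can be used to index every dataset in the collection.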
-
property modality_ids
List of identifiers for all modalities/datasets, sorted for reproducibility.
-
property samplet_ids
List of samplet IDs in the multi-dataset
-
set_attr(self, ds_id, attr_name, attr_value)
Method to set modality-/dataset-specific attributes