API for MultiDataset containers

MultiDataset comes in the following variations to accommodate different types of targets:

MultiDatasetClassify

MultiDatasetRegress

The API for these classes is shown below:

class pyradigm.multiple.MultiDatasetClassify(dataset_spec=None, name='MultiDatasetClassify', subgroup=None)

Bases: pyradigm.multiple.BaseMultiDataset

Container class to manage multimodal classification datasets.

Attributes

common_attr: Attributes common to all subjects/datasets, such as covariates, in this MultiDataset
modality_ids: List of identifiers for all modalities/datasets, sorted for reproducibility.
samplet_ids: List of samplet IDs in the multi-dataset
target_set: Set of targets/classes in this multi-dataset
target_sizes: Sizes of targets in this classification dataset.

Methods

`append`(self, dataset, identifier)	Adds a dataset, if compatible with the existing ones.
`append_subgroup`(self, dataset, identifier, …)	Custom add method
`get_attr`(self, ds_id, attr_name[, …])	Method to retrieve modality-/dataset-specific attributes
`get_common_attr`(self, names, subset[, …])	Helper to retrieve the requested attributes common to all datasets.
`get_subsets`(self, subset_list)	Returns the requested subsets of data while iterating over modalities
`holdout`(self[, train_perc, num_rep, …])	Builds a holdout generator for train and test sets for cross-validation.
`set_attr`(self, ds_id, attr_name, attr_value)	Method to set modality-/dataset-specific attributes

append(self, dataset, identifier)

Adds a dataset, if compatible with the existing ones.

Parameters

dataset (pyradigm dataset or compatible) –
identifier (hashable) – String or integer or another hashable to uniquely identify this dataset
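
The text above does not spell out what "compatible" means. The following is a minimal sketch, assuming that a dataset is compatible when it covers the same samplet IDs as the datasets already added. ToyMultiDataset and its internals are hypothetical stand-ins and not pyradigm's implementation.

```python
class ToyMultiDataset:
    """Hypothetical stand-in for a multi-dataset container."""

    def __init__(self):
        self._modalities = {}  # identifier -> {samplet_id: features}
        self._ids = None       # samplet IDs shared by every modality

    def append(self, dataset, identifier):
        """Add `dataset` under `identifier` if it matches the existing ones."""
        if identifier in self._modalities:
            raise KeyError(f"identifier {identifier!r} is already in use")
        ids = set(dataset)
        if self._ids is None:
            self._ids = ids  # the first dataset defines the reference IDs
        elif ids != self._ids:
            # assumed compatibility rule: identical set of samplet IDs
            raise ValueError("samplet IDs differ from the existing datasets")
        self._modalities[identifier] = dict(dataset)

    @property
    def modality_ids(self):
        # sorted, mirroring the "sorted for reproducibility" note
        return sorted(self._modalities)


multi = ToyMultiDataset()
multi.append({"s1": [0.1], "s2": [0.2]}, "thickness")
multi.append({"s1": [5.0], "s2": [6.0]}, "area")
print(multi.modality_ids)
```

Appending a dataset with a different set of samplet IDs raises ValueError in this sketch.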

append_subgroup(self, dataset, identifier, subgroup): Custom add method

property common_attr: Attributes common to all subjects/datasets, such as covariates, in this MultiDataset

get_attr(self, ds_id, attr_name, not_found_value='raise'): Method to retrieve modality-/dataset-specific attributes

get_common_attr(self, names, subset, not_found_value=None): Helper to retrieve the requested attributes common to all datasets.
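
The not_found_value defaults in the signatures above suggest a lookup that either raises or falls back to a caller-supplied value. The sketch below shows that pattern, assuming 'raise' acts as a sentinel meaning "raise an error when the attribute is missing". The function lookup_attr and the nested-dict storage are hypothetical, not pyradigm's internals.

```python
def lookup_attr(attrs, ds_id, attr_name, not_found_value="raise"):
    """Return attrs[ds_id][attr_name], or a fallback if it is missing."""
    try:
        return attrs[ds_id][attr_name]
    except KeyError:
        # assumed behaviour: the string 'raise' requests an error,
        # anything else is returned as the fallback value
        if isinstance(not_found_value, str) and not_found_value == "raise":
            raise
        return not_found_value


attrs = {"mri": {"site": "A"}}  # hypothetical per-dataset attributes
print(lookup_attr(attrs, "mri", "site"))
print(lookup_attr(attrs, "mri", "scanner", not_found_value=None))
```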

get_subsets(self, subset_list)

Returns the requested subsets of data while iterating over modalities

If subset_list contains two sets of IDs, e.g. (train, test), the data returned for each modality is the tuple:

(modality, (train_data, train_targets), (test_data, test_targets))
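
The tuple shape described above can be illustrated with plain dictionaries. This is a sketch only: iter_subsets, the dict-based storage, and the sample values are hypothetical, and the real method returns pyradigm data rather than Python lists.

```python
def iter_subsets(modalities, targets, subset_list):
    """Yield (modality, (data, targets), ...) with one pair per ID subset."""
    for modality in sorted(modalities):
        features = modalities[modality]
        pairs = tuple(
            ([features[i] for i in ids], [targets[i] for i in ids])
            for ids in subset_list
        )
        yield (modality, *pairs)


modalities = {
    "thickness": {"s1": [0.1], "s2": [0.2], "s3": [0.3]},
    "area": {"s1": [5.0], "s2": [6.0], "s3": [7.0]},
}
targets = {"s1": "ctrl", "s2": "case", "s3": "ctrl"}
train, test = ["s1", "s2"], ["s3"]

results = list(iter_subsets(modalities, targets, (train, test)))
for modality, (train_data, train_targets), (test_data, test_targets) in results:
    print(modality, train_data, train_targets, test_data, test_targets)
```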

holdout(self, train_perc=0.7, num_rep=50, stratified=True, return_ids_only=False, format='MLDataset')

Builds a holdout generator for train and test sets for cross-validation. Ensures all the classes are represented equally in the training set.

Parameters

train_perc (float) – Fraction, in the range (0, 1), of samplets from each class to be selected for the training set. Remaining IDs from each class will be added to the test set.
num_rep (int) – Number of holdout repetitions
return_ids_only (bool) – Whether to return samplet IDs only, or the corresponding Datasets
format (str) – Format of the Dataset to be returned when return_ids_only=False. format='MLDataset' returns the full-blown pyradigm data structure, and format='data_matrix' returns just the feature matrix X in ndarray format.

Returns

train, test – A tuple (in order train, test) of IDs or Datasets

Return type

tuple

Raises

ValueError – If train_perc is < 0 or > 1, or if num_rep is not an int, is < 1, or is infinite.
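
To make the idea of a stratified, repeated holdout concrete, here is a minimal sketch that returns IDs only. It assumes that "represented equally" means every class contributes the same number of training samplets, with that number derived from the smallest class. The function stratified_holdout_ids and its seed parameter are hypothetical and are not part of the documented API.

```python
import random


def stratified_holdout_ids(targets, train_perc=0.7, num_rep=50, seed=None):
    """Yield (train_ids, test_ids) pairs with equal class counts in train."""
    rng = random.Random(seed)
    by_class = {}
    for samplet_id, target in targets.items():
        by_class.setdefault(target, []).append(samplet_id)

    # assumed rule: same training count per class, based on the smallest class
    smallest = min(len(ids) for ids in by_class.values())
    n_train = int(train_perc * smallest)
    if n_train < 1:
        raise ValueError("train_perc is too small for the smallest class")

    for _ in range(num_rep):
        train_ids, test_ids = [], []
        for target in sorted(by_class):
            ids = sorted(by_class[target])
            rng.shuffle(ids)
            train_ids.extend(ids[:n_train])
            test_ids.extend(ids[n_train:])  # remaining IDs go to the test set
        yield train_ids, test_ids


targets = {f"a{i}": "ctrl" for i in range(4)}
targets.update({f"b{i}": "case" for i in range(6)})
splits = list(stratified_holdout_ids(targets, train_perc=0.5, num_rep=3, seed=0))
```

With four "ctrl" and six "case" samplets and train_perc=0.5, each repetition in this sketch places two samplets of each class in the training set and the remaining six in the test set.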

property modality_ids: List of identifiers for all modalities/datasets, sorted for reproducibility.

property samplet_ids: List of samplet IDs in the multi-dataset

set_attr(self, ds_id, attr_name, attr_value): Method to set modality-/dataset-specific attributes

property target_set: Set of targets/classes in this multi-dataset

property target_sizes: Sizes of targets in this classification dataset. Useful for summary and to compute chance accuracy.
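
As an illustration of the chance-accuracy use mentioned above, the sketch below computes the majority-class baseline, which is one common definition of chance accuracy. It assumes target_sizes behaves like a mapping from target to samplet count; the class names and counts are made up.

```python
# hypothetical class sizes, assuming a target -> count mapping
target_sizes = {"control": 60, "patient": 40}

total = sum(target_sizes.values())
# accuracy of always predicting the largest class
chance_accuracy = max(target_sizes.values()) / total
print(f"chance accuracy: {chance_accuracy:.2f}")
```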

class pyradigm.multiple.MultiDatasetRegress(dataset_spec=None, name='MultiDatasetRegress')

Bases: pyradigm.multiple.BaseMultiDataset

Container class to manage multimodal regression datasets.

Attributes

common_attr: Attributes common to all subjects/datasets, such as covariates, in this MultiDataset
modality_ids: List of identifiers for all modalities/datasets, sorted for reproducibility.
samplet_ids: List of samplet IDs in the multi-dataset

Methods

`append`(self, dataset, identifier)	Adds a dataset, if compatible with the existing ones.
`get_attr`(self, ds_id, attr_name[, …])	Method to retrieve modality-/dataset-specific attributes
`get_common_attr`(self, names, subset[, …])	Helper to retrieve the requested attributes common to all datasets.
`get_subsets`(self, subset_list)	Returns the requested subsets of data while iterating over modalities
`holdout`(self[, train_perc, num_rep, …])	Builds a holdout generator for train and test sets for cross-validation.
`set_attr`(self, ds_id, attr_name, attr_value)	Method to set modality-/dataset-specific attributes

append(self, dataset, identifier)

Adds a dataset, if compatible with the existing ones.

Parameters

dataset (pyradigm dataset or compatible) –
identifier (hashable) – String or integer or another hashable to uniquely identify this dataset

property common_attr: Attributes common to all subjects/datasets, such as covariates, in this MultiDataset

get_attr(self, ds_id, attr_name, not_found_value='raise'): Method to retrieve modality-/dataset-specific attributes

get_common_attr(self, names, subset, not_found_value=None): Helper to retrieve the requested attributes common to all datasets.

get_subsets(self, subset_list)

Returns the requested subsets of data while iterating over modalities

If subset_list contains two sets of IDs, e.g. (train, test), the data returned for each modality is the tuple:

(modality, (train_data, train_targets), (test_data, test_targets))

holdout(self, train_perc=0.7, num_rep=50, return_ids_only=False, format='MLDataset')

Builds a holdout generator for train and test sets for cross-validation.

Parameters

train_perc (float) – Fraction, in the range (0, 1), of samplets to be selected for the training set. The remaining samplets will be added to the test set.
num_rep (int) – Number of holdout repetitions
return_ids_only (bool) – Whether to return samplet IDs only, or the corresponding Datasets
format (str) – Format of the Dataset to be returned when return_ids_only=False. format='MLDataset' returns the full-blown pyradigm data structure, and format='data_matrix' returns just the feature matrix X in ndarray format.

Returns

train, test – A tuple (in order train, test) of IDs or Datasets

Return type

tuple

Raises

ValueError – If train_perc is < 0 or > 1, or if num_rep is not an int, is < 1, or is infinite.
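
The argument checks listed under Raises, together with an unstratified repeated split, can be sketched as follows. This is an illustration under stated assumptions, not the library's code: holdout_ids, its seed parameter, and the error messages are hypothetical, and only IDs are returned.

```python
import random


def holdout_ids(samplet_ids, train_perc=0.7, num_rep=50, seed=None):
    """Validate the arguments, then return a generator of (train, test) IDs."""
    if not 0 <= train_perc <= 1:
        raise ValueError("train_perc must lie between 0 and 1")
    # a float such as inf fails the int check, covering the 'infinite' case
    if isinstance(num_rep, bool) or not isinstance(num_rep, int) or num_rep < 1:
        raise ValueError("num_rep must be an integer >= 1")

    ids = sorted(samplet_ids)
    n_train = int(train_perc * len(ids))
    rng = random.Random(seed)

    def _splits():
        for _ in range(num_rep):
            shuffled = rng.sample(ids, k=len(ids))  # shuffled copy
            yield shuffled[:n_train], shuffled[n_train:]

    return _splits()


ids = [f"s{i}" for i in range(10)]
splits = list(holdout_ids(ids, train_perc=0.5, num_rep=4, seed=1))
```

The validation runs before the generator is created, so invalid arguments fail at call time rather than on the first iteration.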

property modality_ids: List of identifiers for all modalities/datasets, sorted for reproducibility.

property samplet_ids: List of samplet IDs in the multi-dataset

set_attr(self, ds_id, attr_name, attr_value): Method to set modality-/dataset-specific attributes