API for MultiDataset containers

MultiDataset comes in the following variants to accommodate different types of targets:

  • MultiDatasetClassify

  • MultiDatasetRegress

The API for these classes is shown below:

class pyradigm.multiple.MultiDatasetClassify(dataset_spec=None, name='MultiDatasetClassify', subgroup=None)

Bases: pyradigm.multiple.BaseMultiDataset

Container class to manage multimodal classification datasets.

Attributes
common_attr

Attributes common to all subjects/datasets, such as covariates, in this MultiDataset

modality_ids

List of identifiers for all modalities/datasets, sorted for reproducibility.

samplet_ids

List of samplet IDs in the multi-dataset

target_set

Set of targets/classes in this multi-dataset

target_sizes

Sizes of targets in this classification dataset.

Methods

append(self, dataset, identifier)

Adds a dataset, if compatible with the existing ones.

append_subgroup(self, dataset, identifier, …)

Custom add method

get_attr(self, ds_id, attr_name[, …])

Method to retrieve modality-/dataset-specific attributes

get_common_attr(self, names, subset[, …])

Helper to retrieve the requested attributes common to all datasets.

get_subsets(self, subset_list)

Returns the requested subsets of data while iterating over modalities

holdout(self[, train_perc, num_rep, …])

Builds a holdout generator for train and test sets for cross-validation.

set_attr(self, ds_id, attr_name, attr_value)

Method to set modality-/dataset-specific attributes

append(self, dataset, identifier)

Adds a dataset, if compatible with the existing ones.

Parameters
  • dataset (pyradigm dataset or compatible) –

  • identifier (hashable) – String or integer or another hashable to uniquely identify this dataset
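The text above does not spell out what "compatible" means. The following is a minimal sketch of one plausible check, assuming that a new dataset must cover exactly the same samplet IDs as those already stored. TinyMultiDataset and its internals are hypothetical stand-ins for illustration, not pyradigm code.

```python
class TinyMultiDataset:
    """Hypothetical stand-in for a multi-dataset container (not pyradigm code)."""

    def __init__(self):
        self._datasets = {}        # identifier -> {samplet_id: features}
        self._samplet_ids = None   # set of IDs shared by all stored datasets

    def append(self, dataset, identifier):
        """Add a dataset if its samplet IDs match those already stored."""
        if identifier in self._datasets:
            raise KeyError(f"identifier {identifier!r} is already in use")
        ids = set(dataset)
        if self._samplet_ids is None:
            self._samplet_ids = ids            # the first dataset defines the IDs
        elif ids != self._samplet_ids:         # assumed compatibility rule
            raise ValueError("dataset has different samplet IDs")
        self._datasets[identifier] = dataset

    @property
    def modality_ids(self):
        # Sorted so that iteration order is reproducible.
        return sorted(self._datasets)


multi = TinyMultiDataset()
multi.append({"s1": [0.1], "s2": [0.2]}, "mri")
multi.append({"s1": [5.0], "s2": [6.0]}, "pet")

try:
    multi.append({"s1": [1.0], "s3": [2.0]}, "eeg")   # s3 is not a known ID
    rejected = False
except ValueError:
    rejected = True

print(multi.modality_ids, rejected)
```

Keying each dataset by a hashable identifier is what allows a single list of samplet IDs to index every modality in the same way.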

append_subgroup(self, dataset, identifier, subgroup)

Custom add method

property common_attr

Attributes common to all subjects/datasets, such as covariates, in this MultiDataset

get_attr(self, ds_id, attr_name, not_found_value='raise')

Method to retrieve modality-/dataset-specific attributes
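The signature suggests that the default not_found_value='raise' causes an error for a missing attribute, while any other value is returned as a fallback. This reading is an assumption inferred from the signature alone. The sketch below illustrates it with a hypothetical AttrStore class that is not part of pyradigm.

```python
class AttrStore:
    """Hypothetical per-dataset attribute store (not pyradigm code)."""

    def __init__(self):
        self._attr = {}   # ds_id -> {attr_name: attr_value}

    def set_attr(self, ds_id, attr_name, attr_value):
        self._attr.setdefault(ds_id, {})[attr_name] = attr_value

    def get_attr(self, ds_id, attr_name, not_found_value="raise"):
        try:
            return self._attr[ds_id][attr_name]
        except KeyError:
            # Assumed semantics: 'raise' is a sentinel, anything else a fallback.
            if not_found_value == "raise":
                raise KeyError(
                    f"{attr_name!r} not set for dataset {ds_id!r}"
                ) from None
            return not_found_value


store = AttrStore()
store.set_attr("mri", "scanner", "3T")

found = store.get_attr("mri", "scanner")
fallback = store.get_attr("pet", "scanner", not_found_value=None)

try:
    store.get_attr("pet", "scanner")   # default sentinel, so this raises
    raised = False
except KeyError:
    raised = True

print(found, fallback, raised)
```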

get_common_attr(self, names, subset, not_found_value=None)

Helper to retrieve the requested attributes common to all datasets.

get_subsets(self, subset_list)

Returns the requested subsets of data while iterating over modalities

If subset_list contains two sets of IDs, e.g. (train, test), each item returned while iterating is a tuple of the form:

(modality, (train_data, train_targets), (test_data, test_targets))
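The tuple layout above can be illustrated without pyradigm itself. The following minimal sketch mimics the described shape with plain dictionaries. The function iter_subsets and the modalities and targets dictionaries are hypothetical stand-ins, not part of the pyradigm API.

```python
def iter_subsets(modalities, targets, subset_list):
    """Yield (modality, (data, targets), ...) with one pair per ID subset.

    Hypothetical stand-in for the shape described above, not pyradigm code.
    modalities: dict mapping modality id -> {samplet_id: feature list}
    targets:    dict mapping samplet_id -> target
    """
    for modality in sorted(modalities):   # sorted for reproducibility
        features = modalities[modality]
        pairs = tuple(
            ([features[sid] for sid in ids], [targets[sid] for sid in ids])
            for ids in subset_list
        )
        yield (modality,) + pairs


modalities = {
    "thickness": {"s1": [2.1, 2.4], "s2": [1.9, 2.2], "s3": [2.5, 2.0]},
    "volume": {"s1": [310.0], "s2": [295.5], "s3": [330.2]},
}
targets = {"s1": "patient", "s2": "control", "s3": "patient"}
train_ids, test_ids = ["s1", "s2"], ["s3"]

results = list(iter_subsets(modalities, targets, (train_ids, test_ids)))
for modality, (train_data, train_targets), (test_data, test_targets) in results:
    print(modality, len(train_data), len(test_data))
```

Because the same ID lists are applied to every modality, the train and test rows stay aligned across modalities.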

holdout(self, train_perc=0.7, num_rep=50, stratified=True, return_ids_only=False, format='MLDataset')

Builds a holdout generator for train and test sets for cross-validation. Ensures all the classes are represented equally in the training set.

Parameters
  • train_perc (float) – Fraction, in (0, 1), of samplets from each class to be selected for the training set. The remaining IDs from each class are added to the test set.

  • num_rep (int) – Number of holdout repetitions

  • return_ids_only (bool) – Whether to return samplet IDs only, or the corresponding Datasets

  • format (str) – Format of the Dataset to be returned when return_ids_only=False. format='MLDataset' returns the full-blown pyradigm data structure, and format='data_matrix' returns just the feature matrix X in ndarray format.

Returns

train, test – A tuple (in order train, test) of IDs or Datasets

Return type

tuple

Raises

ValueError – If train_perc is < 0 or > 1, or if num_rep is not an int, is < 1, or is infinite.
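The per-class sampling described for train_perc can be sketched on samplet IDs alone. The example below is a minimal illustration, not the pyradigm implementation. It takes the train_perc description literally by drawing that fraction of IDs from each class. The function stratified_holdout and its seed parameter are hypothetical, and whether the library additionally equalizes counts across classes is not shown here.

```python
import random
from collections import defaultdict


def stratified_holdout(targets, train_perc=0.7, num_rep=50, seed=None):
    """Yield (train_ids, test_ids), drawing train_perc of the IDs per class.

    Illustrative sketch only; argument validation is omitted for brevity.
    targets: dict mapping samplet_id -> class label
    """
    rng = random.Random(seed)

    # Group the samplet IDs by class.
    by_class = defaultdict(list)
    for sid, label in targets.items():
        by_class[label].append(sid)

    for _ in range(num_rep):
        train, test = [], []
        for label in sorted(by_class):
            ids = sorted(by_class[label])
            rng.shuffle(ids)
            n_train = round(train_perc * len(ids))
            train.extend(ids[:n_train])   # this class's share of the training set
            test.extend(ids[n_train:])    # the remaining IDs go to the test set
        yield train, test


targets = {f"a{i}": "a" for i in range(10)}
targets.update({f"b{i}": "b" for i in range(6)})

splits = list(stratified_holdout(targets, train_perc=0.5, num_rep=3, seed=0))
for train, test in splits:
    print(len(train), len(test))
```

Returning a generator means each repetition is drawn only when requested, which keeps memory use flat for a large num_rep.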

property modality_ids

List of identifiers for all modalities/datasets, sorted for reproducibility.

property samplet_ids

List of samplet IDs in the multi-dataset

set_attr(self, ds_id, attr_name, attr_value)

Method to set modality-/dataset-specific attributes

property target_set

Set of targets/classes in this multi-dataset

property target_sizes

Sizes of targets in this classification dataset. Useful for summary and to compute chance accuracy.
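As a worked example of the chance-accuracy use case, the sketch below computes two common baselines from per-class sizes. A Counter stands in for the value of target_sizes, which is an assumption about its shape. The text does not state which definition of chance accuracy is intended, so both are shown.

```python
from collections import Counter

# Hypothetical class labels; the Counter plays the role of target_sizes.
labels = ["control"] * 60 + ["patient"] * 40
target_sizes = Counter(labels)

total = sum(target_sizes.values())

# Accuracy of always predicting the largest class.
majority_baseline = max(target_sizes.values()) / total

# Accuracy of guessing uniformly at random among the classes.
uniform_baseline = 1 / len(target_sizes)

print(dict(target_sizes), majority_baseline, uniform_baseline)
```

With imbalanced classes the majority baseline is the stricter bar, since a classifier can reach it without learning anything.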

class pyradigm.multiple.MultiDatasetRegress(dataset_spec=None, name='MultiDatasetRegress')

Bases: pyradigm.multiple.BaseMultiDataset

Container class to manage multimodal regression datasets.

Attributes
common_attr

Attributes common to all subjects/datasets, such as covariates, in this MultiDataset

modality_ids

List of identifiers for all modalities/datasets, sorted for reproducibility.

samplet_ids

List of samplet IDs in the multi-dataset

Methods

append(self, dataset, identifier)

Adds a dataset, if compatible with the existing ones.

get_attr(self, ds_id, attr_name[, …])

Method to retrieve modality-/dataset-specific attributes

get_common_attr(self, names, subset[, …])

Helper to retrieve the requested attributes common to all datasets.

get_subsets(self, subset_list)

Returns the requested subsets of data while iterating over modalities

holdout(self[, train_perc, num_rep, …])

Builds a holdout generator for train and test sets for cross-validation.

set_attr(self, ds_id, attr_name, attr_value)

Method to set modality-/dataset-specific attributes

append(self, dataset, identifier)

Adds a dataset, if compatible with the existing ones.

Parameters
  • dataset (pyradigm dataset or compatible) –

  • identifier (hashable) – String or integer or another hashable to uniquely identify this dataset

property common_attr

Attributes common to all subjects/datasets, such as covariates, in this MultiDataset

get_attr(self, ds_id, attr_name, not_found_value='raise')

Method to retrieve modality-/dataset-specific attributes

get_common_attr(self, names, subset, not_found_value=None)

Helper to retrieve the requested attributes common to all datasets.

get_subsets(self, subset_list)

Returns the requested subsets of data while iterating over modalities

If subset_list contains two sets of IDs, e.g. (train, test), each item returned while iterating is a tuple of the form:

(modality, (train_data, train_targets), (test_data, test_targets))

holdout(self, train_perc=0.7, num_rep=50, return_ids_only=False, format='MLDataset')

Builds a holdout generator for train and test sets for cross-validation.

Parameters
  • train_perc (float) – Fraction, in (0, 1), of samplets to be selected for the training set. The remaining samplets are added to the test set.

  • num_rep (int) – Number of holdout repetitions

  • return_ids_only (bool) – Whether to return samplet IDs only, or the corresponding Datasets

  • format (str) – Format of the Dataset to be returned when return_ids_only=False. format='MLDataset' returns the full-blown pyradigm data structure, and format='data_matrix' returns just the feature matrix X in ndarray format.

Returns

train, test – A tuple (in order train, test) of IDs or Datasets

Return type

tuple

Raises

ValueError – If train_perc is < 0 or > 1, or if num_rep is not an int, is < 1, or is infinite.
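For regression targets there are no classes to stratify by, so a holdout reduces to repeated random splits of the samplet IDs. The following minimal sketch pairs such a split with the argument checks listed under Raises. The names check_holdout_args and random_holdout and the seed parameter are hypothetical, and this is not the pyradigm implementation.

```python
import random


def check_holdout_args(train_perc, num_rep):
    """Raise ValueError for the invalid cases listed under Raises."""
    if train_perc < 0 or train_perc > 1:
        raise ValueError("train_perc must lie between 0 and 1")
    # float('inf') is not an int, so this check also rejects an infinite num_rep.
    if isinstance(num_rep, bool) or not isinstance(num_rep, int) or num_rep < 1:
        raise ValueError("num_rep must be a finite int >= 1")


def random_holdout(samplet_ids, train_perc=0.7, num_rep=50, seed=None):
    """Yield (train_ids, test_ids) for num_rep random splits."""
    check_holdout_args(train_perc, num_rep)
    rng = random.Random(seed)
    ids = sorted(samplet_ids)
    n_train = round(train_perc * len(ids))
    for _ in range(num_rep):
        shuffled = ids[:]
        rng.shuffle(shuffled)
        yield shuffled[:n_train], shuffled[n_train:]


ids = [f"s{i:02d}" for i in range(10)]
reps = list(random_holdout(ids, train_perc=0.7, num_rep=4, seed=1))
for train, test in reps:
    print(len(train), len(test))
```

Because random_holdout is a generator, the checks run when the first split is requested rather than when the function is called.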

property modality_ids

List of identifiers for all modalities/datasets, sorted for reproducibility.

property samplet_ids

List of samplet IDs in the multi-dataset

set_attr(self, ds_id, attr_name, attr_value)

Method to set modality-/dataset-specific attributes