MultiDataset
MultiDataset is a container data structure that holds and manages multiple MLDataset instances. pyradigm offers two such “meta” data structures, which can hold multiple pyradigm MLDatasets in a more convenient and efficient way. The main purpose of these containers is to automatically check that a collection of datasets is compatible, for example by:
ensuring that the same set of samplet IDs exists in all tables
ensuring that they all link to the same set of targets and attributes
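The kind of check described above can be sketched in plain Python. This is only an illustration of the idea and does not use pyradigm or reflect its actual API: the function name `check_compatibility` and the dict-based tables are hypothetical stand-ins for MLDataset instances.

```python
def check_compatibility(tables, targets):
    """Raise ValueError unless every table holds exactly the samplet IDs in `targets`.

    `tables` maps a table name to a dict of {samplet_id: features}.
    `targets` maps each samplet_id to its target value.
    Both are hypothetical stand-ins, not pyradigm objects.
    """
    expected = set(targets)
    for name, table in tables.items():
        ids = set(table)
        if ids != expected:
            missing = sorted(expected - ids)
            extra = sorted(ids - expected)
            raise ValueError(
                f"table {name!r} is incompatible: missing={missing}, extra={extra}"
            )


# One shared set of targets, linked to by every table.
targets = {"s1": "control", "s2": "patient", "s3": "control"}

# Two feature tables with different dimensionality but identical samplet IDs.
tables = {
    "X1": {"s1": [0.1, 0.2], "s2": [0.3, 0.4], "s3": [0.5, 0.6]},
    "X2": {"s1": [1.0], "s2": [2.0], "s3": [3.0]},
}

check_compatibility(tables, targets)  # compatible, so nothing is raised
```

A table that lacks one of the samplet IDs, or contains an ID with no target, would make the call raise `ValueError` naming the offending table.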
Such compatibility checks are often necessary when performing comparisons in machine learning, e.g. in cross-validation (CV). For example, they help with:
uniform processing of individual MLDatasets, e.g. querying the same set of IDs
ensuring correspondence across multiple datasets in cross-validation
reducing redundancy, which improves integrity in linked tables and saves space and time
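To make the cross-validation point concrete, the following plain-Python sketch draws one train/test split over the shared samplet IDs and applies it to every table, so that all tables stay in correspondence. It does not use pyradigm; `shared_split`, `select`, and the dict-based tables are hypothetical names chosen for illustration.

```python
import random


def shared_split(samplet_ids, train_fraction=0.5, seed=0):
    """Split the shared samplet IDs once, so the same split serves every table."""
    ids = sorted(samplet_ids)          # fixed order makes the split reproducible
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_train = int(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


def select(table, ids):
    """Query one table for a given list of samplet IDs."""
    return {sid: table[sid] for sid in ids}


# Hypothetical stand-ins: one set of targets shared by two feature tables.
targets = {"s1": 0, "s2": 1, "s3": 0, "s4": 1}
tables = {
    "X1": {"s1": [0.1], "s2": [0.2], "s3": [0.3], "s4": [0.4]},
    "X2": {"s1": [1, 2], "s2": [3, 4], "s3": [5, 6], "s4": [7, 8]},
}

train_ids, test_ids = shared_split(targets, train_fraction=0.5, seed=42)

# The same ID lists are used to query every table.
train_views = {name: select(table, train_ids) for name, table in tables.items()}
test_views = {name: select(table, test_ids) for name, table in tables.items()}
```

Because the split is computed once from the shared IDs rather than separately per table, a model trained on `X1` and one trained on `X2` are evaluated on exactly the same held-out samplets.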
A schematic illustrating the function of the MultiDataset is shown below, wherein a single MultiDataset links and encapsulates 4 data tables X1 to X4 with the same set of targets y and attributes A:
Note
A tutorial jupyter notebook will be released soon. Stay tuned!