Utilities

This section documents the library's utilities for sampling and ranking kernels.

Sampling

kernelmethods.sampling.correlation_km(k1, k2)[source]

Computes the Pearson correlation coefficient between two kernel matrices.

Parameters

k1, k2 – Two kernel matrices of the same size

Returns

corr_coef – Correlation coefficient between the vectorized kernel matrices

Return type

float
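
The idea of correlating two vectorized kernel matrices can be illustrated with plain NumPy. This is a minimal sketch, not the library's implementation: the helper name pearson_between_kernels and the toy data are assumptions made for illustration.

```python
import numpy as np

def pearson_between_kernels(k1, k2):
    """Pearson correlation between two same-sized kernel matrices, after flattening."""
    k1 = np.asarray(k1, dtype=float)
    k2 = np.asarray(k2, dtype=float)
    if k1.shape != k2.shape:
        raise ValueError("kernel matrices must have the same shape")
    # np.corrcoef returns a 2x2 matrix; the off-diagonal entry is the coefficient
    return float(np.corrcoef(k1.ravel(), k2.ravel())[0, 1])

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))          # toy sample: 5 points, 3 features
K_lin = X @ X.T                      # linear kernel matrix
K_scaled = 2.0 * K_lin + 3.0         # positive affine transform of the same matrix
corr_same = pearson_between_kernels(K_lin, K_scaled)
```

Because Pearson correlation is invariant to positive affine rescaling, the two matrices above are perfectly correlated, which makes the metric insensitive to the overall scale of a kernel.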

kernelmethods.sampling.ideal_kernel(targets)[source]

Computes the ideal kernel matrix from the given target labels.

Parameters

targets (Iterable) – Target values (y) to compute the ideal kernel from.

Returns

ideal_kernel – The ideal kernel, computed as yyᵀ (the outer product of the target vector with itself)

Return type

ndarray
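
As a worked illustration of the yyᵀ construction, the following sketch builds the outer product with NumPy. The function name ideal_kernel_sketch and the ±1 label encoding are assumptions for this example; the library's own function may validate or encode targets differently.

```python
import numpy as np

def ideal_kernel_sketch(targets):
    """Outer product of the target vector with itself (y y^T)."""
    y = np.asarray(targets, dtype=float).reshape(-1)
    return np.outer(y, y)

# With +1/-1 labels, entry (i, j) is +1 when points i and j share a label
# and -1 when they do not.
K_ideal = ideal_kernel_sketch([1, -1, 1])
```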

kernelmethods.sampling.make_kernel_bucket(strategy='exhaustive', normalize_kernels=True, skip_input_checks=False)[source]

Generates a bucket of candidate kernels based on user preferences.

Parameters
  • strategy (str) – Name of the strategy for populating the kernel bucket. Options: ‘exhaustive’ and ‘light’. Default: ‘exhaustive’

  • normalize_kernels (bool) – Flag to indicate whether to normalize the kernel matrices

  • skip_input_checks (bool) – Flag to indicate whether checks on input data (type, format etc) can be skipped. This saves a small amount of runtime for expert users whose data types and formats are already managed carefully in numpy. Default: False. Set this to True (skipping the checks) only when you know exactly what you’re doing!

Returns

kb – Kernel bucket populated according to the requested strategy

Return type

KernelBucket

kernelmethods.sampling.pairwise_similarity(k_bucket, metric='corr')[source]

Computes the similarity between all pairs of kernel matrices in a given bucket.

Parameters
  • k_bucket (KernelBucket) – Container of length num_km, with each element an instance of KernelMatrix

  • metric (str) – Identifies the metric to be used. Options: ‘corr’ (correlation coefficient) and ‘align’ (centered alignment). Default: ‘corr’

Returns

pairwise_metric – A symmetric matrix containing the pairwise similarity between the kernel matrices in the bucket

Return type

ndarray of shape (num_km, num_km)
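
The two metrics can be sketched over a plain list of NumPy kernel matrices. This is an illustration under stated assumptions, not the library's code: the function names are hypothetical, and centered alignment is taken here to be the normalized Frobenius inner product of the centered matrices, which is one common definition.

```python
import numpy as np

def center(K):
    """Center a kernel matrix in feature space: H K H with H = I - (1/n) 1 1^T."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def centered_alignment(K1, K2):
    """Normalized Frobenius inner product of the centered kernel matrices."""
    K1c, K2c = center(K1), center(K2)
    return float(np.sum(K1c * K2c) / (np.linalg.norm(K1c) * np.linalg.norm(K2c)))

def correlation(K1, K2):
    """Pearson correlation between the flattened kernel matrices."""
    return float(np.corrcoef(K1.ravel(), K2.ravel())[0, 1])

def pairwise_similarity_sketch(kernel_matrices, metric="corr"):
    func = {"corr": correlation, "align": centered_alignment}[metric]
    num_km = len(kernel_matrices)
    sim = np.ones((num_km, num_km))      # each matrix has similarity 1 with itself
    for i in range(num_km):
        for j in range(i + 1, num_km):   # fill the upper triangle and mirror it
            sim[i, j] = sim[j, i] = func(kernel_matrices[i], kernel_matrices[j])
    return sim

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
G = X @ X.T
sq = np.sum(X ** 2, axis=1)
D = sq[:, None] + sq[None, :] - 2.0 * G  # squared Euclidean distances
kernels = [G, (1.0 + G) ** 2, np.exp(-0.5 * D)]  # linear, polynomial, Gaussian
sim_corr = pairwise_similarity_sketch(kernels, metric="corr")
sim_align = pairwise_similarity_sketch(kernels, metric="align")
```

Only the upper triangle is computed because both metrics are symmetric in their arguments.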

Ranking

Module gathering techniques and helpers to rank kernels using various methods and metrics, such as

  • their target alignment,

  • performance in cross-validation

kernelmethods.ranking.CV_ranking(kernel_bucket, targets, num_folds=3, estimator_name='SVM')[source]

Ranks kernels by their performance measured via cross-validation (CV).

Parameters
  • kernel_bucket (KernelBucket) – Collection of kernel matrices to be ranked

  • targets (Iterable) – Target values of the sample attached to the bucket

  • num_folds (int) – Number of folds for the CV to be employed

  • estimator_name (str) – Name of a valid Scikit-Learn estimator. Default: SVM

Returns

scores – CV performance computed for the kernel matrices in the bucket

Return type

ndarray
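
Ranking by cross-validation can be sketched with scikit-learn's support for precomputed kernels. This is a rough illustration rather than the library's implementation: the function name cv_scores_sketch, the toy data, and the use of default SVC settings are all assumptions.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def cv_scores_sketch(kernel_matrices, targets, num_folds=3):
    """Mean CV accuracy of an SVM for each precomputed kernel matrix."""
    scores = []
    for K in kernel_matrices:
        clf = SVC(kernel="precomputed")
        # For precomputed kernels, scikit-learn slices both rows and columns
        # of K when forming the train/test splits.
        fold_scores = cross_val_score(clf, K, targets, cv=num_folds)
        scores.append(fold_scores.mean())
    return np.array(scores)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, size=(15, 2)),
               rng.normal(2.0, 1.0, size=(15, 2))])   # two toy clusters
y = np.array([0] * 15 + [1] * 15)

G = X @ X.T
sq = np.sum(X ** 2, axis=1)
D = sq[:, None] + sq[None, :] - 2.0 * G
cv_scores = cv_scores_sketch([G, np.exp(-0.5 * D)], y)   # linear and Gaussian kernels
ranking = np.argsort(cv_scores)[::-1]                     # best-scoring kernel first
```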

kernelmethods.ranking.alignment_ranking(kernel_bucket, targets, **method_params)[source]

Ranks kernels based on their alignment with the targets.
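
To make target alignment concrete, the sketch below scores each kernel matrix by its normalized Frobenius inner product with yyᵀ. It uses the classical uncentered definition and hypothetical names; the library's metric and its accepted method_params may differ.

```python
import numpy as np

def target_alignment(K, y):
    """Alignment between a kernel matrix and the ideal kernel y y^T."""
    y = np.asarray(y, dtype=float)
    K_ideal = np.outer(y, y)
    return float(np.sum(K * K_ideal)
                 / (np.linalg.norm(K) * np.linalg.norm(K_ideal)))

y = np.array([1, 1, -1, -1])
K_good = np.outer(y, y).astype(float)   # reproduces the label structure exactly
K_poor = np.eye(4)                      # each point is similar only to itself
alignments = np.array([target_alignment(K, y) for K in (K_poor, K_good)])
best = int(np.argmax(alignments))       # index of the best-aligned kernel
```

Here the identity kernel scores 0.5 and the label-matching kernel scores 1.0, so the second kernel ranks first.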

kernelmethods.ranking.find_optimal_kernel(kernel_bucket, sample, targets, method='align/corr', **method_params)[source]

Finds the optimal kernel for the current sample given its labels.

Parameters
  • kernel_bucket (KernelBucket) – The collection of kernels to evaluate and rank

  • sample (ndarray) – The dataset on which the given kernel bucket is to be evaluated

  • targets (ndarray) – Target labels for each point in the sample dataset

  • method (str) – Identifier for the metric used to rank the kernels

Returns

km – Instance of KernelMatrix with the optimal kernel function

Return type

KernelMatrix

kernelmethods.ranking.get_estimator(learner_id='svm')[source]

Returns a valid kernel machine to serve as the base learner of the multiple kernel learning (MKL) methods.

Base learner must be able to accept a precomputed kernel for fit/predict methods!

Parameters

learner_id (str) – Identifier for the estimator to be chosen. Options: SVM and SVR. Default: SVM

Returns

  • base_learner (Estimator) – An sklearn estimator

  • param_grid (dict) – Parameter grid (sklearn format) for the chosen estimator.
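
The requirement that the base learner accept a precomputed kernel can be illustrated with scikit-learn's SVC and SVR. The sketch below is an assumption-laden stand-in, not the library's code: the function name and the grid values are hypothetical and chosen only to show the shape of the returned pair.

```python
from sklearn.svm import SVC, SVR

def get_estimator_sketch(learner_id="svm"):
    """Return (estimator, param_grid) for a learner that takes precomputed kernels."""
    learner_id = learner_id.lower()
    if learner_id == "svm":
        base_learner = SVC(kernel="precomputed")
        param_grid = {"C": [0.1, 1.0, 10.0]}             # hypothetical grid values
    elif learner_id == "svr":
        base_learner = SVR(kernel="precomputed")
        param_grid = {"C": [0.1, 1.0, 10.0],
                      "epsilon": [0.01, 0.1]}            # hypothetical grid values
    else:
        raise ValueError(f"unknown learner_id: {learner_id!r}")
    return base_learner, param_grid

learner, grid = get_estimator_sketch("SVM")
```

A grid in this format can be passed as the param_grid argument of scikit-learn's GridSearchCV.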

kernelmethods.ranking.rank_kernels(kernel_bucket, targets, method='align/corr', **method_params)[source]

Computes a given ranking metric for all the kernel matrices in the bucket.

Choices for the method include: “align/corr”, “cv_risk”

Parameters
  • kernel_bucket (KernelBucket) – Collection of kernel matrices to be ranked

  • targets (Iterable) – Target values of the sample attached to the bucket

  • method (str) – Identifies one of the metrics: align/corr, cv_risk

  • method_params (dict) – Additional parameters to be passed on to the method chosen above.

Returns

scores – Values of the ranking metrics computed for the kernel matrices in the bucket

Return type

ndarray