Drop-in Estimator classes

Besides letting you plug the aforementioned KernelMatrix into an SVM or another kernel machine, this library makes life even easier by providing drop-in Estimator classes for direct use in scikit-learn. This interface is called KernelMachine, and it can be dropped in place of sklearn.svm.SVC or another kernel machine of your choice anywhere an sklearn Estimator can be used. For example:

from kernelmethods import KernelMachine
from kernelmethods.numeric_kernels import GaussianKernel

rbf = GaussianKernel()  # any KernelFunction can serve as k_func
km = KernelMachine(k_func=rbf)
km.fit(X=sample_data, y=labels)
predicted_y = km.predict(sample_data)

And if you're not sure which kernel function suits your dataset best, you can simply employ OptimalKernelSVR, which evaluates a KernelBucket of candidate kernels and trains the SVR estimator with the optimal kernel for your sample. Using it is as easy as:

from kernelmethods import OptimalKernelSVR
opt_km = OptimalKernelSVR(k_bucket='exhaustive')
opt_km.fit(X=sample_data, y=labels)
predicted_y = opt_km.predict(sample_data)

See below for their API. Stay tuned for more tutorials, examples and comprehensive docs.

Kernel Machine (API)

class kernelmethods.KernelMachine(k_func, learner_id='SVR')[source]

Bases: sklearn.base.BaseEstimator

Generic class to return a drop-in sklearn estimator.

Parameters
  • k_func (KernelFunction) – The kernel function the kernel machine bases itself on

  • learner_id (str) – Identifier for the estimator to be built based on the kernel function. Options: SVM and SVR. Default: SVR

fit(X, y, sample_weight=None)[source]

Fit the chosen Estimator based on the user-defined kernel.

Parameters
  • X ({array-like, sparse matrix}, shape (n_samples, n_features)) – Training vectors, where n_samples is the number of samples and n_features is the number of features. For kernel="precomputed", the expected shape of X is (n_samples, n_samples).

  • y (array-like, shape (n_samples,)) – Target values (class labels in classification, real numbers in regression)

  • sample_weight (array-like, shape (n_samples,)) – Per-sample weights. Rescale C per sample. Higher weights force the classifier to put more emphasis on these points.

Returns

self

Return type

object

Notes

If X and y are not C-ordered and contiguous arrays of np.float64 and X is not a scipy.sparse.csr_matrix, X and/or y may be copied.

If X is a dense array, then the other methods will not support sparse matrices as input.
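The precomputed-kernel shape requirement noted above can be illustrated with plain scikit-learn (a sketch independent of this library):

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X_train = rng.standard_normal((40, 3))
y_train = X_train[:, 0]
X_test = rng.standard_normal((10, 3))

# With kernel="precomputed", fit expects the square train-vs-train Gram matrix
K_train = rbf_kernel(X_train, X_train)   # shape (40, 40) == (n_samples, n_samples)
svr = SVR(kernel='precomputed').fit(K_train, y_train)

# and predict expects test-vs-train similarities
K_test = rbf_kernel(X_test, X_train)     # shape (10, 40) == (n_samples_test, n_samples_train)
y_pred = svr.predict(K_test)
print(y_pred.shape)                      # (10,)
```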

get_params(deep=True)[source]

Returns all the relevant parameters for this estimator.

predict(X)[source]

Make predictions on the new samplets in X.

For a one-class model, +1 or -1 is returned.

Parameters

X ({array-like, sparse matrix}, shape (n_samples, n_features)) – For kernel="precomputed", the expected shape of X is (n_samples_test, n_samples_train)

Returns

y_pred – Predicted labels (for SVM) or target values (for SVR) for the samples in X.

Return type

array, shape (n_samples,)

set_params(**parameters)[source]

Set the parameters of this estimator.
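get_params and set_params follow the standard sklearn estimator contract: every constructor argument becomes a gettable and settable parameter. A minimal sketch of that contract (a toy estimator, not this library's implementation):

```python
from sklearn.base import BaseEstimator

class TinyEstimator(BaseEstimator):
    """Minimal estimator: constructor args become tunable parameters."""
    def __init__(self, alpha=1.0, kernel='rbf'):
        self.alpha = alpha
        self.kernel = kernel

est = TinyEstimator()
print(est.get_params())           # {'alpha': 1.0, 'kernel': 'rbf'}
est.set_params(alpha=0.5)         # returns self, so calls can be chained
print(est.get_params()['alpha'])  # 0.5
```

This same contract is what lets sklearn's clone, GridSearchCV, and Pipeline manipulate an estimator's configuration without knowing its internals.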

OptimalKernelSVR (API)

class kernelmethods.OptimalKernelSVR(k_bucket='exhaustive', method='cv_risk')[source]

Bases: sklearn.svm.classes.SVR, sklearn.base.RegressorMixin

An estimator to learn the optimal kernel for a given sample and build a support vector regressor based on this custom kernel.

This class wraps the sklearn SVR estimator to function as its drop-in replacement; the SVR implementation is in turn based on LIBSVM.

Parameters

k_bucket (KernelBucket or str) – An instance of KernelBucket containing all the kernels to be compared, or a string identifying the sampling strategy used to populate a KernelBucket.

support_

Indices of support vectors.

Type

array-like, shape = [n_SV]

support_vectors_

Support vectors.

Type

array-like, shape = [nSV, n_features]

dual_coef_

Coefficients of the support vector in the decision function.

Type

array, shape = [1, n_SV]

coef_

Weights assigned to the features (coefficients in the primal problem). This is only available in the case of a linear kernel.

coef_ is a read-only property derived from dual_coef_ and support_vectors_.

Type

array, shape = [1, n_features]

intercept_

Constants in decision function.

Type

array, shape = [1]
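These attributes are inherited from sklearn's SVR, so their shapes can be inspected on any fitted SVR; a quick sketch with plain scikit-learn and synthetic data:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
y = X[:, 0] + 0.1 * rng.standard_normal(50)

svr = SVR(kernel='rbf').fit(X, y)
n_sv = svr.support_.shape[0]        # number of support vectors, n_SV

print(svr.support_vectors_.shape)   # (n_SV, n_features) == (n_sv, 4)
print(svr.dual_coef_.shape)         # (1, n_SV)
print(svr.intercept_.shape)         # (1,)
# Accessing coef_ raises AttributeError here: it is only defined
# for kernel='linear'.
```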

property coef_

fit(X, y, sample_weight=None)[source]

Estimate the optimal kernel, and fit an SVR based on this custom kernel.

Parameters
  • X ({array-like, sparse matrix}, shape (n_samples, n_features)) – Training vectors, where n_samples is the number of samples and n_features is the number of features. For kernel="precomputed", the expected shape of X is (n_samples, n_samples).

  • y (array-like, shape (n_samples,)) – Target values (class labels in classification, real numbers in regression)

  • sample_weight (array-like, shape (n_samples,)) – Per-sample weights. Rescale C per sample. Higher weights force the classifier to put more emphasis on these points.

Returns

self

Return type

object

Notes

If X and y are not C-ordered and contiguous arrays of np.float64 and X is not a scipy.sparse.csr_matrix, X and/or y may be copied.

If X is a dense array, then the other methods will not support sparse matrices as input.

get_params(deep=True)[source]

Returns all the relevant parameters for this estimator.

predict(X)[source]

Perform regression on samples in X.

For a one-class model, +1 or -1 is returned.

Parameters

X ({array-like, sparse matrix}, shape (n_samples, n_features)) – For kernel="precomputed", the expected shape of X is (n_samples_test, n_samples_train)

Returns

y_pred – Predicted target values for samples in X.

Return type

array, shape (n_samples,)

score(X, y, sample_weight=None)

Returns the coefficient of determination R^2 of the prediction.

The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a R^2 score of 0.0.

Parameters
  • X (array-like, shape = (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix instead, shape = (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.

  • y (array-like, shape = (n_samples) or (n_samples, n_outputs)) – True values for X.

  • sample_weight (array-like, shape = [n_samples], optional) – Sample weights.

Returns

score – R^2 of self.predict(X) w.r.t. y.

Return type

float

Notes

The R2 score used when calling score on a regressor will use multioutput='uniform_average' from version 0.23 to keep consistent with metrics.r2_score. This will influence the score method of all the multioutput regressors (except for multioutput.MultiOutputRegressor). To specify the default value manually and avoid the warning, please either call metrics.r2_score directly or make a custom scorer with metrics.make_scorer (the built-in scorer 'r2' uses multioutput='uniform_average').
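The R^2 definition above (1 - u/v) can be checked numerically against sklearn's own scorer, here on small hand-picked values:

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

u = ((y_true - y_pred) ** 2).sum()          # residual sum of squares
v = ((y_true - y_true.mean()) ** 2).sum()   # total sum of squares
r2_manual = 1.0 - u / v

print(round(r2_manual, 4))                  # 0.9486
print(round(r2_score(y_true, y_pred), 4))   # same value from sklearn
```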

set_params(**parameters)[source]

Set the parameters of this estimator.