Drop-in Estimator classes¶
Besides letting you use the aforementioned KernelMatrix
in an SVM or another kernel machine, this library makes life even easier by providing drop-in Estimator classes for direct use in scikit-learn. This interface is called KernelMachine
and it can be dropped in place of sklearn.svm.SVC
or another kernel machine of your choice anywhere an sklearn Estimator can be used. For example:
from kernelmethods import KernelMachine
# rbf is a kernel function, e.g. an instance of kernelmethods.GaussianKernel
km = KernelMachine(k_func=rbf)
km.fit(X=sample_data, y=labels)
predicted_y = km.predict(sample_data)
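Because the drop-in classes follow the sklearn Estimator API (fit/predict), they can be passed to utilities such as cross_val_score or GridSearchCV like any other estimator. A minimal sketch of that idea, using sklearn's own SVC with a callable RBF kernel as a stand-in (kernelmethods itself is not assumed to be installed here):

```python
# Any object following the sklearn Estimator API can be handed to
# model-selection utilities; here SVC with a custom callable kernel
# stands in for KernelMachine.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def rbf_kernel(X, Y, gamma=0.5):
    """Gram matrix of a Gaussian (RBF) kernel between rows of X and Y."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

X, y = make_classification(n_samples=100, n_features=5, random_state=0)
clf = SVC(kernel=rbf_kernel)            # custom kernel, still a valid Estimator
scores = cross_val_score(clf, X, y, cv=3)  # one score per CV fold
```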
And if you’re not sure which kernel function is optimal for your dataset, you can simply employ the OptimalKernelSVR,
which evaluates a large KernelBucket
and trains the SVR
estimator with the best-performing kernel for your sample. Using it is as easy as:
from kernelmethods import OptimalKernelSVR
opt_km = OptimalKernelSVR(k_bucket='exhaustive')
opt_km.fit(X=sample_data, y=labels)
predicted_y = opt_km.predict(sample_data)
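The kernel-selection idea behind OptimalKernelSVR can be sketched with plain scikit-learn: score an SVR under several candidate kernels via cross-validation (a stand-in for evaluating a KernelBucket with the cv_risk method) and refit with the winner. The names below are illustrative, not the kernelmethods API:

```python
# Mimic "evaluate a bucket of kernels, keep the best" with sklearn only.
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

X, y = make_regression(n_samples=80, n_features=4, noise=0.1, random_state=1)

candidate_kernels = ['linear', 'rbf', 'poly']   # a tiny "bucket" of kernels
mean_scores = {k: cross_val_score(SVR(kernel=k), X, y, cv=3).mean()
               for k in candidate_kernels}
best_kernel = max(mean_scores, key=mean_scores.get)

opt_svr = SVR(kernel=best_kernel).fit(X, y)     # refit with the winning kernel
```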
See below for their API. Stay tuned for more tutorials, examples and comprehensive docs.
Kernel Machine (API)¶
class kernelmethods.KernelMachine(k_func, learner_id='SVR')[source]¶
Bases: sklearn.base.BaseEstimator
Generic class to return a drop-in sklearn estimator.
- Parameters
k_func (KernelFunction) – The kernel function the kernel machine bases itself on.
learner_id (str) – Identifier for the estimator to be built based on the kernel function. Options: SVM and SVR. Default: SVR.
fit(X, y, sample_weight=None)[source]¶
Fit the chosen Estimator based on the user-defined kernel.
- Parameters
X ({array-like, sparse matrix}, shape (n_samples, n_features)) – Training vectors, where n_samples is the number of samples and n_features is the number of features. For kernel=”precomputed”, the expected shape of X is (n_samples, n_samples).
y (array-like, shape (n_samples,)) – Target values (class labels in classification, real numbers in regression)
sample_weight (array-like, shape (n_samples,)) – Per-sample weights. Rescale C per sample. Higher weights force the classifier to put more emphasis on these points.
- Returns
self
- Return type
object
Notes
If X and y are not C-ordered and contiguous arrays of np.float64 and X is not a scipy.sparse.csr_matrix, X and/or y may be copied.
If X is a dense array, then the other methods will not support sparse matrices as input.
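The copy noted above can be avoided by handing fit() C-ordered, contiguous float64 arrays up front; numpy can check and convert explicitly:

```python
# Check array layout and convert to a C-contiguous float64 array,
# the layout LIBSVM-backed estimators want, avoiding a hidden copy in fit().
import numpy as np

X = np.asfortranarray(np.random.rand(10, 3))     # Fortran-ordered on purpose
print(X.flags['C_CONTIGUOUS'])                   # False

X_c = np.ascontiguousarray(X, dtype=np.float64)  # C-ordered float64 copy
print(X_c.flags['C_CONTIGUOUS'])                 # True
```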
predict(X)[source]¶
Make predictions on the new samplets in X.
For a one-class model, +1 or -1 is returned.
- Parameters
X ({array-like, sparse matrix}, shape (n_samples, n_features)) – For kernel=”precomputed”, the expected shape of X is [n_samples_test, n_samples_train]
- Returns
y_pred – Class labels for samples in X.
- Return type
array, shape (n_samples,)
OptimalKernelSVR (API)¶
class kernelmethods.OptimalKernelSVR(k_bucket='exhaustive', method='cv_risk')[source]¶
Bases: sklearn.svm.classes.SVR, sklearn.base.RegressorMixin
An estimator to learn the optimal kernel for a given sample and build a support vector regressor based on this custom kernel.
This class wraps the sklearn SVR estimator to function as its drop-in replacement; the SVR implementation is in turn based on LIBSVM.
- Parameters
k_bucket (KernelBucket or str) – An instance of KernelBucket that contains all the kernels to be compared, or a string identifying the sampling_strategy which populates a KernelBucket.
support_¶
Indices of support vectors.
- Type
array-like, shape = [n_SV]
support_vectors_¶
Support vectors.
- Type
array-like, shape = [n_SV, n_features]
dual_coef_¶
Coefficients of the support vectors in the decision function.
- Type
array, shape = [1, n_SV]
coef_¶
Weights assigned to the features (coefficients in the primal problem). This is only available in the case of a linear kernel.
coef_ is a read-only property derived from dual_coef_ and support_vectors_.
- Type
array, shape = [1, n_features]
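For a linear kernel, the primal weights coef_ are exactly the product dual_coef_ @ support_vectors_, which is how sklearn derives this read-only property; this can be verified directly:

```python
# Recover the primal weight vector from the dual representation of a
# linear-kernel SVR and compare it with the coef_ property.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.svm import SVR

X, y = make_regression(n_samples=60, n_features=3, noise=0.05, random_state=2)
svr = SVR(kernel='linear').fit(X, y)

w = svr.dual_coef_ @ svr.support_vectors_   # shape (1, n_features)
print(np.allclose(w, svr.coef_))            # True
```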
intercept_¶
Constants in decision function.
- Type
array, shape = [1]
property coef_
fit(X, y, sample_weight=None)[source]¶
Estimate the optimal kernel, and fit an SVR based on the custom kernel.
- Parameters
X ({array-like, sparse matrix}, shape (n_samples, n_features)) – Training vectors, where n_samples is the number of samples and n_features is the number of features. For kernel=”precomputed”, the expected shape of X is (n_samples, n_samples).
y (array-like, shape (n_samples,)) – Target values (class labels in classification, real numbers in regression)
sample_weight (array-like, shape (n_samples,)) – Per-sample weights. Rescale C per sample. Higher weights force the classifier to put more emphasis on these points.
- Returns
self
- Return type
object
Notes
If X and y are not C-ordered and contiguous arrays of np.float64 and X is not a scipy.sparse.csr_matrix, X and/or y may be copied.
If X is a dense array, then the other methods will not support sparse matrices as input.
predict(X)[source]¶
Make predictions on the samples in X.
For a one-class model, +1 or -1 is returned.
- Parameters
X ({array-like, sparse matrix}, shape (n_samples, n_features)) – For kernel=”precomputed”, the expected shape of X is [n_samples_test, n_samples_train]
- Returns
y_pred – Predicted values for samples in X.
- Return type
array, shape (n_samples,)
score(X, y, sample_weight=None)¶
Returns the coefficient of determination R^2 of the prediction.
The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a R^2 score of 0.0.
- Parameters
X (array-like, shape = (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix instead, shape = (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
y (array-like, shape = (n_samples) or (n_samples, n_outputs)) – True values for X.
sample_weight (array-like, shape = [n_samples], optional) – Sample weights.
- Returns
score – R^2 of self.predict(X) wrt. y.
- Return type
float
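The R^2 definition above can be computed directly from its formula, R^2 = 1 - u/v, with the illustrative values below:

```python
# Compute R^2 = 1 - u/v by hand: u is the residual sum of squares,
# v is the total sum of squares around the mean of y_true.
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 8.0])

u = ((y_true - y_pred) ** 2).sum()          # residual sum of squares = 1.5
v = ((y_true - y_true.mean()) ** 2).sum()   # total sum of squares = 29.1875
r2 = 1 - u / v
print(round(r2, 3))                          # 0.949
```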
Notes
The R2 score used when calling score on a regressor will use multioutput='uniform_average' from version 0.23 to keep consistent with metrics.r2_score. This will influence the score method of all the multioutput regressors (except for multioutput.MultiOutputRegressor). To specify the default value manually and avoid the warning, please either call metrics.r2_score directly or make a custom scorer with metrics.make_scorer (the built-in scorer 'r2' uses multioutput='uniform_average').