hessband package

Hessband: Analytic-Hessian bandwidth selection for univariate kernel smoothers.

This package provides tools for selecting bandwidths for Nadaraya–Watson regression using analytic derivatives of the leave-one-out cross-validation risk. The main entry point is select_nw_bandwidth, which returns a bandwidth chosen by one of several optimisation strategies, including the analytic-Hessian method.

Example

>>> import numpy as np
>>> from hessband import select_nw_bandwidth, nw_predict
>>> # Generate synthetic data
>>> X = np.linspace(0, 1, 200)
>>> y = np.sin(2 * np.pi * X) + 0.1 * np.random.randn(200)
>>> # Select bandwidth via analytic-Hessian method
>>> h_opt = select_nw_bandwidth(X, y, method='analytic')
>>> # Predict at the observed points (any array of query points works)
>>> y_pred = nw_predict(X, y, X, h_opt)
hessband.select_nw_bandwidth(X, y, kernel='gaussian', method='analytic', folds=5, h_bounds=(0.01, 1.0), grid_size=30, init_bandwidth=None)[source]

Select the optimal bandwidth for Nadaraya–Watson regression.

Parameters:
  • X (array-like, shape (n_samples,)) – Input values.

  • y (array-like, shape (n_samples,)) – Target values.

  • kernel (str, optional (default='gaussian')) – Kernel type (‘gaussian’ or ‘epanechnikov’).

  • method (str, optional (default='analytic')) – Bandwidth selection method: one of {‘analytic’, ‘grid’, ‘plugin’, ‘newton_fd’, ‘golden’, ‘bayes’}.

  • folds (int, optional (default=5)) – Number of folds for cross-validation.

  • h_bounds (tuple, optional (default=(0.01, 1.0))) – Lower and upper bounds for the bandwidth search.

  • grid_size (int, optional (default=30)) – Number of grid points for grid search.

  • init_bandwidth (float, optional) – Initial bandwidth for Newton-based methods. If None, uses plug-in rule.

Returns:

Selected bandwidth.

Return type:

float
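A brief usage sketch of the non-default strategies (argument values are illustrative; see the individual selector functions below for what each strategy does):

>>> # Grid search over 50 candidates in a custom range
>>> h_grid_cv = select_nw_bandwidth(X, y, method='grid', h_bounds=(0.005, 0.5), grid_size=50)
>>> # Silverman plug-in rule; the plug-in estimate depends on X only
>>> h_plugin = select_nw_bandwidth(X, y, method='plugin')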

hessband.nw_predict(X_train, y_train, X_test, h, kernel='gaussian')[source]

Compute Nadaraya–Watson predictions using a specified kernel.
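For orientation, the Nadaraya–Watson estimate at a test point is the kernel-weighted average of the training responses. A minimal NumPy sketch with a Gaussian kernel (an illustrative helper, not the package implementation, which also supports the Epanechnikov kernel):

>>> import numpy as np
>>> def nw_sketch(X_train, y_train, X_test, h):
...     # scaled pairwise distances between test and training points
...     u = (np.asarray(X_test)[:, None] - np.asarray(X_train)[None, :]) / h
...     w = np.exp(-0.5 * u ** 2)                            # Gaussian weights
...     return (w @ np.asarray(y_train)) / w.sum(axis=1)     # weighted mean per test point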

hessband.grid_search_cv(X, y, kernel, predict_fn, h_grid, folds=5)[source]

Grid search for the best bandwidth using cross-validation.
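A usage sketch, assuming the function returns the candidate bandwidth with the smallest cross-validated error:

>>> import numpy as np
>>> from hessband import grid_search_cv, nw_predict
>>> h_candidates = np.linspace(0.01, 1.0, 30)
>>> h_best = grid_search_cv(X, y, 'gaussian', nw_predict, h_candidates, folds=5)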

hessband.plug_in_bandwidth(X)[source]

Plug-in bandwidth based on Silverman’s rule of thumb.
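Silverman's rule scales a robust spread estimate by n**(-1/5). A sketch of the common form of the rule (the exact constant and spread estimate used by the package may differ):

>>> import numpy as np
>>> def silverman_sketch(X):
...     X = np.asarray(X)
...     n = X.size
...     sigma = X.std(ddof=1)
...     iqr = np.subtract(*np.percentile(X, [75, 25]))
...     # 0.9 * min(sample std, IQR / 1.34) * n^(-1/5)
...     return 0.9 * min(sigma, iqr / 1.34) * n ** (-0.2)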

hessband.newton_fd(X, y, kernel, predict_fn, h_init, h_min=0.001, folds=5, tol=0.001, max_iter=10, eps=0.0001)[source]

Finite-difference Newton method for bandwidth selection.
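Conceptually, each iteration approximates the first and second derivatives of the CV risk by central differences and takes a guarded Newton step. A single-iteration sketch, assuming a callable cv_risk(h), e.g. lambda h: CVScorer(X, y).score(nw_predict, h):

>>> def newton_fd_step(cv_risk, h, eps=1e-4, h_min=1e-3):
...     g = (cv_risk(h + eps) - cv_risk(h - eps)) / (2 * eps)                    # first derivative
...     H = (cv_risk(h + eps) - 2 * cv_risk(h) + cv_risk(h - eps)) / eps ** 2    # second derivative
...     # Newton update, clipped at the lower bound; skip if the curvature is not positive
...     return max(h_min, h - g / H) if H > 0 else h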

hessband.analytic_newton(X, y, kernel, predict_fn, h_init, h_min=0.001, folds=5, tol=0.001, max_iter=10)[source]

Analytic Newton method for minimising the LOOCV risk. The gradient and Hessian of the risk are computed in closed form, so the bandwidth is found without repeated CV evaluations inside the iteration loop.

hessband.golden_section(X, y, kernel, predict_fn, a, b, folds=5, tol=0.001, max_iter=20)[source]

Golden-section search for bandwidth selection.
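Golden-section search shrinks the bracket [a, b] by the inverse golden ratio each iteration and needs only objective evaluations, no derivatives. A compact sketch over a generic objective f (the package version evaluates the cross-validated risk; a tuned implementation would reuse one evaluation per iteration):

>>> import math
>>> def golden_sketch(f, a, b, tol=1e-3, max_iter=20):
...     invphi = (math.sqrt(5) - 1) / 2                  # ~0.618
...     c, d = b - invphi * (b - a), a + invphi * (b - a)
...     for _ in range(max_iter):
...         if b - a < tol:
...             break
...         if f(c) < f(d):
...             b, d = d, c                              # minimum lies in [a, d]
...             c = b - invphi * (b - a)
...         else:
...             a, c = c, d                              # minimum lies in [c, b]
...             d = a + invphi * (b - a)
...     return (a + b) / 2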

hessband.bayes_opt_bandwidth(X, y, kernel, predict_fn, a, b, folds=5, init_points=5, n_iter=10)[source]

Bayesian optimisation for bandwidth selection.

hessband.select_kde_bandwidth(x, kernel='gauss', method='analytic', h_bounds=(0.01, 1.0), grid_size=30, h_init=None)[source]

Select an optimal bandwidth for univariate KDE using LSCV.

Parameters:
  • x (array-like) – Data samples.

  • kernel (str, optional) – Kernel name: ‘gauss’ or ‘epan’.

  • method (str, optional) – Selection method: ‘analytic’ (Newton–Armijo), ‘grid’, or ‘golden’.

  • h_bounds (tuple, optional) – Lower and upper bounds for the search.

  • grid_size (int, optional) – Number of grid points for grid search.

  • h_init (float, optional) – Initial bandwidth for Newton optimisation. Defaults to plug-in estimate.

Returns:

Selected bandwidth.

Return type:

float
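Usage sketch (synthetic data for illustration):

>>> import numpy as np
>>> from hessband import select_kde_bandwidth
>>> x = np.random.randn(500)
>>> h_kde = select_kde_bandwidth(x, kernel='gauss', method='analytic')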

hessband.lscv_generic(x, h, kernel)[source]

Return (LSCV, gradient, Hessian) at bandwidth h for the chosen kernel.

Return type:

Tuple[float, float, float]
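Because the gradient and Hessian are returned alongside the score, a guarded Newton step on the LSCV objective is straightforward. A sketch using the documented return tuple (x is a 1-D sample, 0.2 an arbitrary starting bandwidth; 'gauss' follows the kernel names used by select_kde_bandwidth):

>>> from hessband import lscv_generic
>>> score, grad, hess = lscv_generic(x, 0.2, 'gauss')
>>> # Newton step if the curvature is positive, otherwise a small gradient step
>>> h_next = 0.2 - grad / hess if hess > 0 else 0.2 - 0.1 * grad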

Submodules

hessband.cv module

Cross-validation utilities for kernel regression and density estimation.

This module defines a CVScorer class that can be used to evaluate leave-one-out cross-validation (LOOCV) or K-fold cross-validation for kernel regression or density estimation.

class hessband.cv.CVScorer(X, y, folds=5, kernel='gaussian')[source]

Bases: object

Cross-validation scorer for kernel regression.

Parameters:
  • X (array-like, shape (n_samples,)) – Input values.

  • y (array-like, shape (n_samples,)) – Target values.

  • folds (int, optional (default=5)) – Number of folds for K-fold cross-validation.

  • kernel (str, optional (default='gaussian')) – Kernel type (‘gaussian’ or ‘epanechnikov’).

score(predict_fn, h)[source]

Compute the cross-validation MSE for a given bandwidth.

Parameters:
  • predict_fn (callable) – Function that takes (X_train, y_train, X_test, h, kernel) and returns predictions.

  • h (float) – Bandwidth value.

Returns:

Cross-validation mean squared error.

Return type:

float
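Usage sketch combining CVScorer with nw_predict (X and y as in the package-level example):

>>> from hessband.cv import CVScorer
>>> from hessband import nw_predict
>>> scorer = CVScorer(X, y, folds=5, kernel='gaussian')
>>> mse_small = scorer.score(nw_predict, 0.05)
>>> mse_large = scorer.score(nw_predict, 0.5)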

hessband.kde module

Kernel density estimation (KDE) bandwidth selectors with analytic gradients.

This module implements leave-one-out least-squares cross-validation (LSCV) for univariate KDE with Gaussian and Epanechnikov kernels. It provides analytic expressions for the cross-validation score, gradient and Hessian with respect to the bandwidth. A Newton–Armijo optimiser is included to select the optimal bandwidth without numerical differencing.

The analytic formulas are based on convolution of kernels and their derivatives; see the accompanying paper for details.
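For the Gaussian kernel the LSCV score has a closed form built from the kernel and its self-convolution: LSCV(h) = (1/(n^2 h)) sum_{i,j} (K*K)((x_i - x_j)/h) - (2/(n(n-1) h)) sum_{i != j} K((x_i - x_j)/h). A naive O(n^2) sketch of the score alone, for orientation only (the package's lscv_gauss is the implementation of record, and the analytic machinery additionally provides the gradient and Hessian used by the Newton–Armijo optimiser):

>>> import numpy as np
>>> def lscv_gauss_naive(x, h):
...     x = np.asarray(x)
...     n = x.size
...     d = (x[:, None] - x[None, :]) / h                        # scaled pairwise distances
...     Kbar = np.exp(-0.25 * d ** 2) / (2 * np.sqrt(np.pi))     # K*K: the N(0, 2) density
...     K = np.exp(-0.5 * d ** 2) / np.sqrt(2 * np.pi)           # standard normal kernel
...     int_f2 = Kbar.sum() / (n ** 2 * h)                       # integral of the squared estimate
...     loo = (K.sum() - np.trace(K)) / (n * (n - 1) * h)        # leave-one-out term (i != j)
...     return int_f2 - 2 * loo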

hessband.kde.select_kde_bandwidth(x, kernel='gauss', method='analytic', h_bounds=(0.01, 1.0), grid_size=30, h_init=None)[source]

Select an optimal bandwidth for univariate KDE using LSCV.

Parameters:
  • x (array-like) – Data samples.

  • kernel (str, optional) – Kernel name: ‘gauss’ or ‘epan’.

  • method (str, optional) – Selection method: ‘analytic’ (Newton–Armijo), ‘grid’, or ‘golden’.

  • h_bounds (tuple, optional) – Lower and upper bounds for the search.

  • grid_size (int, optional) – Number of grid points for grid search.

  • h_init (float, optional) – Initial bandwidth for Newton optimisation. Defaults to plug-in estimate.

Returns:

Selected bandwidth.

Return type:

float

hessband.kde.lscv_generic(x, h, kernel)[source]

Return (LSCV, gradient, Hessian) at bandwidth h for the chosen kernel.

Return type:

Tuple[float, float, float]

hessband.kde.lscv_gauss(x, h)[source]

LSCV for Gaussian kernel.

hessband.kde.lscv_epan(x, h)[source]

LSCV for Epanechnikov kernel.

hessband.kernels module

Kernel functions and derivatives for univariate smoothing.

This module provides routines to compute kernel weights and their first and second derivatives with respect to the bandwidth for Gaussian and Epanechnikov kernels.

These functions are used internally by the analytic-Hessian bandwidth selector.

hessband.kernels.weights_gaussian(u, h)[source]

Return Gaussian weights for scaled distances u and bandwidth h.

hessband.kernels.weights_epanechnikov(u, h)[source]

Return Epanechnikov weights for scaled distances u and bandwidth h.

hessband.kernels.kernel_weights(u, h, kernel='gaussian')[source]

Dispatch to the appropriate kernel weight function.

hessband.kernels.kernel_derivatives(u, h, kernel)[source]

Compute the first and second derivatives of kernel weights with respect to the bandwidth.

Returns a tuple (w, d_w, dd_w), where w are the weights, d_w the first derivative and dd_w the second derivative with respect to the bandwidth. The derivatives are computed analytically for the Gaussian and Epanechnikov kernels.
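Usage sketch (u is a placeholder array of scaled distances):

>>> import numpy as np
>>> from hessband.kernels import kernel_derivatives
>>> u = np.linspace(-2.0, 2.0, 5)
>>> w, d_w, dd_w = kernel_derivatives(u, 0.3, 'gaussian')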

hessband.selectors module

Bandwidth selection methods for univariate kernel regression.

This module provides several bandwidth selectors, including grid search, plug-in rules, finite-difference Newton, analytic Newton (analytic-Hessian), golden-section search and Bayesian optimisation. A high-level function select_nw_bandwidth orchestrates the selection process.

hessband.selectors.select_nw_bandwidth(X, y, kernel='gaussian', method='analytic', folds=5, h_bounds=(0.01, 1.0), grid_size=30, init_bandwidth=None)[source]

Select the optimal bandwidth for Nadaraya–Watson regression.

Parameters:
  • X (array-like, shape (n_samples,)) – Input values.

  • y (array-like, shape (n_samples,)) – Target values.

  • kernel (str, optional (default='gaussian')) – Kernel type (‘gaussian’ or ‘epanechnikov’).

  • method (str, optional (default='analytic')) – Bandwidth selection method: one of {‘analytic’, ‘grid’, ‘plugin’, ‘newton_fd’, ‘golden’, ‘bayes’}.

  • folds (int, optional (default=5)) – Number of folds for cross-validation.

  • h_bounds (tuple, optional (default=(0.01, 1.0))) – Lower and upper bounds for the bandwidth search.

  • grid_size (int, optional (default=30)) – Number of grid points for grid search.

  • init_bandwidth (float, optional) – Initial bandwidth for Newton-based methods. If None, uses plug-in rule.

Returns:

Selected bandwidth.

Return type:

float

hessband.selectors.nw_predict(X_train, y_train, X_test, h, kernel='gaussian')[source]

Compute Nadaraya–Watson predictions using a specified kernel.

hessband.selectors.grid_search_cv(X, y, kernel, predict_fn, h_grid, folds=5)[source]

Grid search for the best bandwidth using cross-validation.

hessband.selectors.plug_in_bandwidth(X)[source]

Plug-in bandwidth based on Silverman’s rule of thumb.

hessband.selectors.newton_fd(X, y, kernel, predict_fn, h_init, h_min=0.001, folds=5, tol=0.001, max_iter=10, eps=0.0001)[source]

Finite-difference Newton method for bandwidth selection.

hessband.selectors.analytic_newton(X, y, kernel, predict_fn, h_init, h_min=0.001, folds=5, tol=0.001, max_iter=10)[source]

Analytic Newton method for minimising the LOOCV risk. The gradient and Hessian of the risk are computed in closed form, so the bandwidth is found without repeated CV evaluations inside the iteration loop.

hessband.selectors.golden_section(X, y, kernel, predict_fn, a, b, folds=5, tol=0.001, max_iter=20)[source]

Golden-section search for bandwidth selection.

hessband.selectors.bayes_opt_bandwidth(X, y, kernel, predict_fn, a, b, folds=5, init_points=5, n_iter=10)[source]

Bayesian optimisation for bandwidth selection.