hessband documentation

Hessband: Analytic-Hessian bandwidth selection for univariate kernel smoothers.

This package provides tools for selecting bandwidths for Nadaraya–Watson regression and kernel density estimation (KDE) using analytic derivatives of cross-validation risk functions. It supports both leave-one-out cross-validation (LOOCV) for regression and least-squares cross-validation (LSCV) for density estimation.

Key Features

  • Analytic gradients and Hessians for efficient optimization

  • Multiple bandwidth selection methods (Newton, grid search, golden section, Bayesian)

  • Support for Gaussian and Epanechnikov kernels

  • Fast implementations with minimal cross-validation evaluations

Main Functions

select_nw_bandwidth : Select optimal bandwidth for Nadaraya–Watson regression

select_kde_bandwidth : Select optimal bandwidth for kernel density estimation

nw_predict : Make predictions using the Nadaraya–Watson estimator

lscv_generic : Compute LSCV score with analytic derivatives

Example

>>> import numpy as np
>>> from hessband import select_nw_bandwidth, nw_predict
>>> # Generate synthetic data
>>> X = np.linspace(0, 1, 200)
>>> y = np.sin(2 * np.pi * X) + 0.1 * np.random.randn(200)
>>> # Select bandwidth via analytic-Hessian method
>>> h_opt = select_nw_bandwidth(X, y, method='analytic')
>>> # Predict at new points
>>> y_pred = nw_predict(X, y, X, h_opt)

For a KDE example:

>>> from hessband import select_kde_bandwidth
>>> x = np.random.normal(0, 1, 1000)
>>> h_kde = select_kde_bandwidth(x, kernel='gauss', method='analytic')

hessband.analytic_newton(X, y, kernel, predict_fn, h_init, h_min=0.001, folds=5, tol=0.001, max_iter=10)[source]

Analytic Newton method for LOOCV risk minimisation. Returns the selected bandwidth without performing cross-validation evaluations inside the optimisation loop.
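
A sketch of calling analytic_newton directly. It assumes the kernel name and predict_fn follow the same conventions as nw_predict and select_nw_bandwidth; only the signature above is documented, so treat those conventions as assumptions:

>>> import numpy as np
>>> from hessband import analytic_newton, nw_predict, plug_in_bandwidth
>>> X = np.linspace(0, 1, 200)
>>> y = np.sin(2 * np.pi * X) + 0.1 * np.random.randn(200)
>>> # Assumed: kernel accepts the same names as select_nw_bandwidth and
>>> # predict_fn has the nw_predict call signature.
>>> h0 = plug_in_bandwidth(X)          # rule-of-thumb starting point
>>> h_opt = analytic_newton(X, y, kernel='gaussian', predict_fn=nw_predict,
...                         h_init=h0, folds=5)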

hessband.bayes_opt_bandwidth(X, y, kernel, predict_fn, a, b, folds=5, init_points=5, n_iter=10)[source]

Bayesian optimisation for bandwidth selection.

hessband.golden_section(X, y, kernel, predict_fn, a, b, folds=5, tol=0.001, max_iter=20)[source]

Golden-section search for bandwidth selection.

hessband.grid_search_cv(X, y, kernel, predict_fn, h_grid, folds=5)[source]

Grid search for the best bandwidth using cross-validation.
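
A sketch of the two search-based helpers, golden_section and grid_search_cv, assuming both return the selected bandwidth and share the kernel / predict_fn conventions of nw_predict (an assumption, since only the signatures are documented):

>>> import numpy as np
>>> from hessband import grid_search_cv, golden_section, nw_predict
>>> X = np.linspace(0, 1, 200)
>>> y = np.sin(2 * np.pi * X) + 0.1 * np.random.randn(200)
>>> # Log-spaced candidate bandwidths over the usual (0.01, 1.0) range.
>>> h_grid = np.logspace(np.log10(0.01), np.log10(1.0), 30)
>>> h_from_grid = grid_search_cv(X, y, 'gaussian', nw_predict, h_grid, folds=5)
>>> h_from_golden = golden_section(X, y, 'gaussian', nw_predict, a=0.01, b=1.0)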

hessband.lscv_generic(x: ndarray, h: float, kernel: str)[source]

Least-squares cross-validation score for a univariate KDE, together with its analytic gradient and Hessian with respect to h.

LSCV(h) = 1/(n² h) Σ_{i,j} K2(u_ij) − 2/(n(n−1) h) Σ_{i≠j} K(u_ij),

where u_ij = (x_i − x_j)/h and K2 is the kernel convolution K * K.

Returns:

  • score (float)

  • grad (float)

  • hess (float)
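
Because the gradient and Hessian are returned alongside the score, a hand-rolled Newton iteration on h takes only a few lines. The loop below is a sketch of that idea, not the optimiser used inside select_kde_bandwidth:

>>> import numpy as np
>>> from hessband import lscv_generic
>>> x = np.random.normal(0, 1, 500)
>>> h = 0.3                                   # initial bandwidth guess
>>> for _ in range(5):
...     score, grad, hess = lscv_generic(x, h, kernel='gauss')
...     if hess > 0:
...         h -= grad / hess                  # Newton step on the LSCV surface
...     else:
...         h -= 0.1 * h * np.sign(grad)      # crude safeguarded step
...     h = max(h, 1e-3)                      # keep the bandwidth positive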

hessband.newton_fd(X, y, kernel, predict_fn, h_init, h_min=0.001, folds=5, tol=0.001, max_iter=10, eps=0.0001)[source]

Finite-difference Newton method for bandwidth selection.
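
As a quick sanity check, its result can be compared with analytic_newton on the same data. The sketch below relies on the same kernel / predict_fn assumptions as the earlier examples:

>>> import numpy as np
>>> from hessband import newton_fd, analytic_newton, nw_predict
>>> rng = np.random.default_rng(0)
>>> X = np.sort(rng.uniform(0, 1, 150))
>>> y = np.cos(3 * X) + 0.1 * rng.normal(size=150)
>>> h_fd = newton_fd(X, y, 'gaussian', nw_predict, h_init=0.1, eps=1e-4)
>>> h_an = analytic_newton(X, y, 'gaussian', nw_predict, h_init=0.1)
>>> # The two should land close together; the analytic version avoids the
>>> # extra CV evaluations needed to form finite-difference derivatives.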

hessband.nw_predict(X_train, y_train, X_test, h, kernel='gaussian')[source]

Compute Nadaraya–Watson predictions using a specified kernel.
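
For reference, the quantity nw_predict computes with a Gaussian kernel can be written in a few lines of NumPy. This is an illustrative re-implementation, not the package's code:

>>> import numpy as np
>>> def nw_gaussian(X_train, y_train, X_test, h):
...     """Nadaraya–Watson estimate with a Gaussian kernel (illustrative)."""
...     # The Gaussian normalising constant cancels in the weighted average.
...     u = (np.asarray(X_test)[:, None] - np.asarray(X_train)[None, :]) / h
...     w = np.exp(-0.5 * u ** 2)             # kernel weights K(u)
...     return (w @ np.asarray(y_train, float)) / w.sum(axis=1)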

hessband.plug_in_bandwidth(X)[source]

Plug-in bandwidth based on Silverman’s rule of thumb.
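
Silverman's rule of thumb is commonly written as h = 0.9 · min(σ̂, IQR/1.34) · n^(−1/5). The exact variant used by plug_in_bandwidth is not spelled out here, so the snippet below shows one standard form purely for orientation:

>>> import numpy as np
>>> def silverman_rule(x):
...     """One common form of Silverman's rule of thumb (illustrative)."""
...     x = np.asarray(x, float)
...     sigma = x.std(ddof=1)
...     iqr = np.subtract(*np.percentile(x, [75, 25]))   # interquartile range
...     return 0.9 * min(sigma, iqr / 1.34) * x.size ** (-0.2)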

hessband.select_kde_bandwidth(x: ndarray, kernel: str = 'gauss', method: str = 'analytic', h_bounds=(0.01, 1.0), grid_size: int = 30, h_init: float | None = None) float[source]

Select an optimal bandwidth for univariate kernel density estimation using LSCV.

This function minimizes the least-squares cross-validation (LSCV) criterion to select an optimal bandwidth for kernel density estimation. The analytic method uses exact gradients and Hessians for efficient Newton optimization.

Parameters:
  • x (array-like, shape (n_samples,)) – Data samples for density estimation.

  • kernel ({'gauss', 'epan'}, default='gauss') –

    Kernel function:

    • 'gauss': Gaussian (normal) kernel

    • 'epan': Epanechnikov kernel (compact support)

  • method ({'analytic', 'grid', 'golden'}, default='analytic') –

    Bandwidth selection method:

    • 'analytic': Newton–Armijo with analytic derivatives (recommended)

    • 'grid': Exhaustive grid search over h_bounds

    • 'golden': Golden-section search optimization

  • h_bounds (tuple of float, default=(0.01, 1.0)) – (min_bandwidth, max_bandwidth) search bounds.

  • grid_size (int, default=30) – Number of grid points for 'grid' method.

  • h_init (float, optional) – Initial bandwidth for Newton-based methods. If None, uses Silverman’s rule of thumb as starting point.

Returns:

Optimal bandwidth that minimizes LSCV criterion.

Return type:

float

Examples

>>> import numpy as np
>>> from hessband import select_kde_bandwidth
>>> # Generate sample data from mixture distribution
>>> x = np.concatenate([
...     np.random.normal(-2, 0.5, 200),
...     np.random.normal(2, 1.0, 300)
... ])
>>> # Select bandwidth using analytic method
>>> h_opt = select_kde_bandwidth(x, kernel='gauss', method='analytic')
>>> print(f"Optimal bandwidth: {h_opt:.4f}")

Notes

The LSCV criterion is defined as:

LSCV(h) = ∫ f̂ₕ²(x) dx - 2∫ f̂ₕ(x) f(x) dx

where f̂ₕ is the kernel density estimate with bandwidth h and f is the true (unknown) density. The analytic method provides exact derivatives, making optimization very efficient compared to finite-difference approaches.
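
The selected bandwidth is an absolute kernel width (it scales u = (x_i − x_j)/h directly), so it can be passed to other KDE implementations. For example, scipy's gaussian_kde, which is not a hessband dependency, takes a factor relative to the sample standard deviation, hence the division below:

>>> import numpy as np
>>> from scipy.stats import gaussian_kde
>>> from hessband import select_kde_bandwidth
>>> x = np.concatenate([np.random.normal(-2, 0.5, 200),
...                     np.random.normal(2, 1.0, 300)])
>>> h_opt = select_kde_bandwidth(x, kernel='gauss', method='analytic')
>>> # gaussian_kde's scalar bw_method multiplies the sample std, so divide
>>> # by the std to obtain an absolute Gaussian kernel width of h_opt.
>>> kde = gaussian_kde(x, bw_method=h_opt / x.std(ddof=1))
>>> density = kde(np.linspace(-5, 5, 200))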

hessband.select_nw_bandwidth(X, y, kernel='gaussian', method='analytic', folds=5, h_bounds=(0.01, 1.0), grid_size=30, init_bandwidth=None)[source]

Select the optimal bandwidth for Nadaraya–Watson regression.

This function provides a unified interface for various bandwidth selection methods for Nadaraya-Watson kernel regression. The analytic method uses gradients and Hessians of the cross-validation risk for efficient optimization.

Parameters:
  • X (array-like, shape (n_samples,)) – Input values (univariate predictor variable).

  • y (array-like, shape (n_samples,)) – Target values (response variable).

  • kernel ({'gaussian', 'epanechnikov'}, default='gaussian') – Kernel function to use for regression.

  • method ({'analytic', 'grid', 'plugin', 'newton_fd', 'golden', 'bayes'}, default='analytic') –

    Bandwidth selection method:

    • 'analytic': Newton optimization with analytic gradients/Hessians (recommended)

    • 'grid': Exhaustive grid search over h_bounds

    • 'plugin': Simple plug-in rule (fastest but less accurate)

    • 'newton_fd': Newton optimization with finite-difference gradients

    • 'golden': Golden-section search optimization

    • 'bayes': Bayesian optimization (requires additional dependencies)

  • folds (int, default=5) – Number of folds for cross-validation (ignored for 'plugin' method).

  • h_bounds (tuple of float, default=(0.01, 1.0)) – (min_bandwidth, max_bandwidth) search bounds.

  • grid_size (int, default=30) – Number of grid points for 'grid' method.

  • init_bandwidth (float, optional) – Initial bandwidth for Newton-based methods. If None, uses plug-in rule.

Returns:

Optimal bandwidth that minimizes cross-validation risk.

Return type:

float

Examples

>>> import numpy as np
>>> from hessband import select_nw_bandwidth, nw_predict
>>> # Generate sample data
>>> X = np.linspace(0, 1, 100)
>>> y = np.sin(2 * np.pi * X) + 0.1 * np.random.randn(100)
>>> # Select bandwidth using analytic method
>>> h_opt = select_nw_bandwidth(X, y, method='analytic')
>>> # Make predictions
>>> y_pred = nw_predict(X, y, X, h_opt)

Notes

The 'analytic' method is generally recommended as it provides the accuracy of grid search while requiring minimal computational cost (no cross-validation evaluations during optimization).
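
One way to see this in practice is to compare the bandwidths the different methods return on the same data; the exact numbers depend on the random draw, so the following is only a sketch:

>>> import numpy as np
>>> from hessband import select_nw_bandwidth
>>> rng = np.random.default_rng(42)
>>> X = np.sort(rng.uniform(0, 1, 300))
>>> y = np.sin(2 * np.pi * X) + 0.15 * rng.normal(size=300)
>>> for m in ['analytic', 'grid', 'plugin']:
...     h = select_nw_bandwidth(X, y, method=m)
...     print(f"{m:>8s}: h = {h:.4f}")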
