hessband documentation

Hessband: Analytic-Hessian bandwidth selection for univariate kernel smoothers.

This package provides tools for selecting bandwidths for Nadaraya–Watson regression and kernel density estimation (KDE) using analytic derivatives of cross-validation risk functions. It supports both leave-one-out cross-validation (LOOCV) for regression and least-squares cross-validation (LSCV) for density estimation.

Key Features

  • Analytic gradients and Hessians for efficient optimization

  • Multiple bandwidth selection methods (Newton, grid search, golden section, Bayesian)

  • Support for Gaussian and Epanechnikov kernels

  • Fast implementations with minimal cross-validation evaluations

Main Functions

select_nw_bandwidth : Select optimal bandwidth for Nadaraya–Watson regression

select_kde_bandwidth : Select optimal bandwidth for kernel density estimation

nw_predict : Make predictions using the Nadaraya–Watson estimator

lscv_generic : Compute LSCV score with analytic derivatives

Example

>>> import numpy as np
>>> from hessband import select_nw_bandwidth, nw_predict
>>> # Generate synthetic data
>>> X = np.linspace(0, 1, 200)
>>> y = np.sin(2 * np.pi * X) + 0.1 * np.random.randn(200)
>>> # Select bandwidth via analytic-Hessian method
>>> h_opt = select_nw_bandwidth(X, y, method='analytic')
>>> # Predict at new points
>>> y_pred = nw_predict(X, y, X, h_opt)

For a KDE example:

>>> from hessband import select_kde_bandwidth
>>> x = np.random.normal(0, 1, 1000)
>>> h_kde = select_kde_bandwidth(x, kernel='gauss', method='analytic')

hessband.analytic_newton(X, y, kernel, predict_fn, h_init, h_min=0.001, folds=5, tol=0.001, max_iter=10)[source]

Analytic Newton method for LOOCV risk minimisation. Returns the selected bandwidth without performing cross-validation evaluations inside the optimisation loop.
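
A sketch of calling analytic_newton directly. It assumes the kernel name and predict_fn follow the same conventions as nw_predict and select_nw_bandwidth; only the signature above is documented, so treat those conventions as assumptions:

>>> import numpy as np
>>> from hessband import analytic_newton, nw_predict, plug_in_bandwidth
>>> X = np.linspace(0, 1, 200)
>>> y = np.sin(2 * np.pi * X) + 0.1 * np.random.randn(200)
>>> # Assumed: kernel accepts the same names as select_nw_bandwidth and
>>> # predict_fn has the nw_predict call signature.
>>> h0 = plug_in_bandwidth(X)          # rule-of-thumb starting point
>>> h_opt = analytic_newton(X, y, kernel='gaussian', predict_fn=nw_predict,
...                         h_init=h0, folds=5)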

hessband.bayes_opt_bandwidth(X, y, kernel, predict_fn, a, b, folds=5, init_points=5, n_iter=10)[source]

Bayesian optimisation for bandwidth selection.

hessband.golden_section(X, y, kernel, predict_fn, a, b, folds=5, tol=0.001, max_iter=20)[source]

Golden-section search for bandwidth selection.

hessband.grid_search_cv(X, y, kernel, predict_fn, h_grid, folds=5)[source]

Grid search for the best bandwidth using cross-validation.
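
A sketch of the two search-based helpers, golden_section and grid_search_cv, assuming both return the selected bandwidth and share the kernel / predict_fn conventions of nw_predict (an assumption, since only the signatures are documented):

>>> import numpy as np
>>> from hessband import grid_search_cv, golden_section, nw_predict
>>> X = np.linspace(0, 1, 200)
>>> y = np.sin(2 * np.pi * X) + 0.1 * np.random.randn(200)
>>> # Log-spaced candidate bandwidths over the usual (0.01, 1.0) range.
>>> h_grid = np.logspace(np.log10(0.01), np.log10(1.0), 30)
>>> h_from_grid = grid_search_cv(X, y, 'gaussian', nw_predict, h_grid, folds=5)
>>> h_from_golden = golden_section(X, y, 'gaussian', nw_predict, a=0.01, b=1.0)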

hessband.lscv_generic(x: ndarray, h: float, kernel: str)[source]

Least-squares cross-validation score for a univariate KDE, together with its analytic gradient and Hessian with respect to h.

LSCV(h) = 1/(n² h) Σ_{i,j} K2(u_ij) − 2/(n(n−1) h) Σ_{i≠j} K(u_ij),

where u_ij = (x_i − x_j)/h and K2 is the kernel convolution K * K.

Returns:

  • score (float)

  • grad (float)

  • hess (float)
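
Because the gradient and Hessian are returned alongside the score, a hand-rolled Newton iteration on h takes only a few lines. The loop below is a sketch of that idea, not the optimiser used inside select_kde_bandwidth:

>>> import numpy as np
>>> from hessband import lscv_generic
>>> x = np.random.normal(0, 1, 500)
>>> h = 0.3                                   # initial bandwidth guess
>>> for _ in range(5):
...     score, grad, hess = lscv_generic(x, h, kernel='gauss')
...     if hess > 0:
...         h -= grad / hess                  # Newton step on the LSCV surface
...     else:
...         h -= 0.1 * h * np.sign(grad)      # crude safeguarded step
...     h = max(h, 1e-3)                      # keep the bandwidth positive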

hessband.newton_fd(X, y, kernel, predict_fn, h_init, h_min=0.001, folds=5, tol=0.001, max_iter=10, eps=0.0001)[source]

Finite-difference Newton method for bandwidth selection.
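
As a quick sanity check, its result can be compared with analytic_newton on the same data. The sketch below relies on the same kernel / predict_fn assumptions as the earlier examples:

>>> import numpy as np
>>> from hessband import newton_fd, analytic_newton, nw_predict
>>> rng = np.random.default_rng(0)
>>> X = np.sort(rng.uniform(0, 1, 150))
>>> y = np.cos(3 * X) + 0.1 * rng.normal(size=150)
>>> h_fd = newton_fd(X, y, 'gaussian', nw_predict, h_init=0.1, eps=1e-4)
>>> h_an = analytic_newton(X, y, 'gaussian', nw_predict, h_init=0.1)
>>> # The two should land close together; the analytic version avoids the
>>> # extra CV evaluations needed to form finite-difference derivatives.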

hessband.nw_predict(X_train, y_train, X_test, h, kernel='gaussian')[source]

Compute Nadaraya–Watson predictions using a specified kernel.
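
For reference, the quantity nw_predict computes with a Gaussian kernel can be written in a few lines of NumPy. This is an illustrative re-implementation, not the package's code:

>>> import numpy as np
>>> def nw_gaussian(X_train, y_train, X_test, h):
...     """Nadaraya–Watson estimate with a Gaussian kernel (illustrative)."""
...     # The Gaussian normalising constant cancels in the weighted average.
...     u = (np.asarray(X_test)[:, None] - np.asarray(X_train)[None, :]) / h
...     w = np.exp(-0.5 * u ** 2)             # kernel weights K(u)
...     return (w @ np.asarray(y_train, float)) / w.sum(axis=1)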

hessband.plug_in_bandwidth(X)[source]

Plug-in bandwidth based on Silverman’s rule of thumb.
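
Silverman's rule of thumb is commonly written as h = 0.9 · min(σ̂, IQR/1.34) · n^(−1/5). The exact variant used by plug_in_bandwidth is not spelled out here, so the snippet below shows one standard form purely for orientation:

>>> import numpy as np
>>> def silverman_rule(x):
...     """One common form of Silverman's rule of thumb (illustrative)."""
...     x = np.asarray(x, float)
...     sigma = x.std(ddof=1)
...     iqr = np.subtract(*np.percentile(x, [75, 25]))   # interquartile range
...     return 0.9 * min(sigma, iqr / 1.34) * x.size ** (-0.2)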

hessband.select_kde_bandwidth(x: ndarray, kernel: str = 'gauss', method: str = 'analytic', h_bounds=(0.01, 1.0), grid_size: int = 30, h_init: float | None = None) float[source]

Select an optimal bandwidth for univariate kernel density estimation using LSCV.

This function minimizes the least-squares cross-validation (LSCV) criterion to select an optimal bandwidth for kernel density estimation. The analytic method uses exact gradients and Hessians for efficient Newton optimization.

Parameters:
  • x (array-like, shape (n_samples,)) – Data samples for density estimation.

  • kernel ({'gauss', 'epan'}, default='gauss') –

    Kernel function:

    • 'gauss': Gaussian (normal) kernel

    • 'epan': Epanechnikov kernel (compact support)

  • method ({'analytic', 'grid', 'golden'}, default='analytic') –

    Bandwidth selection method:

    • 'analytic': Newton–Armijo with analytic derivatives (recommended)

    • 'grid': Exhaustive grid search over h_bounds

    • 'golden': Golden-section search optimization

  • h_bounds (tuple of float, default=(0.01, 1.0)) – (min_bandwidth, max_bandwidth) search bounds.

  • grid_size (int, default=30) – Number of grid points for 'grid' method.

  • h_init (float, optional) – Initial bandwidth for Newton-based methods. If None, uses Silverman’s rule of thumb as starting point.

Returns:

Optimal bandwidth that minimizes LSCV criterion.

Return type:

float

Examples

>>> import numpy as np
>>> from hessband import select_kde_bandwidth
>>> # Generate sample data from mixture distribution
>>> x = np.concatenate([
...     np.random.normal(-2, 0.5, 200),
...     np.random.normal(2, 1.0, 300)
... ])
>>> # Select bandwidth using analytic method
>>> h_opt = select_kde_bandwidth(x, kernel='gauss', method='analytic')
>>> print(f"Optimal bandwidth: {h_opt:.4f}")

Notes

The LSCV criterion is defined as:

LSCV(h) = ∫ f̂ₕ²(x) dx - 2∫ f̂ₕ(x) f(x) dx

where f̂ₕ is the kernel density estimate with bandwidth h and f is the true (unknown) density. The analytic method provides exact derivatives, making optimization very efficient compared to finite-difference approaches.
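
The selected bandwidth is an absolute kernel width (it scales u = (x_i − x_j)/h directly), so it can be passed to other KDE implementations. For example, scipy's gaussian_kde, which is not a hessband dependency, takes a factor relative to the sample standard deviation, hence the division below:

>>> import numpy as np
>>> from scipy.stats import gaussian_kde
>>> from hessband import select_kde_bandwidth
>>> x = np.concatenate([np.random.normal(-2, 0.5, 200),
...                     np.random.normal(2, 1.0, 300)])
>>> h_opt = select_kde_bandwidth(x, kernel='gauss', method='analytic')
>>> # gaussian_kde's scalar bw_method multiplies the sample std, so divide
>>> # by the std to obtain an absolute Gaussian kernel width of h_opt.
>>> kde = gaussian_kde(x, bw_method=h_opt / x.std(ddof=1))
>>> density = kde(np.linspace(-5, 5, 200))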

hessband.select_nw_bandwidth(X, y, kernel='gaussian', method='analytic', folds=5, h_bounds=(0.01, 1.0), grid_size=30, init_bandwidth=None)[source]

Select the optimal bandwidth for Nadaraya–Watson regression.

This function provides a unified interface for various bandwidth selection methods for Nadaraya-Watson kernel regression. The analytic method uses gradients and Hessians of the cross-validation risk for efficient optimization.

Parameters:
  • X (array-like, shape (n_samples,)) – Input values (univariate predictor variable).

  • y (array-like, shape (n_samples,)) – Target values (response variable).

  • kernel ({'gaussian', 'epanechnikov'}, default='gaussian') – Kernel function to use for regression.

  • method ({'analytic', 'grid', 'plugin', 'newton_fd', 'golden', 'bayes'}, default='analytic') –

    Bandwidth selection method:

    • 'analytic': Newton optimization with analytic gradients/Hessians (recommended)

    • 'grid': Exhaustive grid search over h_bounds

    • 'plugin': Simple plug-in rule (fastest but less accurate)

    • 'newton_fd': Newton optimization with finite-difference gradients

    • 'golden': Golden-section search optimization

    • 'bayes': Bayesian optimization (requires additional dependencies)

  • folds (int, default=5) – Number of folds for cross-validation (ignored for 'plugin' method).

  • h_bounds (tuple of float, default=(0.01, 1.0)) – (min_bandwidth, max_bandwidth) search bounds.

  • grid_size (int, default=30) – Number of grid points for 'grid' method.

  • init_bandwidth (float, optional) – Initial bandwidth for Newton-based methods. If None, uses plug-in rule.

Returns:

Optimal bandwidth that minimizes cross-validation risk.

Return type:

float

Examples

>>> import numpy as np
>>> from hessband import select_nw_bandwidth, nw_predict
>>> # Generate sample data
>>> X = np.linspace(0, 1, 100)
>>> y = np.sin(2 * np.pi * X) + 0.1 * np.random.randn(100)
>>> # Select bandwidth using analytic method
>>> h_opt = select_nw_bandwidth(X, y, method='analytic')
>>> # Make predictions
>>> y_pred = nw_predict(X, y, X, h_opt)

Notes

The 'analytic' method is generally recommended as it provides the accuracy of grid search while requiring minimal computational cost (no cross-validation evaluations during optimization).
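
One way to see this in practice is to compare the bandwidths the different methods return on the same data; the exact numbers depend on the random draw, so the following is only a sketch:

>>> import numpy as np
>>> from hessband import select_nw_bandwidth
>>> rng = np.random.default_rng(42)
>>> X = np.sort(rng.uniform(0, 1, 300))
>>> y = np.sin(2 * np.pi * X) + 0.15 * rng.normal(size=300)
>>> for m in ['analytic', 'grid', 'plugin']:
...     h = select_nw_bandwidth(X, y, method=m)
...     print(f"{m:>8s}: h = {h:.4f}")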
