hbw
Fast kernel bandwidth selection via analytic Hessian Newton optimization.
hbw provides optimal bandwidth selection for:
- Kernel density estimation (KDE) via LSCV minimization
- Nadaraya-Watson regression via LOOCV-MSE minimization
The key innovation is using closed-form analytic gradients and Hessians of the cross-validation objectives, enabling Newton optimization that typically converges in 6-12 objective evaluations, versus 50-100 for grid search.
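The optimization scheme named above is standard damped Newton with an Armijo line search. A minimal sketch on a toy objective, working in log-bandwidth so the bandwidth stays positive (the function names and the toy objective here are illustrative, not hbw's internals):

```python
import math

def newton_armijo(f, t0, tol=1e-10, max_iter=50, c=1e-4):
    """Damped Newton with Armijo backtracking on a scalar objective.

    f(t) must return (value, gradient, hessian) at the log-bandwidth
    t = log(h); optimizing in t keeps the bandwidth positive.
    """
    t = t0
    val, grad, hess = f(t)
    for _ in range(max_iter):
        # Newton direction; fall back to steepest descent if curvature <= 0
        step = -grad / hess if hess > 0 else -grad
        a = 1.0
        while True:
            new_val, new_grad, new_hess = f(t + a * step)
            # Armijo sufficient-decrease test; halve the step until it holds
            if new_val <= val + c * a * grad * step or a < 1e-12:
                break
            a *= 0.5
        t, val, grad, hess = t + a * step, new_val, new_grad, new_hess
        if abs(grad) < tol:
            break
    return math.exp(t)

def toy(t):
    # Toy objective f(h) = h + 1/h in log-bandwidth; its minimum is at h = 1
    h, hinv = math.exp(t), math.exp(-t)
    return h + hinv, h - hinv, h + hinv

h_opt = newton_armijo(toy, t0=math.log(3.0))  # converges to h = 1 in a few steps
```

Because each evaluation also yields a gradient and Hessian, the line search rarely backtracks and only a handful of objective evaluations are needed, which is the source of the 6-12 vs 50-100 comparison.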
Installation
pip install hbw
Quick Start
import numpy as np
from hbw import kde_bandwidth, nw_bandwidth
# KDE bandwidth selection
x = np.random.randn(1000)
h = kde_bandwidth(x)
# Nadaraya-Watson bandwidth selection
x = np.linspace(-2, 2, 200)
y = np.sin(x) + 0.1 * np.random.randn(len(x))
h = nw_bandwidth(x, y)
API Reference
- hbw.kde_bandwidth(x, kernel='gauss', h0=None, max_n=5000, seed=None)
Select optimal KDE bandwidth via Newton-Armijo on LSCV.
Uses analytic gradients and Hessians for fast convergence (6-12 evaluations vs 50-100 for grid search).
- Parameters:
  - x (array-like) – Sample data (1D array-like).
  - kernel (str) – Kernel function: “gauss” (Gaussian) or “epan” (Epanechnikov).
  - h0 (float | None) – Initial bandwidth guess. If None, uses Silverman’s rule.
  - max_n (int | None) – Maximum sample size for optimization. If len(x) > max_n, a random subsample is used. Set to None to disable subsampling.
  - seed (int | None) – Random seed for reproducible subsampling.
- Returns:
Optimal bandwidth that minimizes the LSCV criterion.
- Return type:
float
Examples
>>> import numpy as np
>>> x = np.random.randn(1000)
>>> h = kde_bandwidth(x)
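When h0 is None, the starting point is Silverman's rule. A sketch of the textbook form of that rule (the exact constants hbw uses internally may differ):

```python
import numpy as np

def silverman_h0(x):
    # Textbook rule of thumb: h = 0.9 * min(sd, IQR / 1.34) * n**(-1/5)
    x = np.asarray(x, float)
    iqr = np.percentile(x, 75) - np.percentile(x, 25)
    scale = min(np.std(x, ddof=1), iqr / 1.34)
    return 0.9 * scale * len(x) ** (-0.2)
```

Using the robust min(sd, IQR/1.34) scale keeps the initial guess sensible for heavy-tailed or mildly contaminated samples.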
- hbw.loocv_mse(x, y, h, kernel='gauss')
Compute LOOCV MSE, gradient, and Hessian for NW bandwidth selection.
- Parameters:
  - x (ndarray) – Predictor values (1D array).
  - y (ndarray) – Response values (1D array).
  - h (float) – Bandwidth.
  - kernel (str) – Kernel name: “gauss” or “epan”.
- Returns:
(loss, gradient, hessian) of the LOOCV MSE objective.
- Return type:
tuple[float, float, float]
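The value component of this objective is easy to state directly. A value-only sketch for the Gaussian kernel (hbw's function additionally returns the analytic gradient and Hessian, which are omitted here):

```python
import numpy as np

def loocv_mse_gauss(x, y, h):
    # LOOCV MSE of the Nadaraya-Watson estimator, Gaussian kernel (value only)
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)  # kernel weights
    np.fill_diagonal(w, 0.0)  # leave-one-out: each point excluded from its own fit
    y_loo = w @ y / w.sum(axis=1)
    return float(np.mean((y - y_loo) ** 2))
```

Zeroing the diagonal makes every fitted value a leave-one-out prediction, so no explicit n-fold loop is needed.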
- hbw.lscv(x, h, kernel='gauss')
Compute LSCV score, gradient, and Hessian for KDE bandwidth selection.
- Parameters:
  - x (ndarray) – Sample data (1D array).
  - h (float) – Bandwidth.
  - kernel (str) – Kernel name: “gauss” or “epan”.
- Returns:
(score, gradient, hessian) of the LSCV objective.
- Return type:
tuple[float, float, float]
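For the Gaussian kernel the LSCV score has a closed form, because the kernel convolved with itself is again Gaussian. A value-only sketch (gradient and Hessian omitted; hbw's internals may differ in detail):

```python
import numpy as np

def lscv_gauss(x, h):
    # LSCV(h) = integral of fhat^2 - (2/n) * sum_i fhat_{-i}(x_i), value only
    x = np.asarray(x, float)
    n = len(x)
    d = (x[:, None] - x[None, :]) / h
    phi2 = np.exp(-0.25 * d**2) / (2.0 * np.sqrt(np.pi))  # K*K = N(0, 2) density
    phi = np.exp(-0.5 * d**2) / np.sqrt(2.0 * np.pi)      # standard normal kernel K
    int_fhat_sq = phi2.sum() / (n**2 * h)
    loo_sum = phi.sum() - n / np.sqrt(2.0 * np.pi)        # drop the diagonal phi(0) terms
    return float(int_fhat_sq - 2.0 * loo_sum / (n * (n - 1) * h))
```

As h shrinks toward zero the diagonal term of the squared-density integral blows up, which is what penalizes undersmoothing and gives the criterion an interior minimum.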
- hbw.nw_bandwidth(x, y, kernel='gauss', h0=None, max_n=5000, seed=None)
Select optimal Nadaraya-Watson bandwidth via Newton-Armijo on LOOCV-MSE.
Uses analytic gradients and Hessians for fast convergence (6-12 evaluations vs 50-100 for grid search).
- Parameters:
  - x (array-like) – Predictor values (1D array-like).
  - y (array-like) – Response values (1D array-like).
  - kernel (str) – Kernel function: “gauss” (Gaussian) or “epan” (Epanechnikov).
  - h0 (float | None) – Initial bandwidth guess. If None, uses Silverman’s rule on x.
  - max_n (int | None) – Maximum sample size for optimization. If len(x) > max_n, a random subsample is used. Set to None to disable subsampling.
  - seed (int | None) – Random seed for reproducible subsampling.
- Returns:
Optimal bandwidth that minimizes the LOOCV MSE criterion.
- Return type:
float
Examples
>>> import numpy as np
>>> x = np.linspace(-2, 2, 200)
>>> y = np.sin(x) + 0.1 * np.random.randn(len(x))
>>> h = nw_bandwidth(x, y)
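hbw selects the bandwidth only; the regression estimate itself is the standard Nadaraya-Watson kernel-weighted average. A minimal sketch of using a selected bandwidth for prediction (nw_predict is a hypothetical helper, not part of hbw's API):

```python
import numpy as np

def nw_predict(x_train, y_train, x_new, h):
    # Nadaraya-Watson estimate: kernel-weighted local average of y_train
    x_train = np.asarray(x_train, float)
    x_new = np.asarray(x_new, float)
    w = np.exp(-0.5 * ((x_new[:, None] - x_train[None, :]) / h) ** 2)
    return w @ np.asarray(y_train, float) / w.sum(axis=1)

x = np.linspace(-2, 2, 200)
y = np.sin(x)
pred = nw_predict(x, y, np.array([0.0, 1.0]), h=0.2)  # close to sin(0), sin(1)
```

With the noiseless sin example above and a moderate bandwidth, the interior predictions track the true curve up to the usual O(h^2) smoothing bias.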