hbw
Fast kernel bandwidth selection via analytic Hessian Newton optimization.
hbw provides optimal bandwidth selection for:
- Kernel density estimation (KDE) via LSCV minimization
- Nadaraya-Watson regression via LOOCV-MSE minimization
The key innovation is using closed-form analytic gradients and Hessians of the cross-validation objectives, enabling Newton optimization that typically converges in 6-12 objective evaluations, versus 50-100 for grid search.
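The optimization scheme named above is standard damped Newton with an Armijo line search. A minimal sketch on a toy objective, working in log-bandwidth so the bandwidth stays positive (the function names and the toy objective here are illustrative, not hbw's internals):

```python
import math

def newton_armijo(f, t0, tol=1e-10, max_iter=50, c=1e-4):
    """Damped Newton with Armijo backtracking on a scalar objective.

    f(t) must return (value, gradient, hessian) at the log-bandwidth
    t = log(h); optimizing in t keeps the bandwidth positive.
    """
    t = t0
    val, grad, hess = f(t)
    for _ in range(max_iter):
        # Newton direction; fall back to steepest descent if curvature <= 0
        step = -grad / hess if hess > 0 else -grad
        a = 1.0
        while True:
            new_val, new_grad, new_hess = f(t + a * step)
            # Armijo sufficient-decrease test; halve the step until it holds
            if new_val <= val + c * a * grad * step or a < 1e-12:
                break
            a *= 0.5
        t, val, grad, hess = t + a * step, new_val, new_grad, new_hess
        if abs(grad) < tol:
            break
    return math.exp(t)

def toy(t):
    # Toy objective f(h) = h + 1/h in log-bandwidth; its minimum is at h = 1
    h, hinv = math.exp(t), math.exp(-t)
    return h + hinv, h - hinv, h + hinv

h_opt = newton_armijo(toy, t0=math.log(3.0))  # converges to h = 1 in a few steps
```

Because each evaluation also yields a gradient and Hessian, the line search rarely backtracks and only a handful of objective evaluations are needed, which is the source of the 6-12 vs 50-100 comparison.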
Installation
pip install hbw
Quick Start
import numpy as np
from hbw import kde_bandwidth, nw_bandwidth
# KDE bandwidth selection
x = np.random.randn(1000)
h = kde_bandwidth(x)
# Nadaraya-Watson bandwidth selection
x = np.linspace(-2, 2, 200)
y = np.sin(x) + 0.1 * np.random.randn(len(x))
h = nw_bandwidth(x, y)
API Reference
- hbw.kde_bandwidth(x, kernel='gauss', h0=None, max_n=5000, seed=None)
Select optimal KDE bandwidth via Newton-Armijo on LSCV.
Uses analytic gradients and Hessians for fast convergence (6-12 evaluations vs 50-100 for grid search).
- Parameters:
  - x (array-like) – Sample data (1D array-like).
  - kernel (str) – Kernel function: “gauss” (Gaussian) or “epan” (Epanechnikov).
  - h0 (float | None) – Initial bandwidth guess. If None, uses Silverman’s rule.
  - max_n (int | None) – Maximum sample size for optimization. If len(x) > max_n, a random subsample is used. Set to None to disable subsampling.
  - seed (int | None) – Random seed for reproducible subsampling.
- Returns:
Optimal bandwidth that minimizes the LSCV criterion.
- Return type:
float
Examples
>>> import numpy as np
>>> x = np.random.randn(1000)
>>> h = kde_bandwidth(x)
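When h0 is None, the starting point is Silverman's rule. A sketch of the textbook form of that rule (the exact constants hbw uses internally may differ):

```python
import numpy as np

def silverman_h0(x):
    # Textbook rule of thumb: h = 0.9 * min(sd, IQR / 1.34) * n**(-1/5)
    x = np.asarray(x, float)
    iqr = np.percentile(x, 75) - np.percentile(x, 25)
    scale = min(np.std(x, ddof=1), iqr / 1.34)
    return 0.9 * scale * len(x) ** (-0.2)
```

Using the robust min(sd, IQR/1.34) scale keeps the initial guess sensible for heavy-tailed or mildly contaminated samples.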
- hbw.loocv_mse(x, y, h, kernel='gauss')
Compute LOOCV MSE, gradient, and Hessian for NW bandwidth selection.
- Parameters:
  - x (ndarray) – Predictor values (1D array).
  - y (ndarray) – Response values (1D array).
  - h (float) – Bandwidth.
  - kernel (str) – Kernel name: “gauss” or “epan”.
- Returns:
(loss, gradient, hessian) of the LOOCV MSE objective.
- Return type:
tuple[float, float, float]
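The value component of this objective is easy to state directly. A value-only sketch for the Gaussian kernel (hbw's function additionally returns the analytic gradient and Hessian, which are omitted here):

```python
import numpy as np

def loocv_mse_gauss(x, y, h):
    # LOOCV MSE of the Nadaraya-Watson estimator, Gaussian kernel (value only)
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)  # kernel weights
    np.fill_diagonal(w, 0.0)  # leave-one-out: each point excluded from its own fit
    y_loo = w @ y / w.sum(axis=1)
    return float(np.mean((y - y_loo) ** 2))
```

Zeroing the diagonal makes every fitted value a leave-one-out prediction, so no explicit n-fold loop is needed.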
- hbw.lscv(x, h, kernel='gauss')
Compute LSCV score, gradient, and Hessian for KDE bandwidth selection.
- Parameters:
  - x (ndarray) – Sample data (1D array).
  - h (float) – Bandwidth.
  - kernel (str) – Kernel name: “gauss” or “epan”.
- Returns:
(score, gradient, hessian) of the LSCV objective.
- Return type:
tuple[float, float, float]
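For the Gaussian kernel the LSCV score has a closed form, because the kernel convolved with itself is again Gaussian. A value-only sketch (gradient and Hessian omitted; hbw's internals may differ in detail):

```python
import numpy as np

def lscv_gauss(x, h):
    # LSCV(h) = integral of fhat^2 - (2/n) * sum_i fhat_{-i}(x_i), value only
    x = np.asarray(x, float)
    n = len(x)
    d = (x[:, None] - x[None, :]) / h
    phi2 = np.exp(-0.25 * d**2) / (2.0 * np.sqrt(np.pi))  # K*K = N(0, 2) density
    phi = np.exp(-0.5 * d**2) / np.sqrt(2.0 * np.pi)      # standard normal kernel K
    int_fhat_sq = phi2.sum() / (n**2 * h)
    loo_sum = phi.sum() - n / np.sqrt(2.0 * np.pi)        # drop the diagonal phi(0) terms
    return float(int_fhat_sq - 2.0 * loo_sum / (n * (n - 1) * h))
```

As h shrinks toward zero the diagonal term of the squared-density integral blows up, which is what penalizes undersmoothing and gives the criterion an interior minimum.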
- hbw.nw_bandwidth(x, y, kernel='gauss', h0=None, max_n=5000, seed=None)
Select optimal Nadaraya-Watson bandwidth via Newton-Armijo on LOOCV-MSE.
Uses analytic gradients and Hessians for fast convergence (6-12 evaluations vs 50-100 for grid search).
- Parameters:
  - x (array-like) – Predictor values (1D array-like).
  - y (array-like) – Response values (1D array-like).
  - kernel (str) – Kernel function: “gauss” (Gaussian) or “epan” (Epanechnikov).
  - h0 (float | None) – Initial bandwidth guess. If None, uses Silverman’s rule on x.
  - max_n (int | None) – Maximum sample size for optimization. If len(x) > max_n, a random subsample is used. Set to None to disable subsampling.
  - seed (int | None) – Random seed for reproducible subsampling.
- Returns:
Optimal bandwidth that minimizes the LOOCV MSE criterion.
- Return type:
float
Examples
>>> import numpy as np
>>> x = np.linspace(-2, 2, 200)
>>> y = np.sin(x) + 0.1 * np.random.randn(len(x))
>>> h = nw_bandwidth(x, y)
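hbw selects the bandwidth only; the regression estimate itself is the standard Nadaraya-Watson kernel-weighted average. A minimal sketch of using a selected bandwidth for prediction (nw_predict is a hypothetical helper, not part of hbw's API):

```python
import numpy as np

def nw_predict(x_train, y_train, x_new, h):
    # Nadaraya-Watson estimate: kernel-weighted local average of y_train
    x_train = np.asarray(x_train, float)
    x_new = np.asarray(x_new, float)
    w = np.exp(-0.5 * ((x_new[:, None] - x_train[None, :]) / h) ** 2)
    return w @ np.asarray(y_train, float) / w.sum(axis=1)

x = np.linspace(-2, 2, 200)
y = np.sin(x)
pred = nw_predict(x, y, np.array([0.0, 1.0]), h=0.2)  # close to sin(0), sin(1)
```

With the noiseless sin example above and a moderate bandwidth, the interior predictions track the true curve up to the usual O(h^2) smoothing bias.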