Kernel density estimation (KDE) bandwidth selectors with analytic gradients.
This module implements leave‑one‑out least‑squares cross‑validation (LSCV) for univariate KDE with Gaussian and Epanechnikov kernels. It provides analytic expressions for the cross‑validation score, gradient and Hessian with respect to the bandwidth. A Newton–Armijo optimizer is included to select the optimal bandwidth without numerical differencing.
The analytic formulas are based on convolution of kernels and their derivatives; see the accompanying paper for details.
- hessband.kde.lscv_generic(x: ndarray, h: float, kernel: str) → tuple[float, float, float]
Least-squares cross-validation for univariate KDE.
Calculates the LSCV score, gradient, and Hessian with respect to the bandwidth h for univariate kernel density estimation.
The LSCV criterion is defined as:
LSCV(h) = 1/(n²h) Σ_{i,j} K₂(u_ij) − 2/(n(n−1)h) Σ_{i≠j} K(u_ij)
where u_ij = (x_i − x_j)/h and K₂ = K * K is the kernel convolved with itself.
- Parameters:
x – The input data points.
h – The bandwidth.
kernel – The kernel to use (‘gauss’ or ‘epan’).
- Returns:
score: The LSCV score.
grad: The gradient of the LSCV score with respect to h.
hess: The Hessian of the LSCV score with respect to h.
- Return type:
tuple of (float, float, float)
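For the Gaussian kernel, the score and its analytic bandwidth gradient can be written down directly from the criterion above. The following is a minimal NumPy sketch (not the package implementation; the Hessian is omitted for brevity). For a standard Gaussian K, the convolution K₂ = K * K is the N(0, 2) density, and d/dh [K(u)/h] = K(u)(u² − 1)/h² with u = (x_i − x_j)/h.

```python
import numpy as np

def lscv_gauss(x, h):
    """Sketch of the LSCV score and analytic gradient for a Gaussian kernel.

    Illustrative only; `hessband.kde.lscv_generic` also returns the Hessian.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    u = (x[:, None] - x[None, :]) / h                 # pairwise scaled differences
    K = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)      # Gaussian kernel N(0, 1)
    K2 = np.exp(-0.25 * u**2) / np.sqrt(4 * np.pi)    # convolution K*K = N(0, 2)
    off = ~np.eye(n, dtype=bool)                      # mask selecting i != j
    score = K2.sum() / (n**2 * h) - 2 * K[off].sum() / (n * (n - 1) * h)
    # d/dh [K2(u)/h] = K2(u)(u^2/2 - 1)/h^2 ; d/dh [K(u)/h] = K(u)(u^2 - 1)/h^2
    grad = (K2 * (u**2 / 2 - 1)).sum() / (n**2 * h**2) \
        - 2 * (K * (u**2 - 1))[off].sum() / (n * (n - 1) * h**2)
    return score, grad
```

The analytic gradient can be checked against a central finite difference of the score, which is exactly the numerical differencing the analytic method avoids.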
- hessband.kde.select_kde_bandwidth(x: ndarray, kernel: str = 'gauss', method: str = 'analytic', h_bounds=(0.01, 1.0), grid_size: int = 30, h_init: float | None = None) → float
Select an optimal bandwidth for univariate kernel density estimation using LSCV.
This function minimizes the least-squares cross-validation (LSCV) criterion to select an optimal bandwidth for kernel density estimation. The analytic method uses exact gradients and Hessians for efficient Newton optimization.
- Parameters:
x (array-like, shape (n_samples,)) – Data samples for density estimation.
kernel ({'gauss', 'epan'}, default='gauss') –
Kernel function:
'gauss': Gaussian (normal) kernel
'epan': Epanechnikov kernel (compact support)
method ({'analytic', 'grid', 'golden'}, default='analytic') –
Bandwidth selection method:
'analytic': Newton–Armijo with analytic derivatives (recommended)
'grid': Exhaustive grid search over h_bounds
'golden': Golden-section search
h_bounds (tuple of float, default=(0.01, 1.0)) – (min_bandwidth, max_bandwidth) search bounds.
grid_size (int, default=30) – Number of grid points for ‘grid’ method.
h_init (float, optional) – Initial bandwidth for Newton-based methods. If None, uses Silverman’s rule of thumb as starting point.
- Returns:
Optimal bandwidth that minimizes LSCV criterion.
- Return type:
float
Examples
>>> import numpy as np
>>> from hessband import select_kde_bandwidth
>>> # Generate sample data from a mixture distribution
>>> x = np.concatenate([
...     np.random.normal(-2, 0.5, 200),
...     np.random.normal(2, 1.0, 300)
... ])
>>> # Select bandwidth using the analytic method
>>> h_opt = select_kde_bandwidth(x, kernel='gauss', method='analytic')
>>> print(f"Optimal bandwidth: {h_opt:.4f}")
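The Silverman starting point used when h_init is None can be sketched as follows. This is the standard textbook form; the package may use a slightly different variant.

```python
import numpy as np

def silverman_bandwidth(x):
    """Sketch of Silverman's rule of thumb: h = 0.9 * min(sigma, IQR/1.34) * n^(-1/5).

    Assumed standard form; shown only to illustrate the default h_init.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    sigma = np.std(x, ddof=1)                          # sample standard deviation
    iqr = np.subtract(*np.percentile(x, [75, 25]))     # interquartile range
    spread = min(sigma, iqr / 1.34) if iqr > 0 else sigma
    return 0.9 * spread * n ** (-0.2)
```

Using the robust min(sigma, IQR/1.34) spread keeps the rule sensible for heavy-tailed or mildly contaminated samples, where the raw standard deviation would oversmooth.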
Notes
The LSCV criterion is defined as:
LSCV(h) = ∫ f̂ₕ²(x) dx - 2∫ f̂ₕ(x) f(x) dx
where f̂ₕ is the kernel density estimate with bandwidth h and f is the true (unknown) density. The analytic method provides exact derivatives, making optimization very efficient compared to finite-difference approaches.
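Because the score, gradient, and Hessian are all available analytically, the Newton–Armijo scheme behind method='analytic' reduces to a short safeguarded loop. The sketch below is generic and simplified, not the package code: fgh plays the role of lscv_generic, returning (score, gradient, Hessian) at a bandwidth h.

```python
def newton_armijo(fgh, h0, bounds=(1e-3, 10.0), tol=1e-8, max_iter=50,
                  beta=0.5, c=1e-4):
    """Sketch of a safeguarded Newton-Armijo minimizer in one variable.

    fgh(h) must return (score, gradient, Hessian); illustrative only.
    """
    h = h0
    for _ in range(max_iter):
        f, g, H = fgh(h)
        if abs(g) < tol:
            break
        step = -g / H if H > 0 else -g   # fall back to gradient step if non-convex
        t = 1.0
        while True:                       # Armijo backtracking line search
            h_new = min(max(h + t * step, bounds[0]), bounds[1])
            if fgh(h_new)[0] <= f + c * g * (h_new - h) or t < 1e-12:
                break
            t *= beta
        h = h_new
    return h
```

On a convex score the Newton step gives quadratic local convergence, while the backtracking and bound clamping keep the iterate inside h_bounds when the initial bandwidth is poor.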