Mathematical Foundations and Methods
====================================

This document describes the mathematical foundations, methods, and best
practices behind the incline package.

Core Mathematical Problem
-------------------------

The package addresses the problem of estimating the instantaneous rate of
change (derivative) at a point :math:`t_0` in a noisy time series:

.. math::

   y_i = f(t_i) + \epsilon_i

where :math:`f` is the underlying smooth function and :math:`\epsilon_i` is
noise. We want to estimate :math:`f'(t_0)` along with confidence intervals.

Available Methods and Capabilities
----------------------------------

The incline package provides a comprehensive suite of trend estimation
methods:

1. **Basic Methods**

   - **Naive (Central Differences):** Simple finite differences for clean data
   - **Savitzky-Golay Filtering:** Local polynomial fitting with uniform sampling
   - **Spline Interpolation:** Smooth curve fitting with automatic time scaling

2. **Advanced Nonparametric Methods**

   - **LOESS/LOWESS:** Locally weighted scatterplot smoothing with robust options
   - **Local Polynomial Regression:** Kernel-weighted polynomial fits
   - **L1 Trend Filtering:** Piecewise linear trends with changepoint detection

3. **Bayesian and State-Space Methods**

   - **Gaussian Process Regression:** Full posterior distribution with principled uncertainty
   - **Kalman Filtering:** Local linear trend models with adaptive parameters
   - **Structural Time Series:** Seasonal decomposition with state-space modeling

4. **Multiscale Analysis**

   - **SiZer Maps:** Significance of trends across multiple smoothing scales
   - **Adaptive Methods:** Time-varying parameters for non-stationary series

5. **Seasonal and Robust Methods**

   - **STL Decomposition:** Seasonal-trend decomposition using LOESS
   - **Robust Statistics:** Outlier-resistant trend ranking and aggregation
   - **Bootstrap Confidence Intervals:** Non-parametric uncertainty quantification

Key Features and Improvements
-----------------------------

1. **Automatic Time Scaling** ✅

   All methods now handle:

   - DateTime indices with proper scaling
   - Irregular sampling (where applicable)
   - Custom time columns
   - Automatic detection of time units

   .. code-block:: python

      # Automatic time handling
      result = spline_trend(df)                       # Uses datetime index
      result = gp_trend(df, time_column='timestamp')  # Custom time column

2. **Parameter Selection** ✅

   - **Cross-validation:** ``select_smoothing_parameter_cv()``
   - **Automatic selection:** Built into GP and Kalman methods
   - **Adaptive methods:** ``adaptive_gp_trend()``, ``adaptive_kalman_trend()``

   .. code-block:: python

      # Automatic parameter selection
      best_s, cv_scores = select_smoothing_parameter_cv(df, method='spline')
      result = gp_trend(df)  # Auto-optimizes hyperparameters

3. **Comprehensive Uncertainty Quantification** ✅

   - **Bootstrap confidence intervals:** All basic methods
   - **Bayesian posteriors:** Gaussian Process methods
   - **Kalman uncertainty:** State-space models
   - **Significance testing:** SiZer analysis

   .. code-block:: python

      # Multiple uncertainty quantification approaches
      result = bootstrap_derivative_ci(df, n_bootstrap=200)
      result = gp_trend(df, confidence_level=0.95)
      result = kalman_trend(df)  # Natural uncertainty from Kalman filter

4. **Robust to Irregular Sampling** ✅

   Most methods handle irregular sampling (see the spline sketch after
   this list):

   - Splines, LOESS, GP, Kalman: Native support
   - Savitzky-Golay: Requires regular sampling (automatically detected)

5. **Multiscale Significance Analysis** ✅

   .. code-block:: python

      # SiZer analysis across scales
      sizer = sizer_analysis(df, n_bandwidths=20)
      features = sizer.find_significant_features()

      # Combined trend + significance
      result = trend_with_sizer(df, trend_method='loess')
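Feature 4 above claims native support for irregular sampling. The following
minimal sketch, written against plain SciPy rather than the incline API,
shows why splines deliver it: the fit lives in continuous time, so uneven
gaps between observations need no resampling.

.. code-block:: python

   import numpy as np
   from scipy.interpolate import UnivariateSpline

   rng = np.random.default_rng(0)

   # Irregularly spaced observation times on [0, 10]
   t = np.sort(rng.uniform(0, 10, size=80))
   y = np.sin(t) + rng.normal(scale=0.1, size=t.size)

   # Smoothing spline fitted in continuous time; s caps the total
   # squared residual and plays the role of the smoothing parameter
   spline = UnivariateSpline(t, y, s=80 * 0.1 ** 2)

   # Analytic first derivative, which can be evaluated anywhere,
   # including between observations
   deriv = spline.derivative()
   print(deriv(5.0), np.cos(5.0))  # estimate vs. true f'(5)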
Mathematical Details
--------------------

Savitzky-Golay Derivatives
~~~~~~~~~~~~~~~~~~~~~~~~~~

The filter fits a polynomial of degree :math:`p` by least squares over a
window of size :math:`2m+1`:

.. math::

   \hat{f}(t) = \sum_{j=0}^{p} a_j t^j

The derivative is:

.. math::

   \hat{f}'(t_0) = \sum_{j=1}^{p} j \cdot a_j t_0^{j-1}

**Key requirement:** For the :math:`k`-th derivative, we need :math:`p \geq k`
and a window size :math:`\geq p + 1`.

**Scaling:** The derivative must be divided by :math:`\Delta t^k`, where
:math:`\Delta t` is the time step (a worked example of this scaling appears
after the Best Practices list).

Spline Derivatives
~~~~~~~~~~~~~~~~~~

Smoothing cubic splines minimize the roughness

.. math::

   \int [f''(x)]^2 \, dx

subject to the residual constraint
:math:`\sum_{i=1}^{n} w_i (y_i - f(x_i))^2 \leq s`; equivalently, for a
multiplier :math:`\lambda` determined by :math:`s`, they minimize the
penalized criterion

.. math::

   \sum_{i=1}^{n} w_i (y_i - f(x_i))^2 + \lambda \int [f''(x)]^2 \, dx

The derivative is obtained analytically from the spline coefficients.

**Advantage:** Handles irregular sampling naturally.

**Challenge:** Choosing :math:`s` requires domain knowledge or
cross-validation.

Naive Method (Central Differences)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. math::

   f'(t_i) \approx \frac{y_{i+1} - y_{i-1}}{2\Delta t}

**Pros:** Simple; unbiased for linear trends.

**Cons:** High variance; sensitive to noise; poor at boundaries.

Autocorrelation and Serial Dependence
-------------------------------------

Time series typically have autocorrelated errors:

.. math::

   \epsilon_t = \rho \epsilon_{t-1} + \eta_t

This violates independence assumptions, which means:

1. Standard errors are underestimated
2. Confidence intervals are too narrow
3. Significance tests are invalid

**Solution:** Use a block bootstrap (sketched after the Best Practices list)
or model the autocorrelation explicitly.

Best Practices
--------------

1. **Always specify time units**

   .. code-block:: python

      # BAD: Assumes unit time
      result = spline_trend(df)

      # GOOD: Explicit time handling
      result = improved_spline_trend(df, time_column='date')

2. **Check sampling regularity**

   .. code-block:: python

      # Coefficient of variation of the sampling intervals
      time_diffs = df.index.to_series().diff()
      if time_diffs.std() / time_diffs.mean() > 0.1:
          print("Warning: Irregular sampling detected")
          # Use splines, not Savitzky-Golay

3. **Validate smoothing parameters**

   .. code-block:: python

      # Use cross-validation
      best_s, cv_results = select_smoothing_parameter_cv(
          df, param_name='s', method='spline'
      )

4. **Quantify uncertainty**

   .. code-block:: python

      # Get confidence intervals
      result = bootstrap_derivative_ci(
          df, method='spline', n_bootstrap=100
      )

      # Check if trend is significant
      significant = result['significant_trend']

5. **Handle seasonality**

   For seasonal data, consider (an STL sketch follows this list):

   - Deseasonalizing first with STL decomposition
   - Using smoothing windows longer than the seasonal period
   - Fitting seasonal models explicitly

6. **Be cautious at boundaries**

   .. code-block:: python

      # Mark unreliable edge estimates; .iloc needs a column
      # position, not a label
      window = 15
      half = window // 2
      result['reliable'] = True
      col = result.columns.get_loc('reliable')
      result.iloc[:half, col] = False
      result.iloc[-half:, col] = False
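As suggested in Best Practice 5, seasonal data can be deseasonalized before
trend estimation. A minimal sketch using ``STL`` from statsmodels on
synthetic monthly data (plain statsmodels, not the incline API):

.. code-block:: python

   import numpy as np
   import pandas as pd
   from statsmodels.tsa.seasonal import STL

   # Synthetic monthly series: linear trend + annual cycle + noise
   idx = pd.date_range('2015-01-01', periods=96, freq='MS')
   rng = np.random.default_rng(1)
   y = pd.Series(
       0.05 * np.arange(96)
       + 2.0 * np.sin(2 * np.pi * np.arange(96) / 12)
       + rng.normal(scale=0.3, size=96),
       index=idx,
   )

   # period is the seasonal length in observations (12 for monthly data)
   res = STL(y, period=12).fit()

   # Estimate trends on the deseasonalized series
   deseasonalized = y - res.seasonal

Any of the trend methods above can then be run on ``deseasonalized``.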
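For the serial dependence problem above, a moving-block bootstrap resamples
residuals in contiguous blocks so that short-range autocorrelation survives
resampling. A minimal sketch built on a SciPy smoothing spline; both function
names are illustrative, not part of the incline API:

.. code-block:: python

   import numpy as np
   from scipy.interpolate import UnivariateSpline

   def moving_block_bootstrap(resid, block_len, rng):
       """Concatenate random contiguous blocks of residuals."""
       n = len(resid)
       n_blocks = int(np.ceil(n / block_len))
       starts = rng.integers(0, n - block_len + 1, size=n_blocks)
       return np.concatenate([resid[s:s + block_len] for s in starts])[:n]

   def block_bootstrap_slope_ci(t, y, t0, s=None, block_len=10,
                                n_boot=500, alpha=0.05, seed=0):
       """Percentile CI for f'(t0): refit a spline to block-resampled
       residuals added back onto the fitted curve."""
       rng = np.random.default_rng(seed)
       fit = UnivariateSpline(t, y, s=s)
       resid = y - fit(t)
       slopes = np.empty(n_boot)
       for b in range(n_boot):
           y_star = fit(t) + moving_block_bootstrap(resid, block_len, rng)
           slopes[b] = UnivariateSpline(t, y_star, s=s).derivative()(t0)
       lo, hi = np.percentile(slopes, [100 * alpha / 2,
                                       100 * (1 - alpha / 2)])
       return fit.derivative()(t0), (lo, hi)

The block length should exceed the correlation length of the residuals;
blocks that are too short reproduce the too-narrow intervals this technique
is meant to fix.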
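Finally, the :math:`\Delta t^k` scaling from the Savitzky-Golay section is
easy to get wrong. SciPy's ``savgol_filter`` applies it for you through its
``delta`` argument, as this self-contained sketch (not incline-specific)
shows:

.. code-block:: python

   import numpy as np
   from scipy.signal import savgol_filter

   dt = 0.5                      # regular sampling interval
   t = np.arange(0, 50, dt)
   rng = np.random.default_rng(2)
   y = np.sin(t) + rng.normal(scale=0.05, size=t.size)

   # deriv=1 differentiates the local polynomial; delta=dt performs
   # the division by the time step discussed above
   dy = savgol_filter(y, window_length=11, polyorder=3, deriv=1, delta=dt)

   print(dy[50], np.cos(t[50]))  # estimate vs. true derivative at t = 25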
Alternative Approaches
----------------------

For more robust trend estimation, consider:

1. **Local polynomial regression (LOESS)**

   - More flexible than Savitzky-Golay
   - Better edge handling
   - Available in statsmodels

2. **State-space models**

   - Explicit modeling of the trend component
   - Natural uncertainty quantification
   - Handles missing data

3. **Gaussian processes**

   - Full posterior distribution for derivatives
   - Principled uncertainty quantification
   - Computationally heavy

4. **L1 trend filtering**

   - Piecewise linear trends
   - Automatic changepoint detection
   - Robust to outliers

References
----------

1. Savitzky, A., & Golay, M. J. E. (1964). Smoothing and differentiation of
   data by simplified least squares procedures. *Analytical Chemistry*,
   36(8), 1627-1639.
2. De Boor, C. (1978). *A Practical Guide to Splines*. Springer-Verlag.
3. Fan, J., & Gijbels, I. (1996). *Local Polynomial Modelling and Its
   Applications*. Chapman and Hall.
4. Kim, S.-J., Koh, K., Boyd, S., & Gorinevsky, D. (2009). ℓ1 trend
   filtering. *SIAM Review*, 51(2), 339-360.