Mathematical Foundations and Methods

This document describes the mathematical foundations, available methods, and best practices for the incline package.

Core Mathematical Problem

The package addresses the problem of estimating the instantaneous rate of change (derivative) at a point \(t_0\) in a noisy time series:

\[y_i = f(t_i) + \epsilon_i\]

where \(f\) is the underlying smooth function and \(\epsilon_i\) is noise. We want to estimate \(f'(t_0)\) along with confidence intervals.
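
For a concrete instance (a minimal synthetic example; the sine signal and noise level are illustrative choices, not package defaults):

    import numpy as np

    # Synthetic instance of y_i = f(t_i) + eps_i with f(t) = sin(t),
    # so the true derivative to recover is f'(t) = cos(t)
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 10.0, 200)
    y = np.sin(t) + rng.normal(scale=0.1, size=t.size)
    true_derivative = np.cos(t)  # ground truth for judging the estimators below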

Available Methods and Capabilities

The incline package provides a comprehensive suite of trend estimation methods:

  1. Basic Methods

    • Naive (Central Differences): Simple finite differences for clean data

    • Savitzky-Golay Filtering: Local polynomial fitting with uniform sampling

    • Spline Interpolation: Smooth curve fitting with automatic time scaling

  2. Advanced Nonparametric Methods

    • LOESS/LOWESS: Locally weighted scatterplot smoothing with robust options

    • Local Polynomial Regression: Kernel-weighted polynomial fits

    • L1 Trend Filtering: Piecewise linear trends with changepoint detection

  3. Bayesian and State-Space Methods

    • Gaussian Process Regression: Full posterior distribution with principled uncertainty

    • Kalman Filtering: Local linear trend models with adaptive parameters

    • Structural Time Series: Seasonal decomposition with state-space modeling

  4. Multiscale Analysis

    • SiZer Maps: Significance of trends across multiple smoothing scales

    • Adaptive Methods: Time-varying parameters for non-stationary series

  5. Seasonal and Robust Methods

    • STL Decomposition: Seasonal-trend decomposition using Loess

    • Robust Statistics: Outlier-resistant trend ranking and aggregation

    • Bootstrap Confidence Intervals: Non-parametric uncertainty quantification

Key Features and Improvements

  1. Automatic Time Scaling

    All methods now handle:

    • DateTime indices with proper scaling

    • Irregular sampling (where applicable)

    • Custom time columns

    • Automatic detection of time units

    # Automatic time handling
    result = spline_trend(df)  # Uses datetime index
    result = gp_trend(df, time_column='timestamp')  # Custom time column
    
  2. Parameter Selection

    • Cross-validation: select_smoothing_parameter_cv()

    • Automatic selection: Built into GP and Kalman methods

    • Adaptive methods: adaptive_gp_trend(), adaptive_kalman_trend()

    # Automatic parameter selection
    best_s, cv_scores = select_smoothing_parameter_cv(df, method='spline')
    result = gp_trend(df)  # Auto-optimizes hyperparameters
    
  3. Comprehensive Uncertainty Quantification

    • Bootstrap confidence intervals: All basic methods

    • Bayesian posteriors: Gaussian Process methods

    • Kalman uncertainty: State-space models

    • Significance testing: SiZer analysis

    # Multiple uncertainty quantification approaches
    result = bootstrap_derivative_ci(df, n_bootstrap=200)
    result = gp_trend(df, confidence_level=0.95)
    result = kalman_trend(df)  # Natural uncertainty from Kalman filter
    
  4. Robust to Irregular Sampling

    Most methods handle irregular sampling:

    • Splines, LOESS, GP, Kalman: Native support

    • Savitzky-Golay: Requires regular sampling (automatically detected)

  5. Multiscale Significance Analysis

    # SiZer analysis across scales
    sizer = sizer_analysis(df, n_bandwidths=20)
    features = sizer.find_significant_features()
    
    # Combined trend + significance
    result = trend_with_sizer(df, trend_method='loess')
    

Mathematical Details

Savitzky-Golay Derivatives

The filter fits a polynomial of degree \(p\) using least squares over a window of size \(2m+1\):

\[\hat{f}(t) = \sum_{j=0}^{p} a_j t^j\]

The derivative follows by differentiating the fitted polynomial:

\[\hat{f}'(t_0) = \sum_{j=1}^{p} j \, a_j \, t_0^{j-1}\]

In the centered parameterization \(\hat{f}(t) = \sum_{j} a_j (t - t_0)^j\), this reduces to \(\hat{f}'(t_0) = a_1\).

Key requirement: for the \(k\)-th derivative, the polynomial degree must satisfy \(p \geq k\) and the window size must be at least \(p + 1\).

Scaling: the filter coefficients assume unit sample spacing, so the \(k\)-th derivative must be divided by \(\Delta t^k\), where \(\Delta t\) is the (uniform) time step.
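
For reference, SciPy's savgol_filter handles both the derivative order and this scaling; a minimal sketch (window length and polynomial degree are illustrative):

    import numpy as np
    from scipy.signal import savgol_filter

    t = np.linspace(0.0, 10.0, 200)
    y = np.sin(t) + np.random.default_rng(0).normal(scale=0.1, size=t.size)
    dt = t[1] - t[0]  # uniform time step required by Savitzky-Golay

    # deriv=1 requests the first derivative; delta=dt applies the 1/dt scaling
    dydt = savgol_filter(y, window_length=11, polyorder=3, deriv=1, delta=dt)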

Spline Derivatives

Smoothing cubic splines minimize the roughness penalty

\[\int [f''(x)]^2 \, dx\]

subject to the fit constraint

\[\sum_{i=1}^{n} w_i (y_i - f(x_i))^2 \leq s.\]

Equivalently, in Lagrangian form, they minimize

\[\sum_{i=1}^{n} w_i (y_i - f(x_i))^2 + \lambda \int [f''(x)]^2 \, dx\]

for a multiplier \(\lambda\) determined by \(s\).

The derivative is obtained analytically from the spline coefficients.

Advantage: Handles irregular sampling naturally.

Challenge: Choosing \(s\) requires domain knowledge or cross-validation.
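
A minimal sketch using SciPy's UnivariateSpline, whose s argument plays the role of the constraint above (s=0.5 is an illustrative value, not a recommendation):

    import numpy as np
    from scipy.interpolate import UnivariateSpline

    rng = np.random.default_rng(0)
    t = np.sort(rng.uniform(0.0, 10.0, 80))  # irregular sampling is fine here
    y = np.sin(t) + rng.normal(scale=0.1, size=t.size)

    spl = UnivariateSpline(t, y, s=0.5)  # s bounds the weighted residual sum
    dydt = spl.derivative()(t)           # analytic derivative of the fitted spline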

Naive Method (Central Differences)

For uniformly spaced samples with time step \(\Delta t\),

\[f'(t_i) \approx \frac{y_{i+1} - y_{i-1}}{2\Delta t}\]

Pros: Simple; unbiased for linear trends; second-order accurate in \(\Delta t\)

Cons: High variance, sensitive to noise, poor at boundaries (only one-sided differences are available there)
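
NumPy's gradient implements this scheme (central differences in the interior, one-sided differences at the endpoints) and also accepts non-uniform sample times:

    import numpy as np

    t = np.linspace(0.0, 10.0, 200)
    y = np.sin(t) + np.random.default_rng(0).normal(scale=0.1, size=t.size)

    # Second-order differences: central in the interior, one-sided at the ends
    dydt = np.gradient(y, t)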

Autocorrelation and Serial Dependence

Time series typically have autocorrelated errors, for example the AR(1) model

\[\epsilon_t = \rho \epsilon_{t-1} + \eta_t\]

where \(\eta_t\) is white noise and \(|\rho| < 1\).

With positive autocorrelation (\(\rho > 0\), the common case), this violates the independence assumption and means:

  1. Standard errors are underestimated

  2. Confidence intervals are too narrow

  3. Significance tests are invalid

Solution: Use block bootstrap or model the autocorrelation explicitly.
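
A minimal sketch of the moving-block bootstrap (moving_block_resample is a hypothetical helper written for illustration, not an incline function; the block length should exceed the correlation length of the errors):

    import numpy as np

    def moving_block_resample(y, block_len, rng):
        """Concatenate randomly chosen overlapping blocks of y.

        Blocks preserve short-range autocorrelation that an i.i.d.
        bootstrap would destroy.
        """
        n = len(y)
        n_blocks = -(-n // block_len)  # ceiling division
        starts = rng.integers(0, n - block_len + 1, size=n_blocks)
        return np.concatenate([y[s:s + block_len] for s in starts])[:n]

    rng = np.random.default_rng(0)
    y = np.cumsum(rng.normal(size=300))  # toy series with strong autocorrelation
    replicates = [moving_block_resample(y, block_len=20, rng=rng)
                  for _ in range(200)]   # re-estimate the trend on each replicate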

Best Practices

  1. Always specify time units

    # BAD: Assumes unit time
    result = spline_trend(df)
    
    # GOOD: Explicit time handling
    result = improved_spline_trend(df, time_column='date')
    
  2. Check sampling regularity

    # Coefficient of variation of the sampling intervals
    time_diffs = df.index.to_series().diff().dropna()
    if time_diffs.std() / time_diffs.mean() > 0.1:
        print("Warning: Irregular sampling detected")
        # Use splines, not Savitzky-Golay
    
  3. Validate smoothing parameters

    # Use cross-validation
    best_s, cv_results = select_smoothing_parameter_cv(
        df, param_name='s', method='spline'
    )
    
  4. Quantify uncertainty

    # Get confidence intervals
    result = bootstrap_derivative_ci(
        df, method='spline', n_bootstrap=100
    )
    
    # Check if trend is significant
    significant = result['significant_trend']
    
  5. Handle seasonality

    For seasonal data, consider:

    • Pre-deseasonalizing with STL decomposition (see the sketch after this list)

    • Using longer smoothing windows (> seasonal period)

    • Fitting seasonal models explicitly
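
    A minimal deseasonalizing sketch using statsmodels' STL; the 'value' column and monthly period=12 are assumptions for illustration:

    # df is assumed to have a DatetimeIndex and a 'value' column
    from statsmodels.tsa.seasonal import STL

    series = df['value']
    stl_result = STL(series, period=12).fit()      # period must match the data
    deseasonalized = series - stl_result.seasonal  # pass this to a trend method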

  6. Be cautious at boundaries

    # Mark unreliable edge estimates (half a window at each end)
    window = 15
    half = window // 2
    result['reliable'] = True
    col = result.columns.get_loc('reliable')  # .iloc needs a position, not a label
    result.iloc[:half, col] = False
    result.iloc[-half:, col] = False
    

Alternative Approaches

For more robust trend estimation, consider:

  1. Local polynomial regression (LOESS), sketched in code after this list

    • More flexible than Savitzky-Golay

    • Better edge handling

    • Available in statsmodels

  2. State-space models

    • Explicit modeling of the trend component

    • Natural uncertainty quantification

    • Handles missing data

  3. Gaussian processes

    • Full posterior distribution for derivatives

    • Principled uncertainty quantification

    • Computationally heavy

  4. L1 trend filtering

    • Piecewise linear trends

    • Automatic changepoint detection

    • Robust to outliers
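
A sketch of the LOESS route via statsmodels; lowess returns only smoothed values, so differentiating them with central differences is a simple follow-up step added here for illustration:

    import numpy as np
    from statsmodels.nonparametric.smoothers_lowess import lowess

    t = np.linspace(0.0, 10.0, 200)
    y = np.sin(t) + np.random.default_rng(0).normal(scale=0.1, size=t.size)

    smoothed = lowess(y, t, frac=0.2, return_sorted=False)  # fitted values at t
    dydt = np.gradient(smoothed, t)  # numerical derivative of the smoothed curve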

References

  1. Savitzky, A., & Golay, M. J. (1964). Smoothing and differentiation of data by simplified least squares procedures. Analytical Chemistry, 36(8), 1627-1639.

  2. De Boor, C. (1978). A practical guide to splines. Springer-Verlag.

  3. Fan, J., & Gijbels, I. (1996). Local polynomial modelling and its applications. Chapman and Hall.

  4. Kim, S. J., Koh, K., Boyd, S., & Gorinevsky, D. (2009). ℓ1 trend filtering. SIAM Review, 51(2), 339-360.