Mathematical Foundations and Methods

This document describes the mathematical foundations, available methods, and best practices for the incline package.

Core Mathematical Problem

The package addresses the problem of estimating the instantaneous rate of change (derivative) at a point \(t_0\) in a noisy time series:

\[y_i = f(t_i) + \epsilon_i\]

where \(f\) is the underlying smooth function and \(\epsilon_i\) is noise. We want to estimate \(f'(t_0)\) along with confidence intervals.
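
For a concrete instance (a minimal synthetic example; the sine signal and noise level are illustrative choices, not package defaults):

    import numpy as np

    # Synthetic instance of y_i = f(t_i) + eps_i with f(t) = sin(t),
    # so the true derivative to recover is f'(t) = cos(t)
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 10.0, 200)
    y = np.sin(t) + rng.normal(scale=0.1, size=t.size)
    true_derivative = np.cos(t)  # ground truth for judging the estimators below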

Available Methods and Capabilities

The incline package provides a comprehensive suite of trend estimation methods:

  1. Basic Methods

    • Naive (Central Differences): Simple finite differences for clean data

    • Savitzky-Golay Filtering: Local polynomial fitting with uniform sampling

    • Spline Interpolation: Smooth curve fitting with automatic time scaling

  2. Advanced Nonparametric Methods

    • LOESS/LOWESS: Locally weighted scatterplot smoothing with robust options

    • Local Polynomial Regression: Kernel-weighted polynomial fits

    • L1 Trend Filtering: Piecewise linear trends with changepoint detection

  3. Bayesian and State-Space Methods

    • Gaussian Process Regression: Full posterior distribution with principled uncertainty

    • Kalman Filtering: Local linear trend models with adaptive parameters

    • Structural Time Series: Seasonal decomposition with state-space modeling

  4. Multiscale Analysis

    • SiZer Maps: Significance of trends across multiple smoothing scales

    • Adaptive Methods: Time-varying parameters for non-stationary series

  5. Seasonal and Robust Methods

    • STL Decomposition: Seasonal-trend decomposition using Loess

    • Robust Statistics: Outlier-resistant trend ranking and aggregation

    • Bootstrap Confidence Intervals: Non-parametric uncertainty quantification

Key Features and Improvements

  1. Automatic Time Scaling

    All methods now handle:

    • DateTime indices with proper scaling

    • Irregular sampling (where applicable)

    • Custom time columns

    • Automatic detection of time units

    # Automatic time handling
    result = spline_trend(df)  # Uses datetime index
    result = gp_trend(df, time_column='timestamp')  # Custom time column
    
  2. Parameter Selection

    • Cross-validation: select_smoothing_parameter_cv()

    • Automatic selection: Built into GP and Kalman methods

    • Adaptive methods: adaptive_gp_trend(), adaptive_kalman_trend()

    # Automatic parameter selection
    best_s, cv_scores = select_smoothing_parameter_cv(df, method='spline')
    result = gp_trend(df)  # Auto-optimizes hyperparameters
    
  3. Comprehensive Uncertainty Quantification

    • Bootstrap confidence intervals: All basic methods

    • Bayesian posteriors: Gaussian Process methods

    • Kalman uncertainty: State-space models

    • Significance testing: SiZer analysis

    # Multiple uncertainty quantification approaches
    result = bootstrap_derivative_ci(df, n_bootstrap=200)
    result = gp_trend(df, confidence_level=0.95)
    result = kalman_trend(df)  # Natural uncertainty from Kalman filter
    
  4. Robust to Irregular Sampling

    Most methods handle irregular sampling:

    • Splines, LOESS, GP, Kalman: Native support

    • Savitzky-Golay: Requires regular sampling (automatically detected)

  5. Multiscale Significance Analysis

    # SiZer analysis across scales
    sizer = sizer_analysis(df, n_bandwidths=20)
    features = sizer.find_significant_features()
    
    # Combined trend + significance
    result = trend_with_sizer(df, trend_method='loess')
    

Mathematical Details

Savitzky-Golay Derivatives

The filter fits a polynomial of degree \(p\) using least squares over a window of size \(2m+1\):

\[\hat{f}(t) = \sum_{j=0}^{p} a_j t^j\]

The derivative follows by differentiating the fitted polynomial:

\[\hat{f}'(t_0) = \sum_{j=1}^{p} j \, a_j \, t_0^{j-1}\]

In the centered parameterization \(\hat{f}(t) = \sum_{j} a_j (t - t_0)^j\), this reduces to \(\hat{f}'(t_0) = a_1\).

Key requirement: for the \(k\)-th derivative, the polynomial degree must satisfy \(p \geq k\) and the window size must be at least \(p + 1\).

Scaling: the filter coefficients assume unit sample spacing, so the \(k\)-th derivative must be divided by \(\Delta t^k\), where \(\Delta t\) is the (uniform) time step.
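
For reference, SciPy's savgol_filter handles both the derivative order and this scaling; a minimal sketch (window length and polynomial degree are illustrative):

    import numpy as np
    from scipy.signal import savgol_filter

    t = np.linspace(0.0, 10.0, 200)
    y = np.sin(t) + np.random.default_rng(0).normal(scale=0.1, size=t.size)
    dt = t[1] - t[0]  # uniform time step required by Savitzky-Golay

    # deriv=1 requests the first derivative; delta=dt applies the 1/dt scaling
    dydt = savgol_filter(y, window_length=11, polyorder=3, deriv=1, delta=dt)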

Spline Derivatives

Smoothing cubic splines minimize the roughness penalty

\[\int [f''(x)]^2 \, dx\]

subject to the fit constraint

\[\sum_{i=1}^{n} w_i (y_i - f(x_i))^2 \leq s.\]

Equivalently, in Lagrangian form, they minimize

\[\sum_{i=1}^{n} w_i (y_i - f(x_i))^2 + \lambda \int [f''(x)]^2 \, dx\]

for a multiplier \(\lambda\) determined by \(s\).

The derivative is obtained analytically from the spline coefficients.

Advantage: Handles irregular sampling naturally.

Challenge: Choosing \(s\) requires domain knowledge or cross-validation.
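
A minimal sketch using SciPy's UnivariateSpline, whose s argument plays the role of the constraint above (s=0.5 is an illustrative value, not a recommendation):

    import numpy as np
    from scipy.interpolate import UnivariateSpline

    rng = np.random.default_rng(0)
    t = np.sort(rng.uniform(0.0, 10.0, 80))  # irregular sampling is fine here
    y = np.sin(t) + rng.normal(scale=0.1, size=t.size)

    spl = UnivariateSpline(t, y, s=0.5)  # s bounds the weighted residual sum
    dydt = spl.derivative()(t)           # analytic derivative of the fitted spline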

Naive Method (Central Differences)

For uniformly spaced samples with time step \(\Delta t\),

\[f'(t_i) \approx \frac{y_{i+1} - y_{i-1}}{2\Delta t}\]

Pros: Simple; unbiased for linear trends; second-order accurate in \(\Delta t\)

Cons: High variance, sensitive to noise, poor at boundaries (only one-sided differences are available there)
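
NumPy's gradient implements this scheme (central differences in the interior, one-sided differences at the endpoints) and also accepts non-uniform sample times:

    import numpy as np

    t = np.linspace(0.0, 10.0, 200)
    y = np.sin(t) + np.random.default_rng(0).normal(scale=0.1, size=t.size)

    # Second-order differences: central in the interior, one-sided at the ends
    dydt = np.gradient(y, t)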

Autocorrelation and Serial Dependence

Time series typically have autocorrelated errors, for example the AR(1) model

\[\epsilon_t = \rho \epsilon_{t-1} + \eta_t\]

where \(\eta_t\) is white noise and \(|\rho| < 1\).

With positive autocorrelation (\(\rho > 0\), the common case), this violates the independence assumption and means:

  1. Standard errors are underestimated

  2. Confidence intervals are too narrow

  3. Significance tests are invalid

Solution: Use block bootstrap or model the autocorrelation explicitly.
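
A minimal sketch of the moving-block bootstrap (moving_block_resample is a hypothetical helper written for illustration, not an incline function; the block length should exceed the correlation length of the errors):

    import numpy as np

    def moving_block_resample(y, block_len, rng):
        """Concatenate randomly chosen overlapping blocks of y.

        Blocks preserve short-range autocorrelation that an i.i.d.
        bootstrap would destroy.
        """
        n = len(y)
        n_blocks = -(-n // block_len)  # ceiling division
        starts = rng.integers(0, n - block_len + 1, size=n_blocks)
        return np.concatenate([y[s:s + block_len] for s in starts])[:n]

    rng = np.random.default_rng(0)
    y = np.cumsum(rng.normal(size=300))  # toy series with strong autocorrelation
    replicates = [moving_block_resample(y, block_len=20, rng=rng)
                  for _ in range(200)]   # re-estimate the trend on each replicate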

Best Practices

  1. Always specify time units

    # BAD: Assumes unit time
    result = spline_trend(df)
    
    # GOOD: Explicit time handling
    result = improved_spline_trend(df, time_column='date')
    
  2. Check sampling regularity

    # Coefficient of variation of the sampling intervals
    time_diffs = df.index.to_series().diff().dropna()
    if time_diffs.std() / time_diffs.mean() > 0.1:
        print("Warning: Irregular sampling detected")
        # Use splines, not Savitzky-Golay
    
  3. Validate smoothing parameters

    # Use cross-validation
    best_s, cv_results = select_smoothing_parameter_cv(
        df, param_name='s', method='spline'
    )
    
  4. Quantify uncertainty

    # Get confidence intervals
    result = bootstrap_derivative_ci(
        df, method='spline', n_bootstrap=100
    )
    
    # Check if trend is significant
    significant = result['significant_trend']
    
  5. Handle seasonality

    For seasonal data, consider:

    • Pre-deseasonalizing with STL decomposition (see the sketch after this list)

    • Using longer smoothing windows (> seasonal period)

    • Fitting seasonal models explicitly
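
    A minimal deseasonalizing sketch using statsmodels' STL; the 'value' column and monthly period=12 are assumptions for illustration:

    # df is assumed to have a DatetimeIndex and a 'value' column
    from statsmodels.tsa.seasonal import STL

    series = df['value']
    stl_result = STL(series, period=12).fit()      # period must match the data
    deseasonalized = series - stl_result.seasonal  # pass this to a trend method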

  6. Be cautious at boundaries

    # Mark unreliable edge estimates (half a window at each end)
    window = 15
    half = window // 2
    result['reliable'] = True
    col = result.columns.get_loc('reliable')  # .iloc needs a position, not a label
    result.iloc[:half, col] = False
    result.iloc[-half:, col] = False
    

Alternative Approaches

For more robust trend estimation, consider:

  1. Local polynomial regression (LOESS), sketched in code after this list

    • More flexible than Savitzky-Golay

    • Better edge handling

    • Available in statsmodels

  2. State-space models

    • Explicit modeling of the trend component

    • Natural uncertainty quantification

    • Handles missing data

  3. Gaussian processes

    • Full posterior distribution for derivatives

    • Principled uncertainty quantification

    • Computationally heavy

  4. L1 trend filtering

    • Piecewise linear trends

    • Automatic changepoint detection

    • Robust to outliers
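
A sketch of the LOESS route via statsmodels; lowess returns only smoothed values, so differentiating them with central differences is a simple follow-up step added here for illustration:

    import numpy as np
    from statsmodels.nonparametric.smoothers_lowess import lowess

    t = np.linspace(0.0, 10.0, 200)
    y = np.sin(t) + np.random.default_rng(0).normal(scale=0.1, size=t.size)

    smoothed = lowess(y, t, frac=0.2, return_sorted=False)  # fitted values at t
    dydt = np.gradient(smoothed, t)  # numerical derivative of the smoothed curve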

References

  1. Savitzky, A., & Golay, M. J. (1964). Smoothing and differentiation of data by simplified least squares procedures. Analytical Chemistry, 36(8), 1627-1639.

  2. De Boor, C. (1978). A practical guide to splines. Springer-Verlag.

  3. Fan, J., & Gijbels, I. (1996). Local polynomial modelling and its applications. Chapman and Hall.

  4. Kim, S. J., Koh, K., Boyd, S., & Gorinevsky, D. (2009). ℓ1 trend filtering. SIAM Review, 51(2), 339-360.