Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[Unreleased]
Breaking Changes
- Minimum Python version increased to 3.10 (from 3.8)
- Modernized type hints using Python 3.10+ syntax (dict, list, | for union types)
Added
- Enhanced Diagnostics & Monitoring:
  - Gradient norm tracking for convergence analysis
  - Automatic convergence detection with configurable tolerance
  - Oscillation detection for non-converging scenarios
  - Enhanced weight distribution statistics (quartiles, outliers)
  - Verbose mode for debugging with progress indicators
  - Loss moving average calculation
  - New diagnostics_demo.py example showcasing monitoring features
- Major Performance Optimizations:
  - Capacity doubling for weights storage: Eliminates the O(n²) copying cost of repeated reallocations
  - Optimized array conversions: Moved outside gradient computation loops
  - Configurable weight statistics: Optional/sampled computation for expensive percentiles
  - Overall speedup: 10-100x improvement for large streams (n>1000)
  - Performance scales nearly linearly with data size
- Comprehensive test suite with 21+ test cases
- Realistic examples for common use cases
- Complete documentation with Sphinx
- CI/CD workflows for testing and publishing
- Code formatting and linting checks
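The convergence, oscillation, and moving-average diagnostics listed under Added could work roughly as in the sketch below. The function names, window sizes, and thresholds are assumptions chosen for illustration; the changelog does not state how the package implements them.

```python
def moving_average(losses: list[float], window: int = 5) -> float:
    """Mean of the most recent `window` losses (fewer if the history is short)."""
    recent = losses[-window:]
    return sum(recent) / len(recent)


def has_converged(losses: list[float], tol: float = 1e-6, window: int = 5) -> bool:
    """True when each of the last `window` steps changed the loss by at most tol.

    The change is compared against tol * max(1, |previous loss|), so the test
    is relative for large losses and absolute for losses near zero.
    """
    if len(losses) < window + 1:
        return False
    recent = losses[-(window + 1):]
    return all(
        abs(curr - prev) <= tol * max(1.0, abs(prev))
        for prev, curr in zip(recent, recent[1:])
    )


def is_oscillating(losses: list[float], window: int = 6, min_flips: int = 4) -> bool:
    """True when the loss direction flips at least `min_flips` times recently."""
    if len(losses) < window + 1:
        return False
    recent = losses[-(window + 1):]
    diffs = [b - a for a, b in zip(recent, recent[1:])]
    # A flip is a sign change between two consecutive loss differences.
    flips = sum(1 for d1, d2 in zip(diffs, diffs[1:]) if d1 * d2 < 0)
    return flips >= min_flips
```

A loss history that alternates up and down triggers is_oscillating, while a flat history triggers has_converged; a steadily decreasing history triggers neither.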
Changed
- Type hints modernized to use Python 3.10+ built-in types
- Removed from __future__ import annotations (no longer needed)
- CI/CD now tests Python 3.10, 3.11, 3.12, and 3.13 (dropped 3.8, 3.9)
- Enhanced history tracking with comprehensive diagnostic metrics
- Internal data structures: Weights array now uses capacity doubling, giving amortized O(1) appends with only O(log n) reallocations
- Weight statistics computation: Now configurable (always, never, or sampled) for performance
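The capacity-doubling change can be pictured with the minimal sketch below. GrowableBuffer is a hypothetical class, not the package's internal structure, and a plain Python list stands in for the weights array to keep the example dependency-free.

```python
class GrowableBuffer:
    """Hypothetical append-only buffer that doubles its capacity when full."""

    def __init__(self, initial_capacity: int = 4) -> None:
        self._data = [0.0] * max(1, initial_capacity)
        self._size = 0
        self.reallocations = 0  # counts how often the storage was regrown

    def append(self, value: float) -> None:
        if self._size == len(self._data):
            # Storage is full: allocate twice the capacity and copy once.
            new_data = [0.0] * (2 * len(self._data))
            new_data[:self._size] = self._data
            self._data = new_data
            self.reallocations += 1
        self._data[self._size] = value
        self._size += 1

    def view(self) -> list[float]:
        """Return only the filled portion of the buffer."""
        return self._data[:self._size]


buf = GrowableBuffer(initial_capacity=4)
for i in range(1000):
    buf.append(float(i))
print(buf.reallocations)  # 8 (capacity grows 4 -> 8 -> ... -> 1024)
```

Because each regrowth doubles the capacity, appending n values triggers about log2(n) copies instead of one copy per append.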
Fixed
- Critical Numerical Stability Issues:
  - MWU algorithm now clips exponential arguments to prevent overflow/underflow
  - Convergence detection properly handles near-zero loss cases
  - Improved robustness with extreme learning rates and gradients
- Import errors for Optional and Any types in simulation module
- Improved docstring formatting and clarity
- Flake8 linting issues with whitespace in slice notation
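The exponent clipping mentioned under Fixed can be sketched as follows. The update form w * exp(-lr * g), the function name mwu_update, and the bound MAX_EXPONENT are assumptions for illustration; the changelog does not give the package's actual update rule or clip value.

```python
import math

# Assumed bound: exp(50) and exp(-50) are both comfortably representable
# as float64, whereas math.exp(1000) raises OverflowError.
MAX_EXPONENT = 50.0


def mwu_update(weight: float, gradient: float, learning_rate: float) -> float:
    """One multiplicative-weights step with a clipped exponent."""
    exponent = -learning_rate * gradient
    # Clamp the exponent into [-MAX_EXPONENT, MAX_EXPONENT] before exponentiating.
    exponent = max(-MAX_EXPONENT, min(MAX_EXPONENT, exponent))
    return weight * math.exp(exponent)
```

With the clamp in place, an extreme gradient or learning rate yields a large but finite weight change rather than an overflow error or a weight that underflows to exactly zero.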
[0.1.1] - 2024-XX-XX
Added
- Initial release of onlinerake package
- SGD-based streaming raking algorithm (OnlineRakingSGD)
- MWU-based streaming raking algorithm (OnlineRakingMWU)
- Targets dataclass for population margins
- Simulation module for benchmarking algorithms
- Basic README with usage examples
Features
- Real-time weight calibration for streaming survey data
- scikit-learn style partial_fit API
- Support for binary demographic indicators (age, gender, education, region)
- Effective sample size and loss monitoring
- Weight clipping to prevent numerical issues
- Comprehensive margin tracking and reporting
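For readers unfamiliar with effective sample size, the sketch below uses the common Kish definition, (sum of weights)² divided by the sum of squared weights. This is an assumption: the changelog does not say which formula the package uses, and effective_sample_size is a hypothetical name.

```python
def effective_sample_size(weights: list[float]) -> float:
    """Kish effective sample size: (sum w)^2 / sum(w^2)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    if total_sq == 0:
        # No weight mass at all; treat the effective sample as empty.
        return 0.0
    return total * total / total_sq
```

Equal weights give an effective sample size equal to the number of observations, and the value shrinks as the weights become more uneven.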
Dependencies
- numpy >= 1.21
- pandas >= 1.3
- Python >= 3.10
[0.1.0] - Initial Development
Added
- Core algorithm implementations
- Basic project structure
- Initial documentation