Changelog ========= All notable changes to this project will be documented in this file. The format is based on `Keep a Changelog `_, and this project adheres to `Semantic Versioning `_. [Unreleased] ------------ **Breaking Changes** - **Minimum Python version increased to 3.10** (from 3.8) - Modernized type hints using Python 3.10+ syntax (dict, list, | for union types) **Added** - **Enhanced Diagnostics & Monitoring**: - Gradient norm tracking for convergence analysis - Automatic convergence detection with configurable tolerance - Oscillation detection for non-converging scenarios - Enhanced weight distribution statistics (quartiles, outliers) - Verbose mode for debugging with progress indicators - Loss moving average calculation - New ``diagnostics_demo.py`` example showcasing monitoring features - **Major Performance Optimizations**: - **Capacity doubling for weights storage**: Eliminates O(n²) memory reallocations - **Optimized array conversions**: Moved outside gradient computation loops - **Configurable weight statistics**: Optional/sampled computation for expensive percentiles - **Overall speedup**: 10-100x improvement for large streams (n>1000) - Performance scales nearly linearly with data size - Comprehensive test suite with 21+ test cases - Realistic examples for common use cases - Complete documentation with Sphinx - CI/CD workflows for testing and publishing - Code formatting and linting checks **Changed** - Type hints modernized to use Python 3.10+ built-in types - Removed ``from __future__ import annotations`` (no longer needed) - CI/CD now tests Python 3.10, 3.11, 3.12, and 3.13 (dropped 3.8, 3.9) - Enhanced history tracking with comprehensive diagnostic metrics - **Internal data structures**: Weights array now uses capacity doubling for O(log n) amortized growth - **Weight statistics computation**: Now configurable (always, never, or sampled) for performance **Fixed** - **Critical Numerical Stability Issues**: - MWU algorithm now clips exponential arguments to prevent overflow/underflow - Convergence detection properly handles near-zero loss cases - Improved robustness with extreme learning rates and gradients - Import errors for Optional and Any types in simulation module - Improved docstring formatting and clarity - Flake8 linting issues with whitespace in slice notation [0.1.1] - 2024-XX-XX -------------------- **Added** - Initial release of onlinerake package - SGD-based streaming raking algorithm (OnlineRakingSGD) - MWU-based streaming raking algorithm (OnlineRakingMWU) - Targets dataclass for population margins - Simulation module for benchmarking algorithms - Basic README with usage examples **Features** - Real-time weight calibration for streaming survey data - scikit-learn style partial_fit API - Support for binary demographic indicators (age, gender, education, region) - Effective sample size and loss monitoring - Weight clipping to prevent numerical issues - Comprehensive margin tracking and reporting **Dependencies** - numpy >= 1.21 - pandas >= 1.3 - Python >= 3.10 [0.1.0] - Initial Development ----------------------------- **Added** - Core algorithm implementations - Basic project structure - Initial documentation