Quick Start Guide
Basic Usage
The onlinerake
package provides streaming survey raking with two algorithms:
SGD Raking - Stochastic gradient descent with smooth updates
MWU Raking - Multiplicative weights with exponential updates
Both algorithms follow the same API pattern:
from onlinerake import OnlineRakingSGD, OnlineRakingMWU, Targets
# Define target population proportions
targets = Targets(
age=0.52, # 52% over 35 years old
gender=0.51, # 51% female
education=0.35, # 35% college educated
region=0.19 # 19% rural
)
# Initialize raker
raker = OnlineRakingSGD(targets, learning_rate=3.0)
# Process observations one at a time
observations = [
{"age": 1, "gender": 0, "education": 1, "region": 0},
{"age": 0, "gender": 1, "education": 0, "region": 1},
# ... more observations
]
for obs in observations:
raker.partial_fit(obs)
# Inspect results
print(f"Weighted margins: {raker.margins}")
print(f"Effective sample size: {raker.effective_sample_size}")
print(f"Loss: {raker.loss}")
Key Concepts
- Targets
Population proportions you want to match. Each field represents the proportion with indicator value 1 (e.g., female=1, male=0).
- Observations
Binary demographic indicators, provided as dictionaries or objects with
age
,gender
,education
,region
attributes.- Margins
Current weighted proportions after processing all observations so far.
- Effective Sample Size
Measure of how “concentrated” the weights are. Higher is better.
- Loss
Squared error between current margins and targets. Lower is better.
Algorithm Choice
Use SGD when: - You want the most accurate margin tracking - Smooth weight trajectories are important - You can tune learning rates appropriately
Use MWU when: - You prefer multiplicative (percentage-based) adjustments - You want weight distributions similar to classic IPF - You’re starting from unequal base weights
Parameter Tuning
Learning Rate - SGD: Start with 3.0-5.0, increase if convergence is slow - MWU: Start with 1.0-1.5, decrease if weights become unstable
Weight Bounds
- min_weight
: Prevents weights from collapsing (default: 1e-3)
- max_weight
: Prevents runaway weights (default: 100.0)
Update Steps
- n_sgd_steps
(SGD): More steps = smoother convergence (default: 3)
- n_steps
(MWU): More steps = more aggressive updates (default: 3)
Next Steps
See Tutorials for detailed examples
Check API Reference for complete parameter descriptions
Try Examples for realistic use cases