onlinerake: Streaming Survey Raking

Modern online surveys and passive data collection streams generate responses one record at a time. Classic weighting methods such as iterative proportional fitting (IPF, or “raking”) and calibration weighting are inherently batch procedures: they reprocess the entire dataset whenever a new case arrives. The onlinerake package provides incremental, per‑observation updates to survey weights so that weighted margins track known population totals in real time.
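To make the batch baseline concrete, here is a minimal sketch of classic raking over binary margins using only numpy. The function name rake_binary, the synthetic data, and the 0/1 encoding are assumptions made for this illustration; none of it is part of onlinerake.

```python
import numpy as np

def rake_binary(X, targets, n_iter=50, tol=1e-8):
    """Batch IPF over binary margins (illustrative, not the onlinerake API).

    X is an (n, k) array of 0/1 indicators; targets[k] is the population
    share with indicator k equal to 1. Assumes every indicator takes both
    values in the sample, otherwise a scaling factor divides by zero.
    """
    w = np.ones(X.shape[0])
    for _ in range(n_iter):
        for k, t in enumerate(targets):
            m = w @ X[:, k] / w.sum()  # current weighted margin for indicator k
            # Rescale the two groups so that margin k matches t exactly.
            w = np.where(X[:, k] == 1, w * t / m, w * (1 - t) / (1 - m))
        if np.max(np.abs(w @ X / w.sum() - targets)) < tol:
            break
    return w

# Synthetic sample whose raw margins (about 0.7 and 0.2) miss the targets.
rng = np.random.default_rng(0)
targets = np.array([0.5, 0.4])
X = (rng.random((300, 2)) < np.array([0.7, 0.2])).astype(float)

# One full pass over all rows; in a streaming setting this pass would be
# repeated for every new record.
w = rake_binary(X, targets)
raked_margins = w @ X / w.sum()
```

Each call touches every stored row, so the cost of keeping weights current grows with the size of the dataset. That is the overhead the incremental updates are meant to avoid.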

Key Features

  • Real-time weight calibration for streaming survey data

  • Two complementary algorithms: stochastic gradient descent (SGD) and multiplicative weights update (MWU)

  • scikit-learn-style API built around a partial_fit method

  • Minimal dependencies: only numpy and pandas

  • Comprehensive testing with realistic examples

Quick Start

from onlinerake import OnlineRakingSGD, Targets

# Define target population margins
targets = Targets(age=0.5, gender=0.5, education=0.4, region=0.3)

# Create raker
raker = OnlineRakingSGD(targets, learning_rate=5.0)

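# Process streaming observations one at a time.
# stream_of_observations is a placeholder: supply your own iterable of observations.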
for obs in stream_of_observations:
    raker.partial_fit(obs)
    print(f"Current margins: {raker.margins}")

Algorithms

SGD Raking

Applies stochastic gradient descent to the squared-error loss between the weighted margins and their targets. The additive updates produce smooth weight trajectories and maintain a high effective sample size.
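The idea can be sketched in a few lines of numpy. This is an illustration under stated assumptions, not the package's implementation: observations are 0/1 indicator vectors, each arrival is followed by a few full-gradient steps over all rows seen so far, and the names sgd_partial_fit, n_steps, min_weight, and max_weight are hypothetical.

```python
import numpy as np

def weighted_margins(X, w):
    """Weighted share of 1s in each column of the 0/1 matrix X."""
    return w @ X / w.sum()

def margin_loss_grad(X, w, targets):
    """Gradient of L(w) = sum_k (m_k(w) - t_k)^2 with respect to each weight."""
    m = weighted_margins(X, w)
    # d m_k / d w_i = (x_ik - m_k) / sum(w); chain rule through the square.
    return 2.0 * ((X - m) @ (m - targets)) / w.sum()

def sgd_partial_fit(X, w, x_new, targets, learning_rate=5.0, n_steps=3,
                    min_weight=1e-3, max_weight=100.0):
    """Append one observation with weight 1, then take a few gradient steps."""
    X = np.vstack([X, x_new])
    w = np.append(w, 1.0)
    for _ in range(n_steps):
        w = w - learning_rate * margin_loss_grad(X, w, targets)
        w = np.clip(w, min_weight, max_weight)  # keep weights positive and bounded
    return X, w

# Synthetic stream whose raw margins (about 0.7 and 0.2) miss the targets.
rng = np.random.default_rng(0)
targets = np.array([0.5, 0.4])
stream = (rng.random((500, 2)) < np.array([0.7, 0.2])).astype(float)

X, w = np.empty((0, 2)), np.empty(0)
for x_new in stream:
    X, w = sgd_partial_fit(X, w, x_new, targets)

loss_unweighted = np.sum((X.mean(axis=0) - targets) ** 2)
loss_weighted = np.sum((weighted_margins(X, w) - targets) ** 2)
ess = w.sum() ** 2 / np.sum(w ** 2)  # Kish effective sample size
```

Comparing loss_weighted with loss_unweighted shows how far the weights have pulled the margins toward the targets, and ess reports how much effective sample size the reweighting cost.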

MWU Raking

Applies a multiplicative weights update inspired by mirror descent under the KL divergence. The resulting weight distributions resemble those of classic IPF but can have heavier tails.
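A multiplicative step can be sketched as an exponentiated-gradient update on the same squared-error margin loss. As before, this is a hedged illustration rather than the package's code: the 0/1 encoding, the function name mwu_partial_fit, and the parameters n_steps, min_weight, and max_weight are assumptions.

```python
import numpy as np

def weighted_margins(X, w):
    """Weighted share of 1s in each column of the 0/1 matrix X."""
    return w @ X / w.sum()

def mwu_partial_fit(X, w, x_new, targets, learning_rate=1.0, n_steps=3,
                    min_weight=1e-3, max_weight=100.0):
    """Append one observation with weight 1, then take multiplicative steps."""
    X = np.vstack([X, x_new])
    w = np.append(w, 1.0)
    for _ in range(n_steps):
        m = weighted_margins(X, w)
        grad = 2.0 * ((X - m) @ (m - targets)) / w.sum()
        # Exponentiated-gradient step: each weight is scaled by a positive
        # factor, so weights never change sign.
        w = np.clip(w * np.exp(-learning_rate * grad), min_weight, max_weight)
    return X, w

# Synthetic stream whose raw margins (about 0.7 and 0.2) miss the targets.
rng = np.random.default_rng(0)
targets = np.array([0.5, 0.4])
stream = (rng.random((500, 2)) < np.array([0.7, 0.2])).astype(float)

X, w = np.empty((0, 2)), np.empty(0)
for x_new in stream:
    X, w = mwu_partial_fit(X, w, x_new, targets)

loss_unweighted = np.sum((X.mean(axis=0) - targets) ** 2)
loss_weighted = np.sum((weighted_margins(X, w) - targets) ** 2)
```

Because each step rescales a weight in proportion to its current size, large weights can grow faster than under an additive update, which is one way heavier tails can arise.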
