Examples

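This page contains three complete examples that use onlinerake to reweight simulated survey streams: a biased tech survey, a poll whose respondent mix drifts over time, and a side-by-side comparison of the SGD and MWU rakers.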

Example 1: Correcting Gender Bias in a Tech Survey

Online tech surveys often over-represent young males. Here’s how to correct this bias:

import numpy as np
from onlinerake import OnlineRakingSGD, Targets

# US population targets (approximate)
targets = Targets(
    age=0.52,      # 52% over 35 years old
    gender=0.51,   # 51% female
    education=0.35, # 35% college educated
    region=0.19    # 19% rural
)

# Initialize raker with higher learning rate for quick correction
raker = OnlineRakingSGD(targets, learning_rate=4.0)

# Simulate biased tech survey responses
np.random.seed(42)
n_responses = 500
raw_totals = {"age": 0, "gender": 0, "education": 0, "region": 0}

for i in range(n_responses):
    # Bias: respondents skew young (70%), male (65%), and college educated (60%)
    age = 1 if np.random.random() < 0.3 else 0      # 30% older
    gender = 1 if np.random.random() < 0.35 else 0  # 35% female
    education = 1 if np.random.random() < 0.6 else 0 # 60% college
    region = 1 if np.random.random() < 0.15 else 0   # 15% rural

    obs = {"age": age, "gender": gender, "education": education, "region": region}
    raker.partial_fit(obs)

    # Track raw proportions
    for key in raw_totals:
        raw_totals[key] += obs[key]

# Compare results
raw_margins = {k: v/n_responses for k, v in raw_totals.items()}
weighted_margins = raker.margins

print("Results after", n_responses, "responses:")
print("Characteristic | Target | Raw    | Weighted")
print("-" * 40)
for char in ['gender', 'age', 'education', 'region']:
    target = targets.as_dict()[char]
    raw = raw_margins[char]
    weighted = weighted_margins[char]
    print(f"{char:<12} | {target:.3f} | {raw:.3f} | {weighted:.3f}")

print(f"\nEffective Sample Size: {raker.effective_sample_size:.1f}")
print(f"Final Loss: {raker.loss:.6f}")

Expected Output:

Results after 500 responses:
Characteristic | Target | Raw    | Weighted
----------------------------------------
gender       | 0.510 | 0.344 | 0.491
age          | 0.520 | 0.330 | 0.491
education    | 0.350 | 0.602 | 0.378
region       | 0.190 | 0.134 | 0.167

Effective Sample Size: 294.1
Final Loss: 0.002512
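
The two summary numbers above can be checked by hand. The following is a minimal sketch in plain Python, assuming the loss is the sum of squared gaps between weighted margins and targets and that the effective sample size follows Kish's formula. Both are textbook definitions and an assumption about onlinerake's internals; the helper names squared_error_loss and kish_ess are hypothetical and not part of the library.

```python
# Illustrative only: textbook versions of the two summary statistics.
# Assumption: onlinerake's loss and effective_sample_size follow these
# definitions; the helper names below are hypothetical.

def squared_error_loss(margins, targets):
    """Sum of squared gaps between weighted margins and targets."""
    return sum((margins[k] - targets[k]) ** 2 for k in targets)

def kish_ess(weights):
    """Kish effective sample size: (sum of w)^2 / (sum of w^2)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq

# Targets and rounded weighted margins copied from the table above
targets = {"age": 0.52, "gender": 0.51, "education": 0.35, "region": 0.19}
weighted = {"age": 0.491, "gender": 0.491, "education": 0.378, "region": 0.167}

loss = squared_error_loss(weighted, targets)
print(f"Loss from rounded table values: {loss:.6f}")

# ESS equals the sample size for equal weights and shrinks as weights spread out
equal_ess = kish_ess([1.0] * 500)
uneven_ess = kish_ess([1.0] * 250 + [3.0] * 250)
print(f"ESS with equal weights:  {equal_ess:.1f}")
print(f"ESS with uneven weights: {uneven_ess:.1f}")
```

The loss recomputed from the rounded table values comes out at roughly 0.0025, in line with the reported Final Loss. The ESS half of the sketch shows why a heavily reweighted sample of 500 responses can carry the information of far fewer.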

Example 2: Real-time Election Polling

Handle streaming poll responses with changing demographics:

import numpy as np
from onlinerake import OnlineRakingSGD, Targets

# Approximate 2024 US voter demographics
targets = Targets(
    age=0.48,      # 48% over 50 years old
    gender=0.53,   # 53% female voters
    education=0.32, # 32% college degree
    region=0.17    # 17% rural voters
)

raker = OnlineRakingSGD(targets, learning_rate=3.0)

# Simulate poll responses with time-varying bias
np.random.seed(789)
n_polls = 1000

# Track evolution of margins
checkpoints = [200, 400, 600, 800, 1000]

for i in range(n_polls):
    # Demographics change over time as different groups respond
    time_factor = i / n_polls

    # Early: social media recruitment (younger)
    # Later: phone polling kicks in (older)
    p_older = 0.2 + 0.4 * time_factor
    age = 1 if np.random.random() < p_older else 0

    # Education bias decreases over time
    p_educated = 0.6 - 0.3 * time_factor
    education = 1 if np.random.random() < p_educated else 0

    # Other demographics relatively stable
    gender = 1 if np.random.random() < 0.52 else 0
    region = 1 if np.random.random() < 0.18 else 0

    obs = {"age": age, "gender": gender, "education": education, "region": region}
    raker.partial_fit(obs)

    # Print progress at checkpoints
    if (i + 1) in checkpoints:
        margins = raker.margins
        print(f"After {i+1:4d} responses: Age={margins['age']:.3f}, "
              f"Gender={margins['gender']:.3f}, Education={margins['education']:.3f}")

print(f"\nFinal ESS: {raker.effective_sample_size:.1f} / {n_polls}")
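
To see how far the raw sample sits from the age target at each checkpoint, the drift schedule from the loop above can be averaged directly. This is a small sketch in plain Python that reuses only the formula p_older = 0.2 + 0.4 * i / n_polls from the example. The helper names p_older and expected_raw_share are hypothetical, and the sketch does not call onlinerake.

```python
# Illustrative only: expected raw share of older respondents under the
# linear drift used in Example 2. Helper names are hypothetical.

def p_older(i, n_polls=1000):
    """Probability that response i (0-based) comes from an older respondent."""
    return 0.2 + 0.4 * i / n_polls

def expected_raw_share(n, n_polls=1000):
    """Average of p_older over the first n responses."""
    return sum(p_older(i, n_polls) for i in range(n)) / n

target_age = 0.48  # age target from Example 2
shares = {n: expected_raw_share(n) for n in [200, 400, 600, 800, 1000]}

for n, share in shares.items():
    print(f"First {n:4d} responses: expected raw older share {share:.3f} "
          f"(target {target_age:.2f})")
```

Even after all 1000 responses, the expected raw share of older respondents stays near 0.40, below the 0.48 target, so the raker has to keep upweighting older respondents for the whole run.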

Example 3: Comparing SGD vs MWU

Side-by-side comparison of both algorithms:

from onlinerake import OnlineRakingSGD, OnlineRakingMWU, Targets
import numpy as np

targets = Targets(age=0.45, gender=0.52, education=0.38, region=0.22)

# Learning rates tuned separately for each method
sgd_raker = OnlineRakingSGD(targets, learning_rate=5.0)
mwu_raker = OnlineRakingMWU(targets, learning_rate=1.0)

# Simulate sudden demographic shift
np.random.seed(2024)
n_obs = 800

for i in range(n_obs):
    if i < n_obs // 2:
        # First half: younger, more educated
        age = 1 if np.random.random() < 0.25 else 0
        education = 1 if np.random.random() < 0.65 else 0
    else:
        # Second half: older, less educated
        age = 1 if np.random.random() < 0.70 else 0
        education = 1 if np.random.random() < 0.15 else 0

    gender = 1 if np.random.random() < 0.50 else 0
    region = 1 if np.random.random() < 0.20 else 0

    obs = {"age": age, "gender": gender, "education": education, "region": region}

    sgd_raker.partial_fit(obs)
    mwu_raker.partial_fit(obs)

# Compare final results
print("Final Results:")
print("Metric               | Target | SGD    | MWU")
print("-" * 45)

sgd_final = sgd_raker.margins
mwu_final = mwu_raker.margins

for char in ['age', 'gender', 'education', 'region']:
    target = targets.as_dict()[char]
    sgd_val = sgd_final[char]
    mwu_val = mwu_final[char]
    print(f"{char:<20} | {target:.3f} | {sgd_val:.3f} | {mwu_val:.3f}")

print("-" * 45)
print(f"Loss (squared error) |        | {sgd_raker.loss:.5f} | {mwu_raker.loss:.5f}")
print(f"Effective Sample Size|        | {sgd_raker.effective_sample_size:.1f} | {mwu_raker.effective_sample_size:.1f}")
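
The two class names refer to stochastic gradient descent and multiplicative weights updates. As a generic illustration of how an additive step differs from a multiplicative one, the sketch below applies both to a single weight. It is not onlinerake's implementation: the function names, the fixed gradient value, and the positivity floor are all assumptions made for the example.

```python
import math

# Illustrative only: generic additive vs multiplicative update rules.
# Not onlinerake's code; names, gradient, and floor are hypothetical.

def additive_step(weight, gradient, learning_rate, floor=1e-3):
    """Gradient-descent style update, clipped so the weight stays positive."""
    return max(weight - learning_rate * gradient, floor)

def multiplicative_step(weight, gradient, learning_rate):
    """Multiplicative-weights style update: scale by exp(-lr * gradient)."""
    return weight * math.exp(-learning_rate * gradient)

w_add = w_mul = 1.0
for _ in range(3):
    # Apply the same gradient signal to both update rules
    w_add = additive_step(w_add, gradient=0.5, learning_rate=1.0)
    w_mul = multiplicative_step(w_mul, gradient=0.5, learning_rate=1.0)

print(f"additive: {w_add:.4f}, multiplicative: {w_mul:.4f}")
```

The additive rule can reach the floor and needs explicit clipping, while the multiplicative rule keeps weights positive by construction. This is one reason the two methods are typically run with different learning rates.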

Running the Examples

All examples are available in the repository's examples/ folder as realistic_examples.py:

python examples/realistic_examples.py

You can also run the simulation and benchmarking suite:

python examples/simulation.py

The simulation script provides various command-line options:

python examples/simulation.py --help
python examples/simulation.py --seeds 5 --n-obs 500

Available examples in the examples/ folder:

realistic_examples.py - Real-world usage scenarios
simulation.py - Algorithm benchmarking and performance evaluation
diagnostics_demo.py - Monitoring and convergence analysis tools