Examples ======== This page contains complete, realistic examples demonstrating how to use ``onlinerake`` in various scenarios. Example 1: Correcting Gender Bias in Tech Survey ------------------------------------------------ Online tech surveys often over-represent young males. Here's how to correct this bias: .. code-block:: python import numpy as np from onlinerake import OnlineRakingSGD, Targets # US population targets (approximate) targets = Targets( age=0.52, # 52% over 35 years old gender=0.51, # 51% female education=0.35, # 35% college educated region=0.19 # 19% rural ) # Initialize raker with higher learning rate for quick correction raker = OnlineRakingSGD(targets, learning_rate=4.0) # Simulate biased tech survey responses np.random.seed(42) n_responses = 500 raw_totals = {"age": 0, "gender": 0, "education": 0, "region": 0} for i in range(n_responses): # Bias: 70% young males, 60% college educated age = 1 if np.random.random() < 0.3 else 0 # 30% older gender = 1 if np.random.random() < 0.35 else 0 # 35% female education = 1 if np.random.random() < 0.6 else 0 # 60% college region = 1 if np.random.random() < 0.15 else 0 # 15% rural obs = {"age": age, "gender": gender, "education": education, "region": region} raker.partial_fit(obs) # Track raw proportions for key in raw_totals: raw_totals[key] += obs[key] # Compare results raw_margins = {k: v/n_responses for k, v in raw_totals.items()} weighted_margins = raker.margins print("Results after", n_responses, "responses:") print("Characteristic | Target | Raw | Weighted") print("-" * 40) for char in ['gender', 'age', 'education', 'region']: target = targets.as_dict()[char] raw = raw_margins[char] weighted = weighted_margins[char] print(f"{char:<12} | {target:.3f} | {raw:.3f} | {weighted:.3f}") print(f"\\nEffective Sample Size: {raker.effective_sample_size:.1f}") print(f"Final Loss: {raker.loss:.6f}") **Expected Output:** .. code-block:: text Results after 500 responses: Characteristic | Target | Raw | Weighted ---------------------------------------- gender | 0.510 | 0.344 | 0.491 age | 0.520 | 0.330 | 0.491 education | 0.350 | 0.602 | 0.378 region | 0.190 | 0.134 | 0.167 Effective Sample Size: 294.1 Final Loss: 0.002512 Example 2: Real-time Election Polling ------------------------------------- Handle streaming poll responses with changing demographics: .. code-block:: python from onlinerake import OnlineRakingSGD, Targets # 2024 US voter demographics targets = Targets( age=0.48, # 48% over 50 years old gender=0.53, # 53% female voters education=0.32, # 32% college degree region=0.17 # 17% rural voters ) raker = OnlineRakingSGD(targets, learning_rate=3.0) # Simulate poll responses with time-varying bias import numpy as np np.random.seed(789) n_polls = 1000 # Track evolution of margins checkpoints = [200, 400, 600, 800, 1000] for i in range(n_polls): # Demographics change over time as different groups respond time_factor = i / n_polls # Early: social media recruitment (younger) # Later: phone polling kicks in (older) p_older = 0.2 + 0.4 * time_factor age = 1 if np.random.random() < p_older else 0 # Education bias decreases over time p_educated = 0.6 - 0.3 * time_factor education = 1 if np.random.random() < p_educated else 0 # Other demographics relatively stable gender = 1 if np.random.random() < 0.52 else 0 region = 1 if np.random.random() < 0.18 else 0 obs = {"age": age, "gender": gender, "education": education, "region": region} raker.partial_fit(obs) # Print progress at checkpoints if (i + 1) in checkpoints: margins = raker.margins print(f"After {i+1:4d} responses: Age={margins['age']:.3f}, " f"Gender={margins['gender']:.3f}, Education={margins['education']:.3f}") print(f"\\nFinal ESS: {raker.effective_sample_size:.1f} / {n_polls}") Example 3: Comparing SGD vs MWU ------------------------------- Side-by-side comparison of both algorithms: .. code-block:: python from onlinerake import OnlineRakingSGD, OnlineRakingMWU, Targets import numpy as np targets = Targets(age=0.45, gender=0.52, education=0.38, region=0.22) # Different learning rates optimized for each method sgd_raker = OnlineRakingSGD(targets, learning_rate=5.0) mwu_raker = OnlineRakingMWU(targets, learning_rate=1.0) # Simulate sudden demographic shift np.random.seed(2024) n_obs = 800 for i in range(n_obs): if i < n_obs // 2: # First half: younger, more educated age = 1 if np.random.random() < 0.25 else 0 education = 1 if np.random.random() < 0.65 else 0 else: # Second half: older, less educated age = 1 if np.random.random() < 0.70 else 0 education = 1 if np.random.random() < 0.15 else 0 gender = 1 if np.random.random() < 0.50 else 0 region = 1 if np.random.random() < 0.20 else 0 obs = {"age": age, "gender": gender, "education": education, "region": region} sgd_raker.partial_fit(obs) mwu_raker.partial_fit(obs) # Compare final results print("Final Results:") print("Metric | Target | SGD | MWU") print("-" * 45) sgd_final = sgd_raker.margins mwu_final = mwu_raker.margins for char in ['age', 'gender', 'education', 'region']: target = targets.as_dict()[char] sgd_val = sgd_final[char] mwu_val = mwu_final[char] print(f"{char:<20} | {target:.3f} | {sgd_val:.3f} | {mwu_val:.3f}") print("-" * 45) print(f"Loss (squared error) | | {sgd_raker.loss:.5f} | {mwu_raker.loss:.5f}") print(f"Effective Sample Size| | {sgd_raker.effective_sample_size:.1f} | {mwu_raker.effective_sample_size:.1f}") Running the Examples ------------------- All examples are available in the repository as ``realistic_examples.py``: .. code-block:: bash python realistic_examples.py You can also run the simulation and benchmarking suite: .. code-block:: bash python examples/simulation.py The simulation script provides various command-line options: .. code-block:: bash python examples/simulation.py --help python examples/simulation.py --seeds 5 --n-obs 500 Available examples in the ``examples/`` folder: - ``realistic_examples.py`` - Real-world usage scenarios - ``simulation.py`` - Algorithm benchmarking and performance evaluation - ``diagnostics_demo.py`` - Monitoring and convergence analysis tools