{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Plateau Diagnostics Demo\n", "\n", "This notebook demonstrates the use of isotonic regression plateau diagnostics to distinguish between noise-based flattening (good) and limited-data flattening (bad).\n", "\n", "## Overview\n", "\n", "When isotonic regression creates flat regions (plateaus) in calibration curves, it could be for two reasons:\n", "\n", "1. **Noise-based flattening (good)**: Adjacent scores truly have similar risks, and pooling reduces variance without losing meaningful resolution.\n", "2. **Limited-data flattening (bad)**: Adjacent scores have different risks, but the calibration sample is too small to detect the difference.\n", "\n", "This package provides comprehensive diagnostics to help distinguish between these cases." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import numpy as np\n", "from sklearn.isotonic import IsotonicRegression\n", "from sklearn.model_selection import train_test_split\n", "\n", "# Import calibre diagnostics (updated for v0.4.1)\n", "from calibre import (\n", " IsotonicCalibrator,\n", " NearlyIsotonicCalibrator,\n", " RegularizedIsotonicCalibrator,\n", " run_plateau_diagnostics,\n", ")\n", "\n", "# Import metrics (updated for v0.4.1)\n", "from calibre.metrics import (\n", " calibration_diversity_index,\n", " plateau_quality_score,\n", " progressive_sampling_diversity,\n", " tie_preservation_score,\n", ")\n", "\n", "# Import visualization (optional)\n", "try:\n", " from calibre.visualization import (\n", " plot_calibration_comparison,\n", " plot_plateau_diagnostics,\n", " plot_progressive_sampling,\n", " )\n", "\n", " HAS_VIZ = True\n", "except ImportError:\n", " print(\n", " \"Visualization module requires matplotlib. 
Install with: pip install matplotlib\"\n", " )\n", " HAS_VIZ = False\n", "\n", "np.random.seed(42)\n", "print(\"Calibre plateau diagnostics demo loaded successfully!\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Generate Synthetic Data\n", "\n", "Let's create two scenarios:\n", "- **Scenario A**: Data with genuine flat regions (noise-based flattening)\n", "- **Scenario B**: Data with smooth trends but small sample size (limited-data flattening)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def create_genuine_plateau_data(n=200, noise_level=0.05):\n", " \"\"\"Create data with genuine flat regions.\"\"\"\n", " X = np.sort(np.random.uniform(0, 1, n))\n", "\n", " # Create true probabilities with intentional flat regions\n", " y_true = np.zeros(n)\n", " y_true[: n // 4] = 0.1 # Flat low region\n", " y_true[n // 4 : n // 2] = np.linspace(0.1, 0.4, n // 4) # Rising\n", " y_true[n // 2 : 3 * n // 4] = 0.4 # Flat middle region\n", " y_true[3 * n // 4 :] = np.linspace(0.4, 0.8, n // 4) # Rising\n", "\n", " # Add small amount of noise\n", " y_true += np.random.normal(0, noise_level, n)\n", " y_true = np.clip(y_true, 0, 1)\n", "\n", " # Generate binary outcomes\n", " y_binary = np.random.binomial(1, y_true)\n", "\n", " return X, y_binary, y_true\n", "\n", "\n", "def create_smooth_small_data(n=50):\n", " \"\"\"Create smooth data with small sample size.\"\"\"\n", " X = np.sort(np.random.uniform(0, 1, n))\n", "\n", " # Smooth sigmoid-like curve\n", " y_true = 1 / (1 + np.exp(-8 * (X - 0.5)))\n", "\n", " # Generate binary outcomes\n", " y_binary = np.random.binomial(1, y_true)\n", "\n", " return X, y_binary, y_true\n", "\n", "\n", "# Generate both scenarios\n", "X_genuine, y_genuine, y_true_genuine = create_genuine_plateau_data()\n", "X_small, y_small, y_true_small = create_smooth_small_data()\n", "\n", "print(f\"Genuine plateau data: {len(X_genuine)} samples\")\n", "print(f\"Small sample data: 
{len(X_small)} samples\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Basic Isotonic Regression with Diagnostics\n", "\n", "Let's start with the simple wrapper that automatically runs diagnostics:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Scenario A: Genuine plateaus (updated for v0.4.1)\n", "print(\"=== Scenario A: Genuine Plateau Data ===\")\n", "cal_genuine = IsotonicCalibrator(enable_diagnostics=True)\n", "cal_genuine.fit(X_genuine, y_genuine)\n", "\n", "print(\"\\nDiagnostic Summary:\")\n", "if cal_genuine.has_diagnostics():\n", " print(cal_genuine.diagnostic_summary())\n", "else:\n", " print(\"No diagnostics available\")\n", "\n", "# Get calibrated predictions\n", "y_cal_genuine = cal_genuine.transform(X_genuine)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Scenario B: Small sample data (updated for v0.4.1)\n", "print(\"=== Scenario B: Small Sample Data ===\")\n", "cal_small = IsotonicCalibrator(enable_diagnostics=True)\n", "cal_small.fit(X_small, y_small)\n", "\n", "print(\"\\nDiagnostic Summary:\")\n", "if cal_small.has_diagnostics():\n", " print(cal_small.diagnostic_summary())\n", "else:\n", " print(\"No diagnostics available\")\n", "\n", "y_cal_small = cal_small.transform(X_small)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. 
Advanced Diagnostic Analysis\n", "\n", "For more detailed analysis, we can call the standalone `run_plateau_diagnostics` function directly:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": "# Split data for more thorough analysis (train/test)\nX_train, X_test, y_train, y_test = train_test_split(\n    X_genuine, y_genuine, test_size=0.3, random_state=42\n)\n\n# Run comprehensive diagnostics using standalone function (updated for v0.4.1)\n# First fit calibrator and get predictions\ncal = IsotonicCalibrator()\ncal.fit(X_train, y_train)\ny_cal_train = cal.transform(X_train)\n\n# Run plateau diagnostics on the calibrated results\nresults = run_plateau_diagnostics(X_train, y_train, y_cal_train)\n\nprint(\"Detailed diagnostic results:\")\nprint(f\"Detected {results['n_plateaus']} plateau regions\")\n\nif results['n_plateaus'] > 0:\n    print(\"\\nPlateau details:\")\n    for i, plateau in enumerate(results['plateaus']):\n        print(f\"  Plateau {i + 1}:\")\n        if 'x_range' in plateau:\n            print(f\"    X range: [{plateau['x_range'][0]:.3f}, {plateau['x_range'][1]:.3f}]\")\n        if 'value' in plateau:\n            print(f\"    Value: {plateau['value']:.3f}\")\n        if 'n_samples' in plateau:\n            print(f\"    Samples: {plateau['n_samples']}\")\n        if 'sample_density' in plateau:\n            print(f\"    Density: {plateau['sample_density']}\")\n\nif results['warnings']:\n    print(\"\\nWarnings:\")\n    for warning in results['warnings']:\n        print(f\"  ⚠️ {warning}\")" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. 
Diagnostic Metrics\n", "\n", "Let's explore the specific diagnostic metrics:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Compare original vs calibrated predictions\n", "iso_basic = IsotonicRegression()\n", "iso_basic.fit(X_genuine, y_genuine)\n", "y_cal_basic = iso_basic.transform(X_genuine)\n", "\n", "# Tie preservation score\n", "tie_score = tie_preservation_score(X_genuine, y_cal_basic)\n", "print(f\"Tie preservation score: {tie_score:.3f}\")\n", "\n", "# Plateau quality score\n", "quality_score = plateau_quality_score(X_genuine, y_genuine, y_cal_basic)\n", "print(f\"Plateau quality score: {quality_score:.3f}\")\n", "\n", "# Calibration diversity\n", "diversity_orig = calibration_diversity_index(X_genuine)\n", "diversity_cal = calibration_diversity_index(y_cal_basic)\n", "diversity_relative = calibration_diversity_index(y_cal_basic, diversity_orig)\n", "\n", "print(f\"Original diversity: {diversity_orig:.3f}\")\n", "print(f\"Calibrated diversity: {diversity_cal:.3f}\")\n", "print(f\"Relative diversity: {diversity_relative:.3f}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. 
Progressive Sampling Analysis\n", "\n", "This helps distinguish limited-data flattening by showing how diversity changes with sample size:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Progressive sampling analysis\n", "sample_sizes, diversities = progressive_sampling_diversity(\n", " X_genuine, y_genuine, sample_sizes=[50, 100, 150, 200], n_trials=10, random_state=42\n", ")\n", "\n", "print(\"Progressive sampling results:\")\n", "for size, div in zip(sample_sizes, diversities):\n", " print(f\" Sample size {size}: diversity = {div:.3f}\")\n", "\n", "# Interpret trend\n", "slope = (diversities[-1] - diversities[0]) / (sample_sizes[-1] - sample_sizes[0])\n", "if slope > 0.001:\n", " print(\n", " \"\\nInterpretation: Increasing diversity suggests potential limited-data flattening\"\n", " )\n", "elif slope < -0.001:\n", " print(\"\\nInterpretation: Decreasing diversity (unusual pattern)\")\n", "else:\n", " print(\"\\nInterpretation: Stable diversity suggests genuine flatness\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6. 
Comparison with Alternative Methods\n", "\n", "Let's compare strict isotonic regression with softer alternatives:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Compare different calibration methods (updated for v0.4.1)\n", "from calibre.metrics import mean_calibration_error\n", "\n", "# Fit different calibrators\n", "iso_strict = IsotonicCalibrator()\n", "iso_nearly = NearlyIsotonicCalibrator(lam=1.0)\n", "iso_reg = RegularizedIsotonicCalibrator(alpha=0.1)\n", "\n", "calibrators = {\n", "    \"Strict Isotonic\": iso_strict.fit(X_genuine, y_genuine),\n", "    \"Nearly Isotonic\": iso_nearly.fit(X_genuine, y_genuine),\n", "    \"Regularized\": iso_reg.fit(X_genuine, y_genuine),\n", "}\n", "\n", "# Compare diversity and calibration error\n", "print(\"Method comparison:\")\n", "for name, cal in calibrators.items():\n", "    try:\n", "        y_pred = cal.transform(X_genuine)\n", "        diversity = calibration_diversity_index(y_pred)\n", "        error = mean_calibration_error(y_genuine, y_pred)\n", "        n_unique = len(np.unique(y_pred))\n", "\n", "        print(f\"  {name}:\")\n", "        print(f\"    Diversity: {diversity:.3f}\")\n", "        print(f\"    Calibration error: {error:.3f}\")\n", "        print(f\"    Unique values: {n_unique}/{len(y_pred)}\")\n", "    except Exception as e:\n", "        print(f\"  {name}: Error - {e}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 7. 
Visualization (if matplotlib available)\n", "\n", "Let's create some visualizations to better understand the diagnostics:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": "# Visualization\ntry:\n    if HAS_VIZ:\n        # Try to plot comprehensive diagnostic results\n        # Note: visualization module may have compatibility issues with current data format\n        y_cal_for_plot = iso_strict.transform(X_genuine)\n\n        # Skip plot_plateau_diagnostics as it expects different data format\n        print(\"Visualization of plateau diagnostics (text version):\")\n        if results['n_plateaus'] > 0:\n            for i, plateau in enumerate(results['plateaus']):\n                print(f\"  Plateau {i+1}: X=[{plateau['x_range'][0]:.3f}, {plateau['x_range'][1]:.3f}], \"\n                      f\"Y={plateau['value']:.3f}, Samples={plateau['n_samples']}, \"\n                      f\"Density={plateau['sample_density']}\")\n\n        # Try progressive sampling plot if available\n        try:\n            fig2 = plot_progressive_sampling(sample_sizes, diversities)\n            plt.show()\n        except Exception as e:\n            print(f\"Progressive sampling plot not available: {e}\")\n\n        # Try calibration comparison plot if available\n        try:\n            fig3 = plot_calibration_comparison(X_genuine, y_genuine, calibrators)\n            plt.show()\n        except Exception as e:\n            print(f\"Calibration comparison plot not available: {e}\")\n    else:\n        print(\"Visualization not available. Install matplotlib to see plots.\")\nexcept Exception as e:\n    print(f\"Visualization error: {e}\")\n\n# Always provide text-based visualization as fallback\nprint(\"\\nSimple calibration curve comparison:\")\nX_sample = X_genuine[::10]  # Sample every 10th point\n\nfor name, cal in calibrators.items():\n    try:\n        y_sample = cal.transform(X_sample)\n        print(f\"\\n{name}:\")\n        for i in range(min(5, len(X_sample))):\n            print(f\"  X={X_sample[i]:.2f} -> Y={y_sample[i]:.3f}\")\n    except Exception as e:\n        print(f\"{name}: Error - {e}\")" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 8. 
Practical Decision Framework\n", "\n", "Based on the diagnostic results, here's how to make practical decisions:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": "def recommend_calibration_method(diagnostic_results, diversity_trend_slope=None):\n    \"\"\"Provide calibration method recommendations based on diagnostics.\"\"\"\n\n    if diagnostic_results['n_plateaus'] == 0:\n        return \"Standard isotonic regression (no plateaus detected)\"\n\n    # Count plateau types based on sample density\n    concerning = 0\n    total = diagnostic_results['n_plateaus']\n\n    for plateau in diagnostic_results['plateaus']:\n        if 'sample_density' in plateau:\n            if plateau['sample_density'] in ['sparse', 'very_sparse']:\n                concerning += 1\n\n    recommendations = []\n\n    if concerning == 0:\n        recommendations.append(\n            \"✅ Standard isotonic regression (all plateaus appear genuine)\"\n        )\n    elif concerning / total > 0.5:\n        recommendations.append(\"⚠️ Consider softer calibration methods:\")\n        recommendations.append(\n            \"   - Nearly isotonic regression (allows small violations)\"\n        )\n        recommendations.append(\"   - Regularized isotonic regression\")\n        recommendations.append(\"   - Spline calibration\")\n    else:\n        recommendations.append(\"🤔 Mixed evidence - consider:\")\n        recommendations.append(\"   - Cross-validation between strict and soft methods\")\n        recommendations.append(\"   - Collecting more calibration data if possible\")\n\n    # Additional recommendations based on diversity trend\n    if diversity_trend_slope is not None:\n        if diversity_trend_slope > 0.001:\n            recommendations.append(\n                \"📈 Increasing diversity with sample size suggests limited-data flattening\"\n            )\n            recommendations.append(\n                \"   -> Strongly recommend collecting more data or using softer methods\"\n            )\n        elif diversity_trend_slope < -0.001:\n            recommendations.append(\n                \"📉 Unusual decreasing diversity pattern - investigate data quality\"\n            )\n\n    return \"\\n\".join(recommendations)\n\n\n# Get recommendations for 
our data (updated for v0.4.1)\nslope = (diversities[-1] - diversities[0]) / (sample_sizes[-1] - sample_sizes[0])\nrecommendations = recommend_calibration_method(results, slope)\n\nprint(\"=== CALIBRATION METHOD RECOMMENDATIONS ===\")\nprint(recommendations)" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 9. Summary and Best Practices\n", "\n", "### Key Diagnostic Indicators:\n", "\n", "1. **Tie Stability (Bootstrap)**: \n", " - High (>0.7): Suggests genuine flatness\n", " - Low (<0.3): Suggests limited-data flattening\n", "\n", "2. **Conditional AUC Among Tied Pairs**:\n", " - Close to 0.5: Supports noise-based flattening\n", " - Much above 0.5: Suggests limited-data flattening\n", "\n", "3. **Progressive Sampling Diversity**:\n", " - Stable: Supports genuine flatness\n", " - Increasing: Suggests limited-data flattening\n", "\n", "4. **Minimum Detectable Difference (MDD)**:\n", " - Compare with domain knowledge of plausible effect sizes\n", "\n", "### Best Practices:\n", "\n", "1. **Always run diagnostics** when using isotonic regression\n", "2. **Use multiple diagnostic criteria** together, not individually\n", "3. **Consider domain knowledge** about expected effect sizes\n", "4. **Cross-validate** between strict and soft calibration methods\n", "5. **Collect more data** when limited-data flattening is suspected\n", "6. **Document your calibration decisions** based on diagnostic evidence" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"🎉 Plateau diagnostics demo completed!\")\n", "print(\"\\nKey takeaways:\")\n", "print(\"1. Not all plateaus are created equal\")\n", "print(\"2. Diagnostics help distinguish genuine vs. artifactual flattening\")\n", "print(\"3. Multiple complementary tests provide robust evidence\")\n", "print(\"4. Consider both statistical and domain-specific evidence\")\n", "print(\"5. 
When in doubt, prefer softer calibration methods\")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.0" } }, "nbformat": 4, "nbformat_minor": 4 }