Calibration Metrics
Calibration metrics for pairwise win rate predictions.
When your model says “A beats B with probability 0.65”, does A actually win 65% of the time? These tools check.
- winference.calibration.expected_calibration_error(predicted, observed, n_bins=10)[source]
Expected Calibration Error (ECE).
- Parameters:
  - predicted (ndarray[tuple[Any, ...], dtype[double]]) – Array of predicted probabilities.
  - observed (ndarray[tuple[Any, ...], dtype[double]]) – Array of binary outcomes.
  - n_bins (int, default: 10) – Number of bins.
- Return type:
  float
- Returns:
  Weighted average |predicted - observed| across bins.
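A minimal NumPy sketch of what an ECE like this computes. The equal-width binning and the handling of p = 1.0 are assumptions of this sketch, not necessarily what winference does internally:

```python
import numpy as np

def ece_sketch(predicted: np.ndarray, observed: np.ndarray, n_bins: int = 10) -> float:
    """Equal-width-bin ECE: count-weighted mean |avg predicted - avg observed| per bin."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # digitize returns 1-based bin indices; clip so p == 1.0 lands in the last bin.
    idx = np.clip(np.digitize(predicted, edges) - 1, 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            gap = abs(predicted[mask].mean() - observed[mask].mean())
            ece += mask.mean() * gap  # weight by the bin's share of samples
    return float(ece)

# A model that always says 0.8 while A wins every time is off by 0.2:
print(ece_sketch(np.array([0.8, 0.8, 0.8]), np.array([1.0, 1.0, 1.0])))
```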
- winference.calibration.brier_score(predicted, observed)[source]
Brier score: mean squared error of probability predictions.
Lower is better. Perfect prediction → 0. A constant 0.5 guess → 0.25.
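The Brier score is simple enough to sketch in one line; this standalone version illustrates the 0.25 baseline for constant 0.5 predictions:

```python
import numpy as np

def brier_sketch(predicted: np.ndarray, observed: np.ndarray) -> float:
    # Mean squared error between predicted probabilities and binary outcomes.
    return float(np.mean((predicted - observed) ** 2))

# Hedging at 0.5 on every game scores 0.25 regardless of who actually won:
print(brier_sketch(np.array([0.5, 0.5, 0.5, 0.5]), np.array([1.0, 0.0, 1.0, 1.0])))
```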
- winference.calibration.log_loss(predicted, observed)[source]
Binary cross-entropy loss. Lower is better.
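A self-contained sketch of binary cross-entropy. The clipping epsilon below is this sketch's choice to avoid log(0); the library may handle boundary probabilities differently:

```python
import numpy as np

def log_loss_sketch(predicted: np.ndarray, observed: np.ndarray) -> float:
    # Clip away exact 0/1 to dodge log(0); epsilon is an assumption of this sketch.
    p = np.clip(predicted, 1e-15, 1 - 1e-15)
    return float(-np.mean(observed * np.log(p) + (1.0 - observed) * np.log(1.0 - p)))

# Constant 0.5 predictions cost ln(2) ≈ 0.693 per game:
print(log_loss_sketch(np.array([0.5, 0.5]), np.array([0.0, 1.0])))
```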
- winference.calibration.reliability_diagram(predicted, observed, n_bins=10, ax=None, label='', color=None)[source]
Reliability diagram data and optional plot.
- Parameters:
  - predicted (ndarray[tuple[Any, ...], dtype[double]]) – Array of predicted probabilities.
  - observed (ndarray[tuple[Any, ...], dtype[double]]) – Array of binary outcomes.
  - n_bins (int, default: 10) – Number of bins.
  - ax (Any, default: None) – Matplotlib Axes. If provided, plot on it.
  - label (str, default: '') – Label for the legend.
  - color (str | None, default: None) – Color for the plot line.
- Return type:
  dict
- Returns:
Dict with ‘bin_midpoints’, ‘bin_accuracy’, ‘bin_counts’, ‘ece’.
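A plotting-free sketch of the data side of this function, returning the same four keys the docstring lists. Equal-width bins and NaN accuracy for empty bins are assumptions of this sketch:

```python
import numpy as np

def reliability_sketch(predicted: np.ndarray, observed: np.ndarray, n_bins: int = 10) -> dict:
    """Bin predictions and compare each bin's mean prediction to its empirical win rate."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(predicted, edges) - 1, 0, n_bins - 1)
    counts = np.bincount(idx, minlength=n_bins)
    accuracy = np.full(n_bins, np.nan)  # NaN marks empty bins (sketch's convention)
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            accuracy[b] = observed[mask].mean()
            ece += mask.mean() * abs(predicted[mask].mean() - accuracy[b])
    return {
        "bin_midpoints": (edges[:-1] + edges[1:]) / 2,
        "bin_accuracy": accuracy,
        "bin_counts": counts,
        "ece": float(ece),
    }

data = reliability_sketch(np.array([0.1, 0.65, 0.65, 0.9]), np.array([0.0, 1.0, 0.0, 1.0]))
print(data["bin_counts"])
```

To plot, you would draw `bin_midpoints` against `bin_accuracy` on an Axes alongside the diagonal y = x; points above the diagonal mean the model is underconfident in that bin, points below mean overconfident.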