API Reference

This section contains the complete API documentation for the rank_preserving_calibration package.

Main Calibration Functions

Robust rank-preserving multiclass probability calibration.

This module provides numerically stable implementations of rank-preserving calibration algorithms including Dykstra’s alternating projections and ADMM.

class rank_preserving_calibration.calibration.CalibrationResult(Q: ndarray, converged: bool, iterations: int, max_row_error: float, max_col_error: float, max_rank_violation: float, final_change: float)[source]

Bases: object

Result container for rank-preserving calibration algorithms.

Returned by calibrate_dykstra() containing the calibrated probability matrix and diagnostic information about convergence and constraint satisfaction.

Q

Calibrated probability matrix of shape (N, J) where rows sum to 1 and columns preserve rank ordering from original scores.

Type:

numpy.ndarray

converged

True if algorithm converged within specified tolerance.

Type:

bool

iterations

Number of iterations performed before termination.

Type:

int

max_row_error

Maximum absolute error in row sum constraint (should be ≈0).

Type:

float

max_col_error

Maximum absolute error in column sum constraint.

Type:

float

max_rank_violation

Maximum rank-order violation across all columns.

Type:

float

final_change

Final relative change in solution between iterations.

Type:

float

Examples

>>> result = calibrate_dykstra(P, M)
>>> if result.converged:
...     print(f"Calibration successful in {result.iterations} iterations")
>>> print(f"Max rank violation: {result.max_rank_violation:.6f}")
Q: ndarray
converged: bool
iterations: int
max_row_error: float
max_col_error: float
max_rank_violation: float
final_change: float
__init__(Q: ndarray, converged: bool, iterations: int, max_row_error: float, max_col_error: float, max_rank_violation: float, final_change: float) None
class rank_preserving_calibration.calibration.ADMMResult(Q: ndarray, converged: bool, iterations: int, objective_values: list[float], primal_residuals: list[float], dual_residuals: list[float], max_row_error: float, max_col_error: float, max_rank_violation: float, final_change: float)[source]

Bases: object

Result from ADMM optimization.

Q

Calibrated probability matrix.

Type:

np.ndarray

converged

Whether ADMM converged.

Type:

bool

iterations

Number of iterations performed.

Type:

int

objective_values

Objective function values over iterations.

Type:

list[float]

primal_residuals

Primal residual norms over iterations.

Type:

list[float]

dual_residuals

Dual residual norms over iterations.

Type:

list[float]

max_row_error

Maximum row sum error.

Type:

float

max_col_error

Maximum column sum error.

Type:

float

max_rank_violation

Maximum rank violation.

Type:

float

final_change

Final relative change between iterations.

Type:

float

Q: ndarray
converged: bool
iterations: int
objective_values: list[float]
primal_residuals: list[float]
dual_residuals: list[float]
max_row_error: float
max_col_error: float
max_rank_violation: float
final_change: float
__init__(Q: ndarray, converged: bool, iterations: int, objective_values: list[float], primal_residuals: list[float], dual_residuals: list[float], max_row_error: float, max_col_error: float, max_rank_violation: float, final_change: float) None
exception rank_preserving_calibration.calibration.CalibrationError[source]

Bases: Exception

Raised when calibration fails due to invalid inputs or numerical issues.

rank_preserving_calibration.calibration.calibrate_dykstra(P: ndarray, M: ndarray, max_iters: int = 3000, tol: float = 1e-07, rtol: float = 0.0, feasibility_tol: float = 0.1, verbose: bool = False, callback: CallbackFunction = None, detect_cycles: bool = False, cycle_window: int = 10, nearly: dict | None = None, ties: str = 'stable', use_jit: bool = True) CalibrationResult[source]

Calibrate using Dykstra’s alternating projections.

Projects multiclass probabilities onto the intersection of:
  1. row simplex: {rows ≥ 0, rows sum to 1} and

  2. column-wise isotone-by-score + fixed column sums: {nondecreasing in score order; column sum = M_j}.

This is the recommended default method for rank-preserving calibration. The algorithm uses exact Euclidean projections via Pool Adjacent Violators (PAV) followed by uniform shifts to satisfy sum constraints.

Parameters:
  • P – Input probability matrix of shape (N, J). Each row represents predicted class probabilities for one instance. Rows need not sum to 1 initially.

  • M – Target column sums of shape (J,). Should sum to approximately N for feasibility.

  • max_iters – Maximum number of iterations. Default 3000 is usually sufficient.

  • tol – Convergence tolerance for relative change in solution. Default 1e-7.

  • rtol – Relative tolerance for isotonic violations in PAV. Default 0.0 (strict).

  • feasibility_tol – Tolerance for feasibility warnings when sum(M) differs from N.

  • verbose – If True, enables debug logging.

  • callback – Optional function called each iteration as callback(iter, change, Q). Should return False to terminate early.

  • detect_cycles – If True, detects and breaks cycles in the solution sequence.

  • cycle_window – Number of iterations to look back for cycle detection.

  • nearly – Optional dict for nearly-isotonic constraints. Use {“mode”: “epsilon”, “eps”: 0.01} to allow small isotonicity violations.

  • ties – How to handle tied scores. “stable” preserves input order, “group” pools equal-score instances.

  • use_jit – If True and numba is available, uses JIT-compiled functions for speed.

Returns:

  • Q: Calibrated probability matrix of shape (N, J)

  • converged: Always True (failures raise CalibrationError instead)

  • iterations: Number of iterations performed

  • max_row_error: Maximum absolute row sum error

  • max_col_error: Maximum absolute column sum error

  • max_rank_violation: Maximum rank order violation

  • final_change: Final relative change in solution

Return type:

CalibrationResult

Raises:
  • CalibrationError – If inputs are invalid, algorithm fails to converge, or other errors occur.

  • ValueError – If ties parameter is not “stable” or “group”

Examples

Basic calibration:

>>> import numpy as np
>>> from rank_preserving_calibration import calibrate_dykstra
>>> P = np.array([[0.7, 0.2, 0.1], [0.3, 0.5, 0.2]])
>>> M = np.array([1.0, 0.7, 0.3])  # Target column sums
>>> result = calibrate_dykstra(P, M)
>>> print(f"Converged: {result.converged}")
>>> print(f"Row sums: {result.Q.sum(axis=1)}")
>>> print(f"Column sums: {result.Q.sum(axis=0)}")

With nearly-isotonic constraints:

>>> result = calibrate_dykstra(P, M, nearly={"mode": "epsilon", "eps": 0.05})

Notes

  • Converges to the exact intersection of the constraint sets

  • Preserves ranking within each class (column) by original model scores

  • Memory complexity is O(N*J) for the probability matrices

  • Time complexity per iteration is O(N*J*log(N)) due to sorting

  • For best performance, ensure sum(M) ≈ N and use numba if available

  • Raises CalibrationError on convergence failure instead of returning unreliable results
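The two projections that the algorithm alternates between can be sketched in a few lines of numpy. This is an illustrative re-implementation for intuition only, not the package's internal code (which adds Dykstra's correction increments, tie handling, nearly-isotonic modes, and optional JIT paths):

```python
import numpy as np

def project_rows_to_simplex(P):
    """Euclidean projection of each row onto {x >= 0, sum(x) = 1}."""
    N, J = P.shape
    U = np.sort(P, axis=1)[:, ::-1]          # each row sorted descending
    css = np.cumsum(U, axis=1) - 1.0         # cumulative sum minus target
    k = np.arange(1, J + 1)
    rho = (U - css / k > 0).sum(axis=1)      # size of the active support
    theta = css[np.arange(N), rho - 1] / rho
    return np.maximum(P - theta[:, None], 0.0)

def pav_nondecreasing(y):
    """Pool Adjacent Violators: closest nondecreasing sequence in L2."""
    vals, cnts = [], []
    for v in y:
        vals.append(float(v)); cnts.append(1)
        while len(vals) > 1 and vals[-2] > vals[-1]:   # merge violating blocks
            tot = vals[-2] * cnts[-2] + vals[-1] * cnts[-1]
            cnts[-2] += cnts[-1]
            vals[-2] = tot / cnts[-2]
            vals.pop(); cnts.pop()
    return np.repeat(vals, cnts)

def project_columns(P, M, scores):
    """Per column: isotonic in score order, then a uniform shift to hit M[j]."""
    Q = P.copy()
    for j in range(P.shape[1]):
        order = np.argsort(scores[:, j], kind="stable")
        z = pav_nondecreasing(P[order, j])
        z = z + (M[j] - z.sum()) / len(z)    # shift preserves isotonicity
        Q[order, j] = z
    return Q
```

Dykstra's method alternates these two projections, carrying a correction term between passes so the iterates converge to the projection onto the intersection rather than merely to some feasible point.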

rank_preserving_calibration.calibration.calibrate_admm(P: ndarray, M: ndarray, rho: float = 1.0, max_iters: int = 1000, tol: float = 1e-06, rtol: float = 0.0, feasibility_tol: float = 0.1, verbose: bool = False, nearly: dict | None = None, ties: str = 'stable', use_jit: bool = True) ADMMResult[source]

Calibrate using ADMM-style optimization with penalty methods.

An alternative to Dykstra’s projections that handles row/column sum constraints via Lagrange multipliers and rank-preservation through either strict isotonic regression or lambda-penalty nearly-isotonic proximal operators.

The algorithm minimizes ||Q - P||² subject to constraint sets using an augmented Lagrangian approach. For final optimality verification, the solution is snapped to the exact intersection using a short Dykstra polish.

Parameters:
  • P – Input probability matrix of shape (N, J). Each row represents predicted class probabilities for one instance. Rows need not sum to 1 initially.

  • M – Target column sums of shape (J,). Should sum to approximately N for feasibility.

  • rho – ADMM penalty parameter. Larger values enforce constraints more aggressively. Default 1.0 works well for most problems.

  • max_iters – Maximum number of iterations. Default 1000 is usually sufficient.

  • tol – Convergence tolerance for primal/dual residuals. Default 1e-6.

  • rtol – Relative tolerance for isotonic violations in PAV. Default 0.0 (strict).

  • feasibility_tol – Tolerance for feasibility warnings when sum(M) differs from N.

  • verbose – If True, enables debug logging.

  • nearly – Optional dict for nearly-isotonic constraints. Use {“mode”: “lambda”, “lam”: 1.0} for lambda-penalty approach allowing soft isotonicity violations.

  • ties – How to handle tied scores. “stable” preserves input order, “group” pools equal-score instances.

  • use_jit – If True and numba is available, uses JIT-compiled functions for speed.

Returns:

  • Q: Calibrated probability matrix of shape (N, J)

  • converged: Always True (failures raise CalibrationError instead)

  • iterations: Number of iterations performed

  • max_row_error: Maximum absolute row sum error

  • max_col_error: Maximum absolute column sum error

  • max_rank_violation: Maximum rank order violation

  • final_change: Final relative change in solution

  • objective_values: List of objective function values per iteration

  • primal_residuals: List of primal residual norms per iteration

  • dual_residuals: List of dual residual norms per iteration

Return type:

ADMMResult

Raises:
  • CalibrationError – If inputs are invalid, algorithm fails to converge, or other errors occur.

  • ValueError – If ties parameter is not “stable” or “group”

Examples

Basic ADMM calibration:

>>> import numpy as np
>>> from rank_preserving_calibration import calibrate_admm
>>> P = np.array([[0.7, 0.2, 0.1], [0.3, 0.5, 0.2]])
>>> M = np.array([1.0, 0.7, 0.3])
>>> result = calibrate_admm(P, M)
>>> print(f"Converged: {result.converged}")
>>> print(f"Objective values: {result.objective_values[-5:]}")

With lambda-penalty for soft isotonicity:

>>> result = calibrate_admm(P, M, nearly={"mode": "lambda", "lam": 2.0})

Adjusting penalty parameter:

>>> result = calibrate_admm(P, M, rho=5.0)  # Stronger constraint enforcement

Notes

  • Often converges faster than Dykstra for well-conditioned problems

  • Provides convergence diagnostics via objective and residual histories

  • Lambda-penalty mode allows trading off isotonicity for fit quality

  • Final solution is snapped to exact feasible set for optimality

  • Experimental: may need parameter tuning for difficult problems
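The residual histories make rho tuning concrete. A common heuristic from the ADMM literature (the Boyd et al. adaptive-penalty rule; shown here as a general sketch, not necessarily what calibrate_admm does internally) grows rho when the primal residual dominates and shrinks it when the dual residual dominates:

```python
def adapt_rho(rho, r_primal, r_dual, mu=10.0, tau=2.0):
    """Boyd-style adaptive penalty: keep primal and dual residuals balanced."""
    if r_primal > mu * r_dual:
        return rho * tau   # constraints lag: enforce them harder
    if r_dual > mu * r_primal:
        return rho / tau   # over-penalized: relax
    return rho
```

Inspecting result.primal_residuals and result.dual_residuals after a run suggests whether a larger or smaller rho is worth trying.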

Result Classes

class rank_preserving_calibration.calibration.CalibrationResult

Documented above under Main Calibration Functions.

class rank_preserving_calibration.calibration.ADMMResult

Documented above under Main Calibration Functions.

Nearly Isotonic Functions

Nearly isotonic regression utilities.

This module provides “relaxed” isotonic constraints that allow small violations of monotonicity—useful when strict isotonicity is too restrictive.

Exports:
  • project_near_isotonic_euclidean: ε-slack projection (exact L2), optional sum target.

  • prox_near_isotonic: exact proximal operator for λ * sum (z_i - z_{i+1})_+.

  • prox_near_isotonic_with_sum: same prox with an exact sum constraint via translation.

rank_preserving_calibration.nearly.project_near_isotonic_euclidean(v: ndarray, eps: float, sum_target: float | None = None, weights: ndarray | None = None) ndarray[source]

Project v onto the set { z : z_{i+1} >= z_i - eps } in L2.

Reduction: define w_i = v_i + i * eps. Then the constraint becomes standard isotonic on w. Let w* = isotonic(w) (weighted if weights are provided). The exact projection is z* = w* - i * eps. If sum_target is given, apply a uniform shift so 1^T z* = sum_target (translation invariance makes this exact).

Parameters:
  • v ((n,) array_like) – Input vector to project.

  • eps (float) – Slack parameter (>= 0). Allows z[i+1] >= z[i] - eps instead of strict z[i+1] >= z[i].

  • sum_target (float, optional) – If provided, perform an exact uniform shift so the result sums to this value.

  • weights ((n,) array_like, optional) – Positive weights for a weighted L2 projection.

Returns:

z – Projected vector satisfying near-isotonic constraint (and sum, if requested).

Return type:

(n,) ndarray

Notes

  • Idempotent: applying the projection twice gives the same result.
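The reduction described above is short enough to sketch directly. The PAV helper below is a minimal stand-in for the package's isotonic solver; the sketch also demonstrates the idempotence property:

```python
import numpy as np

def _pav(y):
    """Closest nondecreasing vector in L2 (Pool Adjacent Violators)."""
    vals, cnts = [], []
    for v in y:
        vals.append(float(v)); cnts.append(1)
        while len(vals) > 1 and vals[-2] > vals[-1]:
            tot = vals[-2] * cnts[-2] + vals[-1] * cnts[-1]
            cnts[-2] += cnts[-1]
            vals[-2] = tot / cnts[-2]
            vals.pop(); cnts.pop()
    return np.repeat(vals, cnts)

def near_isotonic_eps(v, eps, sum_target=None):
    """Project v onto {z : z[i+1] >= z[i] - eps} via w_i = v_i + i*eps."""
    i = np.arange(len(v))
    z = _pav(np.asarray(v, dtype=float) + i * eps) - i * eps
    if sum_target is not None:
        z = z + (sum_target - z.sum()) / len(z)   # exact, by translation invariance
    return z
```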

rank_preserving_calibration.nearly.prox_near_isotonic(y: ndarray, lam: float, *, rho: float = 1.0, max_iters: int = 2000, abstol: float = 1e-08, reltol: float = 1e-06, return_info: bool = False) ndarray | tuple[ndarray, dict][source]
Exact proximal operator for the λ-penalty nearly-isotonic term:

prox_{λ R}(y), R(z) = ∑_{i=1}^{n-1} (z_i - z_{i+1})_+,

where (x)_+ = max(0, x). This penalizes downward steps, relaxing strict isotonicity.

We solve:

minimize_z 0.5 * ||z - y||_2^2 + λ * ∑ (Dz)_+ with (Dz)_i = z_i - z_{i+1}

by ADMM on the split Dz = r:

z-update: solve (I + ρ D^T D) z = y + ρ D^T (r - u)   [SPD tridiagonal solve]

r-update: r = prox_{(λ/ρ)‖·‖_{+,1}}(Dz + u)   [one-sided soft-threshold]

u-update: u ← u + Dz - r

One-sided soft-threshold (elementwise):

prox_{t(·)_+}(v) = v - t,  if v > t
                   0,      if 0 ≤ v ≤ t
                   v,      if v < 0
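The r-update's one-sided soft-threshold has a closed elementwise form; a minimal sketch:

```python
import numpy as np

def prox_one_sided(v, t):
    """prox of t*(x)_+: shrink positives by t (floored at 0), keep negatives."""
    v = np.asarray(v, dtype=float)
    return np.where(v > t, v - t, np.where(v < 0.0, v, 0.0))
```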

rank_preserving_calibration.nearly.prox_near_isotonic_with_sum(y: ndarray, lam: float, sum_target: float, *, return_info: bool = False, **kwargs) ndarray | tuple[ndarray, dict][source]
Prox with a sum constraint:

minimize_z 0.5 * ||z - y||_2^2 + λ ∑ (z_i - z_{i+1})_+ subject to 1^T z = sum_target.

Because R(z) depends only on differences, it is translation-invariant:

R(z + c·1) = R(z).

Hence the constrained prox is obtained exactly by:

z* = prox_{λR}(y) + ((sum_target - 1^T prox_{λR}(y)) / n) · 1

Returns:

  • If return_info=False (default): z (ndarray)

  • If return_info=True: (z, info_dict)
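The translation argument is easy to check numerically. Here the penalty R is implemented inline for illustration; the uniform shift used to meet the sum target leaves it unchanged:

```python
import numpy as np

def R(z):
    """Nearly-isotonic penalty: total magnitude of downward steps."""
    return np.maximum(z[:-1] - z[1:], 0.0).sum()

z = np.array([0.3, 0.1, 0.6])
assert np.isclose(R(z + 1.7), R(z))          # translation invariance

target = 2.0
z_shift = z + (target - z.sum()) / len(z)    # the uniform shift from above
assert np.isclose(z_shift.sum(), target)
assert np.isclose(R(z_shift), R(z))
```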

Evaluation and Metrics

rank_preserving_calibration.metrics.kl_divergence(Q: ndarray, R: ndarray, eps: float = 1e-15) float[source]

Compute KL divergence KL(Q||R) = sum Q * log(Q/R).

The KL divergence measures how much Q differs from R, where R is the reference distribution. Uses convention 0 * log(0/x) = 0.

Parameters:
  • Q – Probability matrix of shape (N, J).

  • R – Reference probability matrix of shape (N, J).

  • eps – Small constant for numerical stability.

Returns:

KL divergence value (non-negative).

Examples

>>> import numpy as np
>>> from rank_preserving_calibration import kl_divergence
>>> P = np.array([[0.7, 0.2, 0.1], [0.3, 0.5, 0.2]])
>>> Q = np.array([[0.6, 0.3, 0.1], [0.4, 0.4, 0.2]])
>>> kl = kl_divergence(Q, P)
>>> print(f"KL divergence: {kl:.4f}")
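For intuition, the computation behind kl_divergence can be sketched as follows (an illustrative re-implementation; the package's handling of zeros may differ in detail):

```python
import numpy as np

def kl_div_sketch(Q, R, eps=1e-15):
    """KL(Q||R); eps-clipping makes 0*log(0/x) contribute ~0."""
    Qc = np.clip(Q, eps, None)
    Rc = np.clip(R, eps, None)
    return float(np.sum(Qc * np.log(Qc / Rc)))
```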
rank_preserving_calibration.metrics.reverse_kl_divergence(Q: ndarray, R: ndarray, eps: float = 1e-15) float[source]

Compute reverse KL divergence KL(R||Q) = sum R * log(R/Q).

The reverse KL divergence has different properties than forward KL:

  • Forward KL(Q||R) is mean-seeking (Q spreads over modes of R).

  • Reverse KL(R||Q) is mode-seeking (Q focuses on the main mode of R).

Parameters:
  • Q – Probability matrix of shape (N, J).

  • R – Reference probability matrix of shape (N, J).

  • eps – Small constant for numerical stability.

Returns:

Reverse KL divergence value (non-negative).

Examples

>>> import numpy as np
>>> from rank_preserving_calibration import reverse_kl_divergence
>>> P = np.array([[0.7, 0.2, 0.1], [0.3, 0.5, 0.2]])
>>> Q = np.array([[0.6, 0.3, 0.1], [0.4, 0.4, 0.2]])
>>> rkl = reverse_kl_divergence(Q, P)
rank_preserving_calibration.metrics.symmetrized_kl(Q: ndarray, R: ndarray, eps: float = 1e-15) float[source]

Compute symmetrized KL divergence (Jensen-Shannon related).

Symmetrized KL = 0.5 * (KL(Q||R) + KL(R||Q))

This is symmetric in Q and R, unlike the standard KL divergence.

Parameters:
  • Q – Probability matrix of shape (N, J).

  • R – Reference probability matrix of shape (N, J).

  • eps – Small constant for numerical stability.

Returns:

Symmetrized KL divergence value (non-negative).

Examples

>>> import numpy as np
>>> from rank_preserving_calibration import symmetrized_kl
>>> P = np.array([[0.7, 0.2, 0.1], [0.3, 0.5, 0.2]])
>>> Q = np.array([[0.6, 0.3, 0.1], [0.4, 0.4, 0.2]])
>>> skl = symmetrized_kl(Q, P)
rank_preserving_calibration.metrics.feasibility_metrics(Q: ndarray, M: ndarray | None = None) dict[str, Any][source]

Compute feasibility metrics for calibrated probability matrix.

Analyzes how well the probability matrix Q satisfies the row and column sum constraints required by rank-preserving calibration.

Parameters:
  • Q – Calibrated probability matrix of shape (N, J). Should have rows summing to 1 and non-negative entries.

  • M – Target column sums of shape (J,). If provided, analyzes column sum constraint satisfaction.

Returns:

  • "row": Row constraint metrics including max/mean absolute errors and min value

  • "col": Column constraint metrics (if M provided) including various error norms

Return type:

Dictionary containing feasibility metrics

Examples

>>> import numpy as np
>>> from rank_preserving_calibration import feasibility_metrics
>>> Q = np.array([[0.6, 0.3, 0.1], [0.4, 0.4, 0.2]])
>>> M = np.array([1.0, 0.7, 0.3])
>>> metrics = feasibility_metrics(Q, M)
>>> print(f"Max row error: {metrics['row']['max_abs_error']:.6f}")
>>> print(f"Max col error: {metrics['col']['max_abs_error']:.6f}")
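The core quantities behind these metrics are simple norms of the constraint residuals. A minimal sketch (the nested key names mirror the example above but are otherwise illustrative):

```python
import numpy as np

def feasibility_sketch(Q, M=None):
    """Max absolute row/column sum errors plus the smallest entry."""
    out = {"row": {"max_abs_error": float(np.abs(Q.sum(axis=1) - 1.0).max()),
                   "min_value": float(Q.min())}}
    if M is not None:
        out["col"] = {"max_abs_error": float(np.abs(Q.sum(axis=0) - M).max())}
    return out
```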
rank_preserving_calibration.metrics.isotonic_metrics(Q: ndarray, P: ndarray) dict[str, Any][source]

Analyze isotonic (rank-preserving) properties of calibrated probabilities.

Verifies that calibrated probabilities are nondecreasing when ordered by original model scores within each class. Computes violation statistics and flatness measures.

Parameters:
  • Q – Calibrated probability matrix of shape (N, J).

  • P – Original probability matrix of shape (N, J) used for score ordering.

Returns:

  • "max_rank_violation": Maximum rank-order violation across all columns

  • "mass_weighted_violation": Sum of violation magnitudes weighted by mass

  • "flat_fraction": Fraction of adjacent pairs that are equal (flatness)

  • "per_class_max_violation": List of max violations per class

  • "per_class_mass_violation": List of mass-weighted violations per class

Return type:

Dictionary containing isotonic metrics

Examples

>>> import numpy as np
>>> from rank_preserving_calibration import isotonic_metrics
>>> P = np.array([[0.7, 0.2, 0.1], [0.3, 0.5, 0.2]])
>>> Q = np.array([[0.65, 0.25, 0.1], [0.35, 0.45, 0.2]])
>>> metrics = isotonic_metrics(Q, P)
>>> print(f"Max violation: {metrics['max_rank_violation']:.6f}")
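The central check (each column of Q nondecreasing when sorted by the corresponding column of P) reduces to a sort and a diff; a minimal sketch:

```python
import numpy as np

def max_rank_violation(Q, P):
    """Largest downward step of Q along each column's P-score ordering."""
    worst = 0.0
    for j in range(Q.shape[1]):
        order = np.argsort(P[:, j], kind="stable")
        steps = np.diff(Q[order, j])
        worst = max(worst, float(np.maximum(-steps, 0.0).max(initial=0.0)))
    return worst
```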
rank_preserving_calibration.metrics.tie_group_variance(Q: ndarray, P: ndarray) dict[str, Any][source]

Within-group variance of Q across equal-score groups; for ties='group' this should be ≈0.

rank_preserving_calibration.metrics.distance_metrics(Q: ndarray, P: ndarray) dict[str, float][source]

Compute distance metrics between original and calibrated probabilities.

Measures how much the calibration procedure has changed the probability estimates using various matrix norms and distance measures.

Parameters:
  • Q – Calibrated probability matrix of shape (N, J).

  • P – Original probability matrix of shape (N, J).

Returns:

  • "frobenius": Frobenius norm ||Q - P||_F

  • "frobenius_sq": Squared Frobenius norm ||Q - P||²_F

  • "mean_abs": Mean absolute difference

  • "max_abs": Maximum absolute difference

Return type:

Dictionary containing distance metrics

Examples

>>> import numpy as np
>>> P = np.array([[0.7, 0.2, 0.1], [0.3, 0.5, 0.2]])
>>> Q = np.array([[0.65, 0.25, 0.1], [0.35, 0.45, 0.2]])
>>> distances = distance_metrics(Q, P)
>>> print(f"Frobenius distance: {distances['frobenius']:.4f}")
rank_preserving_calibration.metrics.nll(y: ndarray, probs: ndarray) float[source]

Compute negative log-likelihood (cross-entropy loss).

Parameters:
  • y – True class labels as integers of shape (N,).

  • probs – Probability matrix of shape (N, J).

Returns:

Average negative log-likelihood across all instances.

Examples

>>> import numpy as np
>>> y = np.array([0, 1, 2])
>>> probs = np.array([[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]])
>>> loss = nll(y, probs)
>>> print(f"NLL: {loss:.4f}")
rank_preserving_calibration.metrics.brier(y: ndarray, probs: ndarray) float[source]

Compute Brier score for multiclass classification.

The Brier score measures the mean squared difference between predicted probabilities and one-hot encoded true labels. Lower is better.

Parameters:
  • y – True class labels as integers of shape (N,).

  • probs – Probability matrix of shape (N, J).

Returns:

Brier score (mean squared error).

Examples

>>> import numpy as np
>>> y = np.array([0, 1, 2])
>>> probs = np.array([[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]])
>>> score = brier(y, probs)
>>> print(f"Brier score: {score:.4f}")
rank_preserving_calibration.metrics.top_label_ece(y: ndarray, probs: ndarray, n_bins: int = 15) dict[source]

Compute Expected Calibration Error for top-label predictions.

Bins predictions by confidence (max probability) and measures the gap between confidence and accuracy in each bin. ECE is the weighted average of these gaps, and MCE is the maximum gap.

Parameters:
  • y – True class labels as integers of shape (N,).

  • probs – Probability matrix of shape (N, J).

  • n_bins – Number of confidence bins (default 15).

Returns:

  • "ece": Expected Calibration Error (weighted average gap)

  • "mce": Maximum Calibration Error (max gap across bins)

  • "bins": List of (count, mean_confidence, accuracy) per bin

Return type:

Dictionary containing

Examples

>>> import numpy as np
>>> y = np.array([0, 1, 2])
>>> probs = np.array([[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]])
>>> ece_result = top_label_ece(y, probs)
>>> print(f"ECE: {ece_result['ece']:.4f}")
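The binning behind top-label ECE fits in a few lines; a sketch using equal-width confidence bins (bin-edge and tie conventions may differ from the package's):

```python
import numpy as np

def ece_sketch(y, probs, n_bins=15):
    """Weighted average |confidence - accuracy| gap over confidence bins."""
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == y).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            gap = abs(conf[in_bin].mean() - correct[in_bin].mean())
            ece += in_bin.mean() * gap     # weight by bin population
    return ece
```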
rank_preserving_calibration.metrics.classwise_ece(y: ndarray, probs: ndarray, n_bins: int = 15, balanced: bool = False) dict[source]

Compute class-wise Expected Calibration Error.

For each class, bins predictions by the predicted probability for that class and measures the gap between predicted probability and actual frequency. Aggregates across all classes.

Parameters:
  • y – True class labels as integers of shape (N,).

  • probs – Probability matrix of shape (N, J).

  • n_bins – Number of probability bins per class (default 15).

  • balanced – If True, weight classes equally; otherwise weight by sample count.

Returns:

  • "ece": Aggregated Expected Calibration Error

  • "mce": Maximum Calibration Error across all classes

  • "per_class": List of per-class ECE/MCE metrics and bin tables

Return type:

Dictionary containing

Examples

>>> import numpy as np
>>> y = np.array([0, 0, 1, 1, 2, 2])
>>> probs = np.array([[0.8, 0.1, 0.1], [0.7, 0.2, 0.1],
...                   [0.2, 0.6, 0.2], [0.1, 0.8, 0.1],
...                   [0.1, 0.2, 0.7], [0.1, 0.1, 0.8]])
>>> result = classwise_ece(y, probs)
>>> print(f"Classwise ECE: {result['ece']:.4f}")
rank_preserving_calibration.metrics.sharpness_metrics(probs: ndarray) dict[str, float][source]

Compute sharpness metrics for probability predictions.

Sharpness measures how confident (peaked) the predictions are, independent of whether they are correct. Sharper predictions are more informative.

Parameters:

probs – Probability matrix of shape (N, J).

Returns:

  • "mean_entropy": Average entropy across predictions (lower = sharper)

  • "mean_max_prob": Average of maximum probabilities (higher = sharper)

  • "mean_margin": Average difference between top two probabilities

Return type:

Dictionary containing

Examples

>>> import numpy as np
>>> probs = np.array([[0.9, 0.05, 0.05], [0.5, 0.3, 0.2]])
>>> sharp = sharpness_metrics(probs)
>>> print(f"Mean max prob: {sharp['mean_max_prob']:.3f}")
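Each of the three measures is a one-liner; a minimal sketch:

```python
import numpy as np

def sharpness_sketch(probs, eps=1e-15):
    """Entropy, top probability, and top-2 margin, averaged over rows."""
    p = np.clip(probs, eps, 1.0)
    entropy = -(p * np.log(p)).sum(axis=1)      # lower = sharper
    top2 = np.sort(probs, axis=1)[:, -2:]       # two largest per row
    return {"mean_entropy": float(entropy.mean()),
            "mean_max_prob": float(probs.max(axis=1).mean()),
            "mean_margin": float((top2[:, 1] - top2[:, 0]).mean())}
```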
rank_preserving_calibration.metrics.column_variance(Q: ndarray) dict[str, Any][source]

Compute per-column variance of a probability matrix.

Useful for measuring how “spread out” the calibrated probabilities are within each class. Lower variance may indicate a “flatter” solution.

Parameters:

Q – Probability matrix of shape (N, J).

Returns:

  • "per_column": List of variance values for each column

  • "mean": Mean variance across columns

  • "min": Minimum column variance

  • "max": Maximum column variance

Return type:

Dictionary containing

Examples

>>> import numpy as np
>>> from rank_preserving_calibration import column_variance
>>> Q = np.array([[0.6, 0.3, 0.1], [0.4, 0.4, 0.2]])
>>> var = column_variance(Q)
>>> print(f"Mean column variance: {var['mean']:.4f}")
rank_preserving_calibration.metrics.informativeness_ratio(Q: ndarray, P: ndarray) dict[str, Any][source]

Compare variance of Q vs P to measure information preservation.

The informativeness ratio measures how much of the original probability spread is preserved after calibration. A ratio close to 1 means the calibration preserved the discriminative structure; a ratio near 0 indicates a “flat” solution with little discrimination between samples.

Parameters:
  • Q – Calibrated probability matrix of shape (N, J).

  • P – Original probability matrix of shape (N, J).

Returns:

  • "total_ratio": Var(Q) / Var(P)

  • "per_column_ratio": Variance ratio for each column

  • "mean_column_ratio": Mean of per-column ratios

  • "interpretation": String describing the result

Return type:

Dictionary containing

Examples

>>> import numpy as np
>>> from rank_preserving_calibration import informativeness_ratio
>>> P = np.array([[0.7, 0.2, 0.1], [0.3, 0.5, 0.2]])
>>> Q = np.array([[0.5, 0.35, 0.15], [0.5, 0.35, 0.15]])  # Flat
>>> ratio = informativeness_ratio(Q, P)
>>> print(f"Total ratio: {ratio['total_ratio']:.3f}")
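The flat-solution failure mode in the example above can be detected with variances alone; a sketch (key names abbreviated for illustration):

```python
import numpy as np

def informativeness_sketch(Q, P):
    """Variance of Q relative to P, overall and per column."""
    per_col = Q.var(axis=0) / np.maximum(P.var(axis=0), 1e-15)
    return {"total_ratio": float(Q.var() / P.var()),
            "mean_column_ratio": float(per_col.mean())}
```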
rank_preserving_calibration.metrics.auc_deltas(y: ndarray, P: ndarray, Q: ndarray) dict[str, Any][source]

One-vs-rest AUC per class, before vs. after calibration; useful for checking the ranking effects of plateaus.

Exception Classes

exception rank_preserving_calibration.calibration.CalibrationError

Documented above under Main Calibration Functions.

Complete Module Documentation

Rank-preserving calibration of multiclass probabilities.

This package provides robust implementations of rank-preserving calibration algorithms including Dykstra’s alternating projections (exact intersection) and ADMM (penalty-based with final snap to the exact projection).

Quick start

>>> import numpy as np
>>> from rank_preserving_calibration import calibrate_dykstra, feasibility_metrics, isotonic_metrics
>>> # Toy data
>>> rng = np.random.default_rng(42)
>>> P = rng.dirichlet(np.ones(4), size=100)       # N x J predicted probs (rows sum to 1)
>>> M = (P.sum(axis=0) + rng.normal(0, 0.05, 4))  # target column marginals (sum ≈ N)
>>> M = np.maximum(M, 1e-3)
>>> # Calibrate
>>> res = calibrate_dykstra(P, M, detect_cycles=False, rtol=0.0)
>>> print(res.converged, res.iterations)
>>> # Check invariants
>>> print(feasibility_metrics(res.Q, M))
>>> print(isotonic_metrics(res.Q, P)["max_rank_violation"])
class rank_preserving_calibration.ADMMResult

Re-exported at package level; documented above under Main Calibration Functions.

exception rank_preserving_calibration.CalibrationError

Re-exported at package level; documented above under Main Calibration Functions.

class rank_preserving_calibration.CalibrationResult

Re-exported at package level; documented above under Main Calibration Functions.
class rank_preserving_calibration.IPFResult(Q: ndarray, converged: bool, iterations: int, max_row_error: float, max_col_error: float, final_change: float)[source]

Bases: object

Result from Iterative Proportional Fitting.

Q

Probability matrix after IPF.

Type:

numpy.ndarray

converged

Whether IPF converged.

Type:

bool

iterations

Number of iterations.

Type:

int

max_row_error

Maximum row sum error.

Type:

float

max_col_error

Maximum column sum error.

Type:

float

final_change

Final relative change between iterations.

Type:

float

Q: ndarray
converged: bool
iterations: int
max_row_error: float
max_col_error: float
final_change: float
__init__(Q: ndarray, converged: bool, iterations: int, max_row_error: float, max_col_error: float, final_change: float) None
class rank_preserving_calibration.KLCalibrationResult(Q: ndarray, converged: bool, iterations: int, kl_divergence: float, max_row_error: float, max_col_error: float, max_rank_violation: float, final_change: float)[source]

Bases: object

Result container for KL-divergence rank-preserving calibration.

Q

Calibrated probability matrix of shape (N, J) where rows sum to 1 and columns preserve rank ordering from anchor scores.

Type:

numpy.ndarray

converged

True if algorithm converged within specified tolerance.

Type:

bool

iterations

Number of iterations performed before termination.

Type:

int

kl_divergence

Final KL(Q||R) divergence value.

Type:

float

max_row_error

Maximum absolute error in row sum constraint.

Type:

float

max_col_error

Maximum absolute error in column sum constraint.

Type:

float

max_rank_violation

Maximum rank-order violation across all columns.

Type:

float

final_change

Final relative change in solution between iterations.

Type:

float

Q: ndarray
converged: bool
iterations: int
kl_divergence: float
max_row_error: float
max_col_error: float
max_rank_violation: float
final_change: float
__init__(Q: ndarray, converged: bool, iterations: int, kl_divergence: float, max_row_error: float, max_col_error: float, max_rank_violation: float, final_change: float) None
class rank_preserving_calibration.KLParetoResult(solutions: list[tuple[float, ~numpy.ndarray]] = <factory>, kl_values: list[float] = <factory>, rank_violations: list[float] = <factory>, lambda_path: list[float] = <factory>)[source]

Bases: object

Result container for KL Pareto frontier computation.

solutions

List of (lambda, Q) pairs along the Pareto frontier.

Type:

list[tuple[float, numpy.ndarray]]

kl_values

KL divergence values for each solution.

Type:

list[float]

rank_violations

Total rank violation for each solution.

Type:

list[float]

lambda_path

Lambda values used in the sweep.

Type:

list[float]

solutions: list[tuple[float, ndarray]]
kl_values: list[float]
rank_violations: list[float]
lambda_path: list[float]
__init__(solutions: list[tuple[float, ~numpy.ndarray]] = <factory>, kl_values: list[float] = <factory>, rank_violations: list[float] = <factory>, lambda_path: list[float] = <factory>) None
class rank_preserving_calibration.SoftCalibrationResult(Q: ndarray, converged: bool, iterations: int, objective_values: list[float], fit_term: float, marginal_term: float, rank_term: float, max_row_error: float, max_col_error: float, max_rank_violation: float, final_change: float)[source]

Bases: object

Result from soft-constraint calibration.

Q

Calibrated probability matrix.

Type:

numpy.ndarray

converged

Whether the algorithm converged.

Type:

bool

iterations

Number of iterations performed.

Type:

int

objective_values

Objective function values over iterations.

Type:

list[float]

fit_term

Final ||Q - P||² term.

Type:

float

marginal_term

Final ||col_sums(Q) - M||² term.

Type:

float

rank_term

Final rank violation penalty term.

Type:

float

max_row_error

Maximum row sum error.

Type:

float

max_col_error

Maximum column sum error.

Type:

float

max_rank_violation

Maximum rank violation.

Type:

float

final_change

Final relative change between iterations.

Type:

float

Q: ndarray
converged: bool
iterations: int
objective_values: list[float]
fit_term: float
marginal_term: float
rank_term: float
max_row_error: float
max_col_error: float
max_rank_violation: float
final_change: float
__init__(Q: ndarray, converged: bool, iterations: int, objective_values: list[float], fit_term: float, marginal_term: float, rank_term: float, max_row_error: float, max_col_error: float, max_rank_violation: float, final_change: float) None
class rank_preserving_calibration.TwoStageResult(Q: ndarray, converged: bool, ipf_iterations: int, projection_iterations: int, max_row_error: float, max_col_error: float, max_rank_violation: float, ipf_result: ndarray)[source]

Bases: object

Result from two-stage IPF-based calibration.

Q

Calibrated probability matrix.

Type:

numpy.ndarray

converged

Whether the algorithm converged.

Type:

bool

ipf_iterations

Number of IPF iterations performed.

Type:

int

projection_iterations

Number of isotonic projection iterations.

Type:

int

max_row_error

Maximum row sum error.

Type:

float

max_col_error

Maximum column sum error.

Type:

float

max_rank_violation

Maximum rank violation.

Type:

float

ipf_result

The intermediate IPF result before isotonic projection.

Type:

numpy.ndarray

Q: ndarray
converged: bool
ipf_iterations: int
projection_iterations: int
max_row_error: float
max_col_error: float
max_rank_violation: float
ipf_result: ndarray
__init__(Q: ndarray, converged: bool, ipf_iterations: int, projection_iterations: int, max_row_error: float, max_col_error: float, max_rank_violation: float, ipf_result: ndarray) None
rank_preserving_calibration.auc_deltas(y: ndarray, P: ndarray, Q: ndarray) dict[str, Any][source]

Compute one-vs-rest AUC for each class before and after calibration; useful for checking the rank effects of plateaus (tied values introduced by isotonic pooling).

rank_preserving_calibration.brier(y: ndarray, probs: ndarray) float[source]

Compute Brier score for multiclass classification.

The Brier score measures the mean squared difference between predicted probabilities and one-hot encoded true labels. Lower is better.

Parameters:
  • y – True class labels as integers of shape (N,).

  • probs – Probability matrix of shape (N, J).

Returns:

Brier score (mean squared error).

Examples

>>> import numpy as np
>>> y = np.array([0, 1, 2])
>>> probs = np.array([[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]])
>>> score = brier(y, probs)
>>> print(f"Brier score: {score:.4f}")
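The computation can be reproduced in plain NumPy. This sketch assumes the documented convention of averaging the squared differences over all N·J entries; some texts instead sum over classes before averaging over samples, so treat the exact scale as an assumption (the helper name `brier_sketch` is illustrative, not part of the package):

```python
import numpy as np

def brier_sketch(y, probs):
    # Mean squared difference between probs and one-hot labels,
    # averaged over all N*J entries (assumed convention).
    n, j = probs.shape
    onehot = np.zeros((n, j))
    onehot[np.arange(n), y] = 1.0
    return float(np.mean((probs - onehot) ** 2))

y = np.array([0, 1, 2])
probs = np.array([[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]])
print(f"Brier score: {brier_sketch(y, probs):.4f}")
```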
rank_preserving_calibration.calibrate_admm(P: ndarray, M: ndarray, rho: float = 1.0, max_iters: int = 1000, tol: float = 1e-06, rtol: float = 0.0, feasibility_tol: float = 0.1, verbose: bool = False, nearly: dict | None = None, ties: str = 'stable', use_jit: bool = True) ADMMResult[source]

Calibrate using ADMM-style optimization with penalty methods.

An alternative to Dykstra’s projections that handles row/column sum constraints via Lagrange multipliers and rank-preservation through either strict isotonic regression or lambda-penalty nearly-isotonic proximal operators.

The algorithm minimizes ||Q - P||² subject to constraint sets using an augmented Lagrangian approach. For final optimality verification, the solution is snapped to the exact intersection using a short Dykstra polish.

Parameters:
  • P – Input probability matrix of shape (N, J). Each row represents predicted class probabilities for one instance. Rows need not sum to 1 initially.

  • M – Target column sums of shape (J,). Should sum to approximately N for feasibility.

  • rho – ADMM penalty parameter. Larger values enforce constraints more aggressively. Default 1.0 works well for most problems.

  • max_iters – Maximum number of iterations. Default 1000 is usually sufficient.

  • tol – Convergence tolerance for primal/dual residuals. Default 1e-6.

  • rtol – Relative tolerance for isotonic violations in PAV. Default 0.0 (strict).

  • feasibility_tol – Tolerance for feasibility warnings when sum(M) differs from N.

  • verbose – If True, enables debug logging.

  • nearly – Optional dict for nearly-isotonic constraints. Use {“mode”: “lambda”, “lam”: 1.0} for lambda-penalty approach allowing soft isotonicity violations.

  • ties – How to handle tied scores. “stable” preserves input order, “group” pools equal-score instances.

  • use_jit – If True and numba is available, uses JIT-compiled functions for speed.

Returns:

  • Q: Calibrated probability matrix of shape (N, J)

  • converged: Always True (failures raise CalibrationError instead)

  • iterations: Number of iterations performed

  • max_row_error: Maximum absolute row sum error

  • max_col_error: Maximum absolute column sum error

  • max_rank_violation: Maximum rank order violation

  • final_change: Final relative change in solution

  • objective_values: List of objective function values per iteration

  • primal_residuals: List of primal residual norms per iteration

  • dual_residuals: List of dual residual norms per iteration

Return type:

ADMMResult object containing

Raises:
  • CalibrationError – If inputs are invalid, algorithm fails to converge, or other errors occur.

  • ValueError – If ties parameter is not “stable” or “group”

Examples

Basic ADMM calibration:

>>> import numpy as np
>>> from rank_preserving_calibration import calibrate_admm
>>> P = np.array([[0.7, 0.2, 0.1], [0.3, 0.5, 0.2]])
>>> M = np.array([1.0, 0.7, 0.3])
>>> result = calibrate_admm(P, M)
>>> print(f"Converged: {result.converged}")
>>> print(f"Objective values: {result.objective_values[-5:]}")

With lambda-penalty for soft isotonicity:

>>> result = calibrate_admm(P, M, nearly={"mode": "lambda", "lam": 2.0})

Adjusting penalty parameter:

>>> result = calibrate_admm(P, M, rho=5.0)  # Stronger constraint enforcement

Notes

  • Often converges faster than Dykstra for well-conditioned problems

  • Provides convergence diagnostics via objective and residual histories

  • Lambda-penalty mode allows trading off isotonicity for fit quality

  • Final solution is snapped to exact feasible set for optimality

  • Experimental: may need parameter tuning for difficult problems

rank_preserving_calibration.calibrate_dykstra(P: ndarray, M: ndarray, max_iters: int = 3000, tol: float = 1e-07, rtol: float = 0.0, feasibility_tol: float = 0.1, verbose: bool = False, callback: CallbackFunction = None, detect_cycles: bool = False, cycle_window: int = 10, nearly: dict | None = None, ties: str = 'stable', use_jit: bool = True) CalibrationResult[source]

Calibrate using Dykstra’s alternating projections.

Projects multiclass probabilities onto the intersection of:
  1. row simplex: {rows ≥ 0, rows sum to 1} and

  2. column-wise isotone-by-score + fixed column sums: {nondecreasing in score order; column sum = M_j}.

This is the recommended default method for rank-preserving calibration. The algorithm uses exact Euclidean projections via Pool Adjacent Violators (PAV) followed by uniform shifts to satisfy sum constraints.

Parameters:
  • P – Input probability matrix of shape (N, J). Each row represents predicted class probabilities for one instance. Rows need not sum to 1 initially.

  • M – Target column sums of shape (J,). Should sum to approximately N for feasibility.

  • max_iters – Maximum number of iterations. Default 3000 is usually sufficient.

  • tol – Convergence tolerance for relative change in solution. Default 1e-7.

  • rtol – Relative tolerance for isotonic violations in PAV. Default 0.0 (strict).

  • feasibility_tol – Tolerance for feasibility warnings when sum(M) differs from N.

  • verbose – If True, enables debug logging.

  • callback – Optional function called each iteration as callback(iter, change, Q). Should return False to terminate early.

  • detect_cycles – If True, detects and breaks cycles in the solution sequence.

  • cycle_window – Number of iterations to look back for cycle detection.

  • nearly – Optional dict for nearly-isotonic constraints. Use {“mode”: “epsilon”, “eps”: 0.01} to allow small isotonicity violations.

  • ties – How to handle tied scores. “stable” preserves input order, “group” pools equal-score instances.

  • use_jit – If True and numba is available, uses JIT-compiled functions for speed.

Returns:

  • Q: Calibrated probability matrix of shape (N, J)

  • converged: Always True (failures raise CalibrationError instead)

  • iterations: Number of iterations performed

  • max_row_error: Maximum absolute row sum error

  • max_col_error: Maximum absolute column sum error

  • max_rank_violation: Maximum rank order violation

  • final_change: Final relative change in solution

Return type:

CalibrationResult object containing

Raises:
  • CalibrationError – If inputs are invalid, algorithm fails to converge, or other errors occur.

  • ValueError – If ties parameter is not “stable” or “group”

Examples

Basic calibration:

>>> import numpy as np
>>> from rank_preserving_calibration import calibrate_dykstra
>>> P = np.array([[0.7, 0.2, 0.1], [0.3, 0.5, 0.2]])
>>> M = np.array([1.0, 0.7, 0.3])  # Target column sums
>>> result = calibrate_dykstra(P, M)
>>> print(f"Converged: {result.converged}")
>>> print(f"Row sums: {result.Q.sum(axis=1)}")
>>> print(f"Column sums: {result.Q.sum(axis=0)}")

With nearly-isotonic constraints:

>>> result = calibrate_dykstra(P, M, nearly={"mode": "epsilon", "eps": 0.05})

Notes

  • Converges to the exact intersection of the constraint sets

  • Preserves ranking within each class (column) by original model scores

  • Memory complexity is O(N*J) for the probability matrices

  • Time complexity per iteration is O(N*J*log(N)) due to sorting

  • For best performance, ensure sum(M) ≈ N and use numba if available

  • Raises CalibrationError on convergence failure instead of returning unreliable results
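The Pool Adjacent Violators step referenced above can be illustrated in a few lines of NumPy. This is a minimal sketch of the textbook algorithm with uniform weights, not the package's (possibly JIT-compiled) internal routine:

```python
import numpy as np

def pav_nondecreasing(v):
    # Pool Adjacent Violators: Euclidean projection of v onto
    # nondecreasing sequences (uniform weights).
    v = np.asarray(v, dtype=float)
    sums, counts = [], []          # each block stores (sum, count)
    for x in v:
        sums.append(x)
        counts.append(1)
        # Merge adjacent blocks while their means are out of order.
        while len(sums) > 1 and sums[-2] / counts[-2] > sums[-1] / counts[-1]:
            s, c = sums.pop(), counts.pop()
            sums[-1] += s
            counts[-1] += c
    return np.concatenate([np.full(c, s / c) for s, c in zip(sums, counts)])

print(pav_nondecreasing([1.0, 3.0, 2.0, 4.0]))
```

Violating pairs are pooled to their mean, which is what produces the plateaus (ties) mentioned elsewhere in this reference.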

rank_preserving_calibration.calibrate_ipf(P: ndarray, M: ndarray, max_iters: int = 100, tol: float = 1e-08, verbose: bool = False) IPFResult[source]

Iterative Proportional Fitting (raking) to match target marginals.

IPF alternately scales rows and columns to match target constraints:
  • Row scaling: ensure rows sum to 1 (probability simplex)

  • Column scaling: scale columns to match target marginals M

Unlike Dykstra/ADMM, IPF preserves the relative structure of probabilities within rows (ratios between classes). This can produce less “flat” solutions but may not perfectly satisfy all constraints simultaneously.

Parameters:
  • P – Input probability matrix of shape (N, J).

  • M – Target column sums of shape (J,). Should sum to N.

  • max_iters – Maximum number of row/column scaling iterations.

  • tol – Convergence tolerance for relative change.

  • verbose – If True, enables debug logging.

Returns:

IPFResult with the scaled probability matrix.

Raises:

CalibrationError – If inputs are invalid.

Examples

>>> import numpy as np
>>> from rank_preserving_calibration import calibrate_ipf
>>> P = np.array([[0.7, 0.2, 0.1], [0.3, 0.5, 0.2]])
>>> M = np.array([0.8, 0.8, 0.4])
>>> result = calibrate_ipf(P, M)
>>> print(f"Column sums: {result.Q.sum(axis=0)}")
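The alternating row/column scaling can be sketched in plain NumPy. This is illustrative only; calibrate_ipf additionally validates inputs and tracks diagnostics (the helper name `ipf_sketch` is hypothetical):

```python
import numpy as np

def ipf_sketch(P, M, max_iters=100, tol=1e-10):
    # Alternate row normalization (rows sum to 1) with column
    # scaling toward target column sums M.
    Q = P.astype(float).copy()
    for _ in range(max_iters):
        prev = Q.copy()
        Q = Q / Q.sum(axis=1, keepdims=True)   # row step
        Q = Q * (M / Q.sum(axis=0))            # column step
        if np.max(np.abs(Q - prev)) < tol:
            break
    return Q

P = np.array([[0.7, 0.2, 0.1], [0.3, 0.5, 0.2]])
M = np.array([0.8, 0.8, 0.4])
Q = ipf_sketch(P, M)
print(Q.sum(axis=0))  # matches M exactly after the final column step
```

Because the loop ends on a column step, the column sums match M exactly, while the row sums are only approximately 1 at convergence, matching the "may not perfectly satisfy all constraints simultaneously" caveat above.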
rank_preserving_calibration.calibrate_kl(P: ndarray, M: ndarray, R: ndarray | None = None, A: ndarray | None = None, max_iters: int = 3000, tol: float = 1e-07, feasibility_tol: float = 0.1, verbose: bool = False, eps: float = 1e-300) KLCalibrationResult[source]

Calibrate using KL divergence with Dykstra’s alternating projections.

Projects multiclass probabilities onto the intersection of:
  1. row simplex: {rows ≥ 0, rows sum to 1}

  2. column-wise isotone-by-anchor + fixed column sums

Minimizes KL(Q||R) subject to constraints, where:
  • R is the reference distribution for KL divergence (default: P)

  • A is the anchor for rank ordering (default: P)

This anchor-reference decoupling allows separating “what we measure divergence from” (R) from “what determines rank order” (A).

Parameters:
  • P – Input probability matrix of shape (N, J).

  • M – Target column sums of shape (J,).

  • R – Reference distribution for KL divergence. If None, uses P.

  • A – Anchor for rank ordering. If None, uses P.

  • max_iters – Maximum number of iterations.

  • tol – Convergence tolerance for relative change.

  • feasibility_tol – Tolerance for feasibility warnings.

  • verbose – If True, enables debug logging.

  • eps – Small constant for numerical stability.

Returns:

KLCalibrationResult with calibrated matrix and diagnostics.

Raises:

CalibrationError – If inputs are invalid or algorithm fails to converge.

Examples

Basic KL calibration:

>>> import numpy as np
>>> from rank_preserving_calibration import calibrate_kl
>>> P = np.array([[0.7, 0.2, 0.1], [0.3, 0.5, 0.2]])
>>> M = np.array([1.0, 0.7, 0.3])
>>> result = calibrate_kl(P, M)

With anchor-reference decoupling:

>>> # Use P for KL reference, but A for rank ordering
>>> A = np.array([[0.6, 0.3, 0.1], [0.4, 0.4, 0.2]])
>>> result = calibrate_kl(P, M, R=P, A=A)
rank_preserving_calibration.calibrate_kl_pareto(P: ndarray, M: ndarray, lambda_grid: list[float] | ndarray | None = None, R: ndarray | None = None, A: ndarray | None = None, max_iters: int = 500, tol: float = 1e-05, verbose: bool = False, eps: float = 1e-300) KLParetoResult[source]

Compute Pareto frontier of KL divergence vs rank violation.

Sweeps over λ values with warm starting, reporting the full Pareto frontier rather than a single tuned point.

Parameters:
  • P – Input probability matrix of shape (N, J).

  • M – Target column sums of shape (J,).

  • lambda_grid – Grid of λ values to sweep. Default: geometric grid.

  • R – Reference for KL divergence. Default: P.

  • A – Anchor for rank ordering. Default: P.

  • max_iters – Max iterations per λ value.

  • tol – Convergence tolerance.

  • verbose – Enable debug logging.

  • eps – Numerical stability constant.

Returns:

KLParetoResult with solutions along the frontier.

Examples

>>> result = calibrate_kl_pareto(P, M)
>>> for i, (lam, Q) in enumerate(result.solutions):
...     print(f"λ={lam:.2f}: KL={result.kl_values[i]:.4f}")
rank_preserving_calibration.calibrate_kl_soft(P: ndarray, M: ndarray, lam: float = 1.0, R: ndarray | None = None, A: ndarray | None = None, max_iters: int = 1000, tol: float = 1e-06, verbose: bool = False, eps: float = 1e-300) KLCalibrationResult[source]

Soft KL calibration with λ-weighted rank penalty.

Solves:

min KL(Q||R) + λ·V_A(Q) s.t. Q ∈ C(M)

Where:
  • KL(Q||R) is the KL divergence from reference R

  • V_A(Q) is the rank violation penalty using anchor A

  • C(M) is row simplex + column sums = M

Parameters:
  • P – Input probability matrix of shape (N, J).

  • M – Target column sums of shape (J,).

  • lam – Rank penalty weight. Larger = more isotonic.

  • R – Reference for KL divergence. Default: P.

  • A – Anchor for rank ordering. Default: P.

  • max_iters – Maximum iterations.

  • tol – Convergence tolerance.

  • verbose – Enable debug logging.

  • eps – Numerical stability constant.

Returns:

KLCalibrationResult with calibrated matrix.

Examples

>>> result = calibrate_kl_soft(P, M, lam=10.0)  # Strong rank enforcement
>>> result = calibrate_kl_soft(P, M, lam=0.1)  # Weak rank enforcement
rank_preserving_calibration.calibrate_ovr_isotonic(y: ndarray, probs: ndarray) dict[str, Any][source]

Calibrates multiclass probabilities using One-vs-Rest Isotonic Regression.

For each class, this method trains a separate isotonic regression model on the binary problem of that class vs. all other classes. The resulting calibrated probabilities are then normalized to sum to 1. This is a common approach for multiclass calibration and is used by libraries like scikit-learn.

Parameters:
  • y – True class labels as integers of shape (N,).

  • probs – Original probability matrix of shape (N, J).

Returns:

A dictionary containing the calibrated probabilities ‘Q’.

rank_preserving_calibration.calibrate_soft(P: ndarray, M: ndarray, lam_m: float = 1.0, lam_r: float = 10.0, max_iters: int = 1000, tol: float = 1e-06, verbose: bool = False, step_size: float = 0.1) SoftCalibrationResult[source]

Calibrate with soft constraints on marginals and ranks.

Solves the optimization problem:

min_Q ||Q - P||² + lam_m·||col_sums(Q) - M||² + lam_r·rank_penalty(Q)
s.t. Q on row simplex (rows ≥ 0, rows sum to 1)

This allows trading off between:
  • Staying close to original predictions (small lam_m, lam_r)

  • Matching marginals (large lam_m)

  • Preserving ranks (large lam_r)

Parameters:
  • P – Input probability matrix of shape (N, J).

  • M – Target column sums of shape (J,). Should sum to approximately N.

  • lam_m – Marginal penalty weight. Larger = closer to target marginals. Set to 0 to ignore marginals entirely.

  • lam_r – Rank penalty weight. Larger = more isotonic. Set to 0 to allow arbitrary rank violations.

  • max_iters – Maximum number of iterations.

  • tol – Convergence tolerance for relative change.

  • verbose – If True, enables debug logging.

  • step_size – Step size for gradient descent updates.

Returns:

SoftCalibrationResult with calibrated matrix and diagnostics.

Raises:

CalibrationError – If inputs are invalid.

Examples

>>> import numpy as np
>>> from rank_preserving_calibration import calibrate_soft
>>> P = np.array([[0.7, 0.2, 0.1], [0.3, 0.5, 0.2]])
>>> M = np.array([1.0, 0.7, 0.3])
>>> # Balanced trade-off
>>> result = calibrate_soft(P, M, lam_m=1.0, lam_r=10.0)
>>> # Prioritize marginal matching
>>> result = calibrate_soft(P, M, lam_m=100.0, lam_r=1.0)
>>> # Prioritize rank preservation
>>> result = calibrate_soft(P, M, lam_m=0.1, lam_r=100.0)
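The row-simplex constraint above requires a Euclidean projection onto the probability simplex at each update. A standard sort-and-threshold projection looks like the following; this is a sketch of the well-known algorithm, not necessarily the package's internal implementation:

```python
import numpy as np

def project_row_simplex(V):
    # Project each row of V onto {q >= 0, sum(q) = 1} via the
    # sort-and-threshold algorithm.
    n, j = V.shape
    U = np.sort(V, axis=1)[:, ::-1]           # each row sorted descending
    css = np.cumsum(U, axis=1) - 1.0
    idx = np.arange(1, j + 1)
    rho = (U - css / idx > 0).sum(axis=1)     # size of the support
    theta = css[np.arange(n), rho - 1] / rho  # per-row threshold
    return np.maximum(V - theta[:, None], 0.0)

V = np.array([[0.5, 0.4, 0.3], [2.0, -1.0, 0.0]])
print(project_row_simplex(V))
```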
rank_preserving_calibration.calibrate_soft_admm(P: ndarray, M: ndarray, lam_m: float = 1.0, lam_r: float = 10.0, rho: float = 1.0, max_iters: int = 1000, tol: float = 1e-06, verbose: bool = False) SoftCalibrationResult[source]

ADMM-based soft calibration with better convergence.

Uses ADMM to solve the soft-constraint problem more reliably than simple gradient descent.

Parameters:
  • P – Input probability matrix of shape (N, J).

  • M – Target column sums of shape (J,).

  • lam_m – Marginal penalty weight.

  • lam_r – Rank penalty weight.

  • rho – ADMM penalty parameter.

  • max_iters – Maximum iterations.

  • tol – Convergence tolerance.

  • verbose – Enable debug logging.

Returns:

SoftCalibrationResult with calibrated matrix.

rank_preserving_calibration.calibrate_two_stage(P: ndarray, M: ndarray, ipf_max_iters: int = 100, ipf_tol: float = 1e-08, proj_max_iters: int = 100, proj_tol: float = 1e-08, preserve_marginals: bool = False, verbose: bool = False) TwoStageResult[source]

Two-stage calibration: IPF followed by isotonic projection.

This approach first uses IPF (raking) to approximately match target marginals while preserving the relative probability structure, then applies isotonic projection to restore rank ordering.

The two-stage approach can produce less “flat” solutions compared to direct Dykstra projection when distribution shifts are large, because IPF preserves probability ratios rather than doing Euclidean projection.

Parameters:
  • P – Input probability matrix of shape (N, J).

  • M – Target column sums of shape (J,). Should sum to N.

  • ipf_max_iters – Maximum IPF iterations in stage 1.

  • ipf_tol – Convergence tolerance for IPF.

  • proj_max_iters – Maximum iterations for isotonic projection in stage 2.

  • proj_tol – Convergence tolerance for projection.

  • preserve_marginals – If True, re-apply column scaling after isotonic projection to maintain marginals (may re-introduce rank violations).

  • verbose – If True, enables debug logging.

Returns:

TwoStageResult with calibrated matrix and diagnostics.

Raises:

CalibrationError – If inputs are invalid.

Examples

>>> import numpy as np
>>> from rank_preserving_calibration import calibrate_two_stage
>>> P = np.array([[0.7, 0.2, 0.1], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]])
>>> M = np.array([1.0, 1.2, 0.8])
>>> result = calibrate_two_stage(P, M)
>>> print(f"Converged: {result.converged}")
rank_preserving_calibration.classwise_ece(y: ndarray, probs: ndarray, n_bins: int = 15, balanced: bool = False) dict[source]

Compute class-wise Expected Calibration Error.

For each class, bins predictions by the predicted probability for that class and measures the gap between predicted probability and actual frequency. Aggregates across all classes.

Parameters:
  • y – True class labels as integers of shape (N,).

  • probs – Probability matrix of shape (N, J).

  • n_bins – Number of probability bins per class (default 15).

  • balanced – If True, weight classes equally; otherwise weight by sample count.

Returns:

  • “ece”: Aggregated Expected Calibration Error

  • “mce”: Maximum Calibration Error across all classes

  • “per_class”: List of per-class ECE/MCE metrics and bin tables

Return type:

Dictionary containing

Examples

>>> import numpy as np
>>> y = np.array([0, 0, 1, 1, 2, 2])
>>> probs = np.array([[0.8, 0.1, 0.1], [0.7, 0.2, 0.1],
...                   [0.2, 0.6, 0.2], [0.1, 0.8, 0.1],
...                   [0.1, 0.2, 0.7], [0.1, 0.1, 0.8]])
>>> result = classwise_ece(y, probs)
>>> print(f"Classwise ECE: {result['ece']:.4f}")
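For a single class, the binning described above can be sketched as follows. The details here are assumptions, since the package's exact choices aren't shown: equal-width bins on [0, 1], gaps weighted by the fraction of samples per bin, and the top bin closed on the right (the helper name `per_class_ece_sketch` is illustrative):

```python
import numpy as np

def per_class_ece_sketch(y, probs, cls, n_bins=15):
    # ECE for one class: bin the predicted probability for `cls`,
    # then compare mean confidence with empirical frequency per bin.
    conf = probs[:, cls]
    hit = (y == cls).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for k in range(n_bins):
        lo, hi = edges[k], edges[k + 1]
        in_bin = (conf >= lo) & ((conf < hi) if k < n_bins - 1 else (conf <= hi))
        if in_bin.any():
            gap = abs(conf[in_bin].mean() - hit[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece

y = np.array([0, 0, 1])
probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]])
print(f"Class-0 ECE: {per_class_ece_sketch(y, probs, 0):.4f}")
```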
rank_preserving_calibration.column_variance(Q: ndarray) dict[str, Any][source]

Compute per-column variance of a probability matrix.

Useful for measuring how “spread out” the calibrated probabilities are within each class. Lower variance may indicate a “flatter” solution.

Parameters:

Q – Probability matrix of shape (N, J).

Returns:

  • “per_column”: List of variance values for each column

  • “mean”: Mean variance across columns

  • “min”: Minimum column variance

  • “max”: Maximum column variance

Return type:

Dictionary containing

Examples

>>> import numpy as np
>>> from rank_preserving_calibration import column_variance
>>> Q = np.array([[0.6, 0.3, 0.1], [0.4, 0.4, 0.2]])
>>> var = column_variance(Q)
>>> print(f"Mean column variance: {var['mean']:.4f}")
rank_preserving_calibration.compare_calibration_methods(P: ndarray, M: ndarray, results: dict[str, ndarray]) dict[str, Any][source]

Compare multiple calibration results on flatness and constraint satisfaction.

Useful for deciding which calibration approach is best for a given problem by comparing solutions from different methods (Dykstra, ADMM, soft, two-stage).

Parameters:
  • P – Original probability matrix.

  • M – Target column sums.

  • results – Dictionary mapping method names to calibrated Q matrices.

Returns:

Dictionary with per-method metrics and rankings.

Examples

>>> import numpy as np
>>> from rank_preserving_calibration import (
...     calibrate_dykstra, calibrate_soft, compare_calibration_methods
... )
>>> P = np.array([[0.7, 0.2, 0.1], [0.3, 0.5, 0.2]])
>>> M = np.array([0.9, 0.8, 0.3])
>>> Q_dykstra = calibrate_dykstra(P, M).Q
>>> Q_soft = calibrate_soft(P, M).Q
>>> comparison = compare_calibration_methods(P, M, {
...     "dykstra": Q_dykstra, "soft": Q_soft
... })
rank_preserving_calibration.distance_metrics(Q: ndarray, P: ndarray) dict[str, float][source]

Compute distance metrics between original and calibrated probabilities.

Measures how much the calibration procedure has changed the probability estimates using various matrix norms and distance measures.

Parameters:
  • Q – Calibrated probability matrix of shape (N, J).

  • P – Original probability matrix of shape (N, J).

Returns:

  • “frobenius”: Frobenius norm ||Q - P||_F

  • “frobenius_sq”: Squared Frobenius norm ||Q - P||²_F

  • “mean_abs”: Mean absolute difference

  • “max_abs”: Maximum absolute difference

Return type:

Dictionary containing distance metrics

Examples

>>> import numpy as np
>>> P = np.array([[0.7, 0.2, 0.1], [0.3, 0.5, 0.2]])
>>> Q = np.array([[0.65, 0.25, 0.1], [0.35, 0.45, 0.2]])
>>> distances = distance_metrics(Q, P)
>>> print(f"Frobenius distance: {distances['frobenius']:.4f}")
rank_preserving_calibration.feasibility_metrics(Q: ndarray, M: ndarray | None = None) dict[str, Any][source]

Compute feasibility metrics for calibrated probability matrix.

Analyzes how well the probability matrix Q satisfies the row and column sum constraints required by rank-preserving calibration.

Parameters:
  • Q – Calibrated probability matrix of shape (N, J). Should have rows summing to 1 and non-negative entries.

  • M – Target column sums of shape (J,). If provided, analyzes column sum constraint satisfaction.

Returns:

  • “row”: Row constraint metrics including max/mean absolute errors and min value

  • “col”: Column constraint metrics (if M provided) including various error norms

Return type:

Dictionary containing feasibility metrics

Examples

>>> import numpy as np
>>> from rank_preserving_calibration import feasibility_metrics
>>> Q = np.array([[0.6, 0.3, 0.1], [0.4, 0.4, 0.2]])
>>> M = np.array([1.0, 0.7, 0.3])
>>> metrics = feasibility_metrics(Q, M)
>>> print(f"Max row error: {metrics['row']['max_abs_error']:.6f}")
>>> print(f"Max col error: {metrics['col']['max_abs_error']:.6f}")
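The headline numbers reduce to simple max-abs reductions over Q, assuming the max-abs norms the docstring describes (a sketch, independent of the package):

```python
import numpy as np

Q = np.array([[0.6, 0.3, 0.1], [0.4, 0.4, 0.2]])
M = np.array([1.0, 0.7, 0.3])

row_err = np.max(np.abs(Q.sum(axis=1) - 1.0))  # rows should sum to 1
col_err = np.max(np.abs(Q.sum(axis=0) - M))    # columns should match M
print(row_err, col_err)
```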
rank_preserving_calibration.flatness_bound(P: ndarray, M: ndarray) dict[str, Any][source]

Compute theoretical bound on expected solution flatness.

This function estimates how flat the calibrated solution is likely to be based on the distribution shift magnitude. When the shift is large relative to the variance in P, we expect Q to be flat.

The bound is based on the observation that calibration seeks Q closest to P subject to constraints. When constraints force Q far from P, the solution tends toward the uniform point Q_ij = M_j/N which satisfies all column constraints by construction.

Parameters:
  • P – Original probability matrix of shape (N, J).

  • M – Target column sums of shape (J,).

Returns:

  • “shift_magnitude”: Normalized L2 shift

  • “variance_p”: Total variance of P

  • “expected_variance_ratio”: Estimated Var(Q)/Var(P)

  • “flatness_risk”: Score from 0 to 1 indicating risk of flat solution

  • “recommendation”: String with practical advice

Return type:

Dictionary containing

Examples

>>> import numpy as np
>>> from rank_preserving_calibration import flatness_bound
>>> P = np.array([[0.7, 0.2, 0.1], [0.3, 0.5, 0.2]])
>>> M = np.array([0.5, 0.8, 0.7])  # Large shift
>>> bound = flatness_bound(P, M)
>>> print(f"Flatness risk: {bound['flatness_risk']:.2f}")
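The uniform fallback mentioned above, Q_ij = M_j / N, is the maximally flat matrix: every row is identical, and the column constraints hold by construction. A quick check, using shapes like the example's:

```python
import numpy as np

P = np.array([[0.7, 0.2, 0.1], [0.3, 0.5, 0.2]])
M = np.array([0.5, 0.8, 0.7])
N = P.shape[0]

# Every row identical: zero within-column variance, so no discrimination.
Q_uniform = np.tile(M / N, (N, 1))
print(Q_uniform.sum(axis=0))  # equals M by construction
print(Q_uniform.sum(axis=1))  # each row sums to sum(M)/N
```

When sum(M) = N, the rows also sum to 1, so the uniform point is fully feasible, which is why large shifts pull solutions toward it.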
rank_preserving_calibration.flatness_metrics(Q: ndarray, P: ndarray | None = None, M: ndarray | None = None) dict[str, Any][source]

Measure how “flat” (uninformative) the calibrated solution is.

A perfectly flat solution has equal values within each column when sorted by original scores, meaning all samples receive the same probability for each class. This indicates loss of discriminative information.

Parameters:
  • Q – Calibrated probability matrix of shape (N, J).

  • P – Original probability matrix of shape (N, J). If provided, enables variance ratio computation.

  • M – Target column sums of shape (J,). If provided, enables comparison to the uniform solution Q_ij = M_j/N.

Returns:

  • "column_variance": Per-column variance of Q values

  • "mean_column_variance": Average variance across columns

  • "total_variance": Total variance of all Q entries

  • "entropy_per_row": Mean entropy of each row

  • "mean_max_prob": Mean of maximum probability in each row

If P is provided:
  • "variance_ratio": Var(Q) / Var(P), lower means flatter

If M is provided:
  • "distance_to_uniform": Frobenius distance from uniform solution

Return type:

Dictionary containing flatness metrics

Examples

>>> import numpy as np
>>> from rank_preserving_calibration import flatness_metrics
>>> Q = np.array([[0.5, 0.3, 0.2], [0.4, 0.4, 0.2]])
>>> P = np.array([[0.7, 0.2, 0.1], [0.3, 0.5, 0.2]])
>>> metrics = flatness_metrics(Q, P)
>>> print(f"Variance ratio: {metrics['variance_ratio']:.3f}")
rank_preserving_calibration.informativeness_ratio(Q: ndarray, P: ndarray) dict[str, Any][source]

Compare variance of Q vs P to measure information preservation.

The informativeness ratio measures how much of the original probability spread is preserved after calibration. A ratio close to 1 means the calibration preserved the discriminative structure; a ratio near 0 indicates a “flat” solution with little discrimination between samples.

Parameters:
  • Q – Calibrated probability matrix of shape (N, J).

  • P – Original probability matrix of shape (N, J).

Returns:

  • "total_ratio": Var(Q) / Var(P)

  • "per_column_ratio": Variance ratio for each column

  • "mean_column_ratio": Mean of per-column ratios

  • "interpretation": String describing the result

Return type:

Dictionary containing the above keys

Examples

>>> import numpy as np
>>> from rank_preserving_calibration import informativeness_ratio
>>> P = np.array([[0.7, 0.2, 0.1], [0.3, 0.5, 0.2]])
>>> Q = np.array([[0.5, 0.35, 0.15], [0.5, 0.35, 0.15]])  # Flat
>>> ratio = informativeness_ratio(Q, P)
>>> print(f"Total ratio: {ratio['total_ratio']:.3f}")
rank_preserving_calibration.isotonic_metrics(Q: ndarray, P: ndarray) dict[str, Any][source]

Analyze isotonic (rank-preserving) properties of calibrated probabilities.

Verifies that calibrated probabilities are nondecreasing when ordered by original model scores within each class. Computes violation statistics and flatness measures.

Parameters:
  • Q – Calibrated probability matrix of shape (N, J).

  • P – Original probability matrix of shape (N, J) used for score ordering.

Returns:

  • "max_rank_violation": Maximum rank-order violation across all columns

  • "mass_weighted_violation": Sum of violation magnitudes weighted by mass

  • "flat_fraction": Fraction of adjacent pairs that are equal (flatness)

  • "per_class_max_violation": List of max violations per class

  • "per_class_mass_violation": List of mass-weighted violations per class

Return type:

Dictionary containing isotonic metrics

Examples

>>> import numpy as np
>>> from rank_preserving_calibration import isotonic_metrics
>>> P = np.array([[0.7, 0.2, 0.1], [0.3, 0.5, 0.2]])
>>> Q = np.array([[0.65, 0.25, 0.1], [0.35, 0.45, 0.2]])
>>> metrics = isotonic_metrics(Q, P)
>>> print(f"Max violation: {metrics['max_rank_violation']:.6f}")
rank_preserving_calibration.kl_divergence(Q: ndarray, R: ndarray, eps: float = 1e-15) float[source]

Compute KL divergence KL(Q||R) = sum Q * log(Q/R).

The KL divergence measures how much Q differs from R, where R is the reference distribution. Uses convention 0 * log(0/x) = 0.

Parameters:
  • Q – Probability matrix of shape (N, J).

  • R – Reference probability matrix of shape (N, J).

  • eps – Small constant for numerical stability.

Returns:

KL divergence value (non-negative).

Examples

>>> import numpy as np
>>> from rank_preserving_calibration import kl_divergence
>>> P = np.array([[0.7, 0.2, 0.1], [0.3, 0.5, 0.2]])
>>> Q = np.array([[0.6, 0.3, 0.1], [0.4, 0.4, 0.2]])
>>> kl = kl_divergence(Q, P)
>>> print(f"KL divergence: {kl:.4f}")
rank_preserving_calibration.marginal_shift_metrics(P: ndarray, M: ndarray) dict[str, Any][source]

Quantify the distribution shift between P’s empirical marginals and M.

Larger shifts generally require more aggressive calibration adjustments, which can lead to flatter solutions. This function helps diagnose whether flatness is expected given the shift magnitude.

Parameters:
  • P – Original probability matrix of shape (N, J).

  • M – Target column sums of shape (J,).

Returns:

  • "empirical_marginals": Column sums of P

  • "target_marginals": M values

  • "marginal_diff": Difference (M - empirical)

  • "l1_shift": L1 norm of the difference

  • "l2_shift": L2 norm of the difference

  • "max_shift": Maximum absolute shift in any column

  • "relative_l2_shift": L2 shift divided by sqrt(N)

  • "per_column_relative_shift": Relative shift per column

Return type:

Dictionary containing shift metrics

Examples

>>> import numpy as np
>>> from rank_preserving_calibration import marginal_shift_metrics
>>> P = np.array([[0.7, 0.2, 0.1], [0.3, 0.5, 0.2]])
>>> M = np.array([0.8, 0.8, 0.4])
>>> metrics = marginal_shift_metrics(P, M)
>>> print(f"L2 shift: {metrics['l2_shift']:.3f}")
rank_preserving_calibration.nll(y: ndarray, probs: ndarray) float[source]

Compute negative log-likelihood (cross-entropy loss).

Parameters:
  • y – True class labels as integers of shape (N,).

  • probs – Probability matrix of shape (N, J).

Returns:

Average negative log-likelihood across all instances.

Examples

>>> import numpy as np
>>> y = np.array([0, 1, 2])
>>> probs = np.array([[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]])
>>> loss = nll(y, probs)
>>> print(f"NLL: {loss:.4f}")
rank_preserving_calibration.project_near_isotonic_euclidean(v: ndarray, eps: float, sum_target: float | None = None, weights: ndarray | None = None) ndarray[source]

Project v onto the set { z : z_{i+1} >= z_i - eps } in L2.

Reduction: define w_i = v_i + i * eps. Then the constraint becomes standard isotonic on w. Let w* = isotonic(w) (weighted if weights are provided). The exact projection is z* = w* - i * eps. If sum_target is given, apply a uniform shift so 1^T z* = sum_target (translation invariance makes this exact).

Parameters:
  • v ((n,) array_like) – Input vector to project.

  • eps (float) – Slack parameter (>= 0). Allows z[i+1] >= z[i] - eps instead of strict z[i+1] >= z[i].

  • sum_target (float, optional) – If provided, perform an exact uniform shift so the result sums to this value.

  • weights ((n,) array_like, optional) – Positive weights for a weighted L2 projection.

Returns:

z – Projected vector satisfying near-isotonic constraint (and sum, if requested).

Return type:

(n,) ndarray

Notes

  • Idempotent: applying the projection twice gives the same result.
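The reduction described above can be sketched in a few lines of numpy. This is an illustrative, unweighted re-implementation under stated assumptions, not the package's internal code; `pava` and `near_isotonic_l2` are hypothetical helper names.

```python
import numpy as np

def pava(w):
    """Unweighted nondecreasing isotonic regression via pool-adjacent-violators."""
    # Each block tracks (mean, count); merge while adjacent blocks violate order.
    means, counts = [], []
    for x in w:
        means.append(float(x))
        counts.append(1)
        while len(means) > 1 and means[-2] > means[-1]:
            m2, c2 = means.pop(), counts.pop()
            m1, c1 = means.pop(), counts.pop()
            means.append((m1 * c1 + m2 * c2) / (c1 + c2))
            counts.append(c1 + c2)
    return np.repeat(means, counts)

def near_isotonic_l2(v, eps, sum_target=None):
    """Project v onto {z : z[i+1] >= z[i] - eps} in L2 via the shift reduction."""
    i = np.arange(len(v))
    z = pava(v + i * eps) - i * eps  # exact projection via standard isotonic on w
    if sum_target is not None:
        # uniform shift is exact because the constraint is translation-invariant
        z = z + (sum_target - z.sum()) / len(z)
    return z

v = np.array([0.5, 0.1, 0.4])
z = near_isotonic_l2(v, eps=0.05)
print(np.round(z, 4))
```

Note that every adjacent pair of the result satisfies `z[i+1] >= z[i] - eps`, with the violating pair in `v` pulled to the boundary.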

rank_preserving_calibration.project_near_kl_isotonic(v: ndarray, eps_slack: float, sum_target: float | None = None, weights: ndarray | None = None, eps: float = 1e-300) ndarray[source]

Project v onto near-isotonic set with multiplicative slack under KL.

The constraint is: z[i+1] >= z[i] * (1 - eps_slack)

This is the KL analog of the Euclidean nearly-isotonic projection. The key insight is to transform to log space:

log(z[i+1]) >= log(z[i]) + log(1 - eps_slack)

Let delta = -log(1 - eps_slack) > 0. Then:

log(z[i+1]) >= log(z[i]) - delta

This is equivalent to the additive slack problem in log space.

Parameters:
  • v – Input vector (positive values).

  • eps_slack – Multiplicative slack parameter (0 to 1). eps_slack=0 is strict isotonic, eps_slack=0.1 allows 10% violation.

  • sum_target – If provided, apply multiplicative rescaling to hit this sum.

  • weights – Optional weights for weighted projection.

  • eps – Numerical stability constant.

Returns:

Near-isotonic projection in KL sense.
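The sum_target handling above uses a multiplicative rescale rather than the Euclidean uniform shift because the multiplicative-slack constraint is scale-invariant. A minimal sketch of that observation (`rescale_to_sum` is an illustrative helper, not the package API):

```python
import numpy as np

# z[i+1] >= z[i] * (1 - eps_slack) is preserved under z -> c*z for any c > 0,
# so a sum constraint can be met exactly by one rescale -- the KL analog of
# the uniform shift used in the Euclidean projection.
def rescale_to_sum(z, sum_target):
    return z * (sum_target / z.sum())

eps_slack = 0.1
z = np.array([0.4, 0.38, 0.5])   # feasible: 0.38 >= 0.36 and 0.5 >= 0.342
z2 = rescale_to_sum(z, 1.0)
print(z2.sum())                  # hits the target; feasibility is preserved
```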

rank_preserving_calibration.prox_kl_near_isotonic(y: ndarray, lam: float, rho: float = 1.0, max_iters: int = 1000, tol: float = 1e-08, eps: float = 1e-300, return_info: bool = False) ndarray | tuple[ndarray, dict][source]

Proximal operator for λ-penalty nearly-isotonic term under KL.

Solves:

min_z KL(z||y) + λ * sum_i (z_i - z_{i+1})_+

where (x)_+ = max(0, x) penalizes rank violations.

Uses ADMM in the dual space. This is more experimental than the epsilon-slack approach.

Parameters:
  • y – Input vector (positive values).

  • lam – Penalty weight for rank violations.

  • rho – ADMM penalty parameter.

  • max_iters – Maximum ADMM iterations.

  • tol – Convergence tolerance.

  • eps – Numerical stability constant.

  • return_info – If True, also return convergence info dict.

Returns:

z, the proximal point. If return_info=True, also returns a dict of convergence info as (z, info_dict).

Return type:

ndarray, or tuple[ndarray, dict] if return_info=True

rank_preserving_calibration.prox_near_isotonic(y: ndarray, lam: float, *, rho: float = 1.0, max_iters: int = 2000, abstol: float = 1e-08, reltol: float = 1e-06, return_info: bool = False) ndarray | tuple[ndarray, dict][source]
Exact proximal operator for the λ-penalty nearly-isotonic term:

prox_{λ R}(y), R(z) = ∑_{i=1}^{n-1} (z_i - z_{i+1})_+,

where (x)_+ = max(0, x). This penalizes downward steps, relaxing strict isotonicity.

We solve:

minimize_z 0.5 * ||z - y||_2^2 + λ * ∑ (Dz)_+ with (Dz)_i = z_i - z_{i+1}

by ADMM on the split Dz = r:

z-update: (I + ρ D^T D) z = y + ρ D^T (r - u)   [SPD tridiagonal solve]

r-update: r = prox_{(λ/ρ)‖·‖_{+,1}}(Dz + u)   [one-sided soft-threshold]

u-update: u ← u + Dz - r

One-sided soft-threshold (elementwise):

prox_{t(·)_+}(v) = { v - t,  if v > t
                     0,      if 0 ≤ v ≤ t
                     v,      if v < 0 }
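The elementwise r-update rule is simple enough to sketch directly. A minimal illustration of the one-sided soft-threshold (the helper name is ours, not the package's internal code):

```python
import numpy as np

def one_sided_soft_threshold(v, t):
    """prox of t*(.)_+ : shrink entries above t by t, zero out [0, t], keep negatives."""
    return np.where(v > t, v - t, np.where(v >= 0.0, 0.0, v))

print(one_sided_soft_threshold(np.array([0.5, 0.1, -0.3]), 0.2))
```

Unlike the usual (two-sided) soft-threshold, negative entries pass through untouched, since the penalty only charges positive differences.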

rank_preserving_calibration.prox_near_isotonic_with_sum(y: ndarray, lam: float, sum_target: float, *, return_info: bool = False, **kwargs) ndarray | tuple[ndarray, dict][source]
Prox with a sum constraint:

minimize_z 0.5 * ||z - y||_2^2 + λ ∑ (z_i - z_{i+1})_+ subject to 1^T z = sum_target.

Because R(z) depends only on differences, it is translation-invariant:

R(z + c·1) = R(z).

Hence the constrained prox is obtained exactly by:

z* = prox_{λR}(y) + ((sum_target - 1^T prox_{λR}(y)) / n) · 1

Returns:

  • If return_info=False (default): z (ndarray)

  • If return_info=True: (z, info_dict)
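The translation-invariance argument can be sketched with a placeholder vector standing in for the unconstrained prox output (the helper name is illustrative):

```python
import numpy as np

def shift_to_sum(z0, sum_target):
    # R(z) depends only on consecutive differences, so a uniform shift
    # changes the sum without changing the penalty value.
    return z0 + (sum_target - z0.sum()) / len(z0)

z0 = np.array([0.2, 0.5, 0.1])   # stand-in for prox_{lam*R}(y)
z = shift_to_sum(z0, 1.0)
print(z.sum())
```

Because the differences of `z` equal those of `z0`, the penalty term is unchanged, so the shifted point is the exact constrained minimizer.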

rank_preserving_calibration.reverse_kl_divergence(Q: ndarray, R: ndarray, eps: float = 1e-15) float[source]

Compute reverse KL divergence KL(R||Q) = sum R * log(R/Q).

The reverse KL divergence has different properties than forward KL:

  • Forward KL(Q||R) is mean-seeking (Q spreads over modes of R)

  • Reverse KL(R||Q) is mode-seeking (Q focuses on main mode of R)

Parameters:
  • Q – Probability matrix of shape (N, J).

  • R – Reference probability matrix of shape (N, J).

  • eps – Small constant for numerical stability.

Returns:

Reverse KL divergence value (non-negative).

Examples

>>> import numpy as np
>>> from rank_preserving_calibration import reverse_kl_divergence
>>> P = np.array([[0.7, 0.2, 0.1], [0.3, 0.5, 0.2]])
>>> Q = np.array([[0.6, 0.3, 0.1], [0.4, 0.4, 0.2]])
>>> rkl = reverse_kl_divergence(Q, P)
rank_preserving_calibration.sharpness_metrics(probs: ndarray) dict[str, float][source]

Compute sharpness metrics for probability predictions.

Sharpness measures how confident (peaked) the predictions are, independent of whether they are correct. Sharper predictions are more informative.

Parameters:

probs – Probability matrix of shape (N, J).

Returns:

  • "mean_entropy": Average entropy across predictions (lower = sharper)

  • "mean_max_prob": Average of maximum probabilities (higher = sharper)

  • "mean_margin": Average difference between top two probabilities

Return type:

Dictionary containing the above keys

Examples

>>> import numpy as np
>>> probs = np.array([[0.9, 0.05, 0.05], [0.5, 0.3, 0.2]])
>>> sharp = sharpness_metrics(probs)
>>> print(f"Mean max prob: {sharp['mean_max_prob']:.3f}")
rank_preserving_calibration.symmetrized_kl(Q: ndarray, R: ndarray, eps: float = 1e-15) float[source]

Compute symmetrized KL divergence (Jensen-Shannon related).

Symmetrized KL = 0.5 * (KL(Q||R) + KL(R||Q))

This is symmetric in Q and R, unlike the standard KL divergence.

Parameters:
  • Q – Probability matrix of shape (N, J).

  • R – Reference probability matrix of shape (N, J).

  • eps – Small constant for numerical stability.

Returns:

Symmetrized KL divergence value (non-negative).

Examples

>>> import numpy as np
>>> from rank_preserving_calibration import symmetrized_kl
>>> P = np.array([[0.7, 0.2, 0.1], [0.3, 0.5, 0.2]])
>>> Q = np.array([[0.6, 0.3, 0.1], [0.4, 0.4, 0.2]])
>>> skl = symmetrized_kl(Q, P)
rank_preserving_calibration.tie_group_variance(Q: ndarray, P: ndarray) dict[str, Any][source]

Compute the variance of Q within groups of rows whose original scores are exactly tied. When calibration is run with ties='group', tied rows receive equal calibrated values, so this should be ~0.
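A hypothetical illustration of the quantity measured, for a single column (the helper name is ours, not the package API):

```python
import numpy as np

def max_within_tie_variance(Q, P, j):
    """Largest variance of Q[:, j] over groups of rows with identical P[:, j]."""
    scores = P[:, j]
    variances = [Q[scores == s, j].var() for s in np.unique(scores)
                 if np.sum(scores == s) > 1]
    return max(variances, default=0.0)

P = np.array([[0.5, 0.5], [0.5, 0.5], [0.3, 0.7]])   # rows 0 and 1 tied in column 0
Q = np.array([[0.45, 0.55], [0.45, 0.55], [0.35, 0.65]])
print(max_within_tie_variance(Q, P, 0))   # tied rows share a value -> 0.0
```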

rank_preserving_calibration.top_label_ece(y: ndarray, probs: ndarray, n_bins: int = 15) dict[source]

Compute Expected Calibration Error for top-label predictions.

Bins predictions by confidence (max probability) and measures the gap between confidence and accuracy in each bin. ECE is the weighted average of these gaps, and MCE is the maximum gap.

Parameters:
  • y – True class labels as integers of shape (N,).

  • probs – Probability matrix of shape (N, J).

  • n_bins – Number of confidence bins (default 15).

Returns:

  • "ece": Expected Calibration Error (weighted average gap)

  • "mce": Maximum Calibration Error (max gap across bins)

  • "bins": List of (count, mean_confidence, accuracy) per bin

Return type:

Dictionary containing the above keys

Examples

>>> import numpy as np
>>> y = np.array([0, 1, 2])
>>> probs = np.array([[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]])
>>> ece_result = top_label_ece(y, probs)
>>> print(f"ECE: {ece_result['ece']:.4f}")