Mathematical Theory¶
This section explains the mathematical foundations of rank-preserving calibration.
Problem Formulation¶
Given a probability matrix \(P \in \mathbb{R}^{N \times J}\) where each row represents a probability distribution over \(J\) classes for \(N\) samples, we want to find a calibrated matrix \(Q\) that satisfies:
Row constraints: Each row is a valid probability distribution
\[\sum_{j=1}^J Q_{ij} = 1, \quad Q_{ij} \geq 0 \quad \forall i,j\]
Column constraints: Column sums match the target marginals
\[\sum_{i=1}^N Q_{ij} = M_j \quad \forall j\]
Isotonic constraints: Within each column, values are non-decreasing when sorted by the original scores
\[Q_{i_1,j} \leq Q_{i_2,j} \quad \text{if } P_{i_1,j} \leq P_{i_2,j}\]
The optimization problem is to find the calibrated matrix closest to \(P\) in Frobenius norm:
\[\min_{Q \in \mathbb{R}^{N \times J}} \; \frac{1}{2}\|Q - P\|_F^2 \quad \text{subject to the row, column, and isotonic constraints above.}\]
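To make these constraints concrete, the following sketch checks a candidate \(Q\) numerically with NumPy. The function name and the tolerance argument are illustrative, not part of the package API.

```python
import numpy as np

def check_calibration_constraints(P, Q, M, tol=1e-8):
    """Check the row, column, and isotonic constraints for a candidate Q."""
    # Row constraints: every row of Q sums to 1 and is non-negative.
    rows_ok = np.allclose(Q.sum(axis=1), 1.0, atol=tol) and bool((Q >= -tol).all())

    # Column constraints: each column sum matches its target marginal M_j.
    cols_ok = np.allclose(Q.sum(axis=0), M, atol=tol)

    # Isotonic constraints: within each column, Q is non-decreasing when the
    # rows are ordered by the original scores in P.
    iso_ok = True
    for j in range(P.shape[1]):
        order = np.argsort(P[:, j], kind="stable")
        iso_ok = iso_ok and bool((np.diff(Q[order, j]) >= -tol).all())

    return rows_ok, cols_ok, iso_ok
```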
Algorithmic Approaches¶
Dykstra’s Alternating Projections¶
Dykstra’s method alternates between projecting onto the different constraint sets while maintaining memory (correction) terms, which ensures convergence to the projection of \(P\) onto their intersection.
Algorithm:
Initialize \(Q^{(0)} = P\), \(U^{(0)} = 0\), \(V^{(0)} = 0\)
For \(k = 0, 1, 2, \ldots\):
a. Row projection: Project each row onto the probability simplex
\[\tilde{Q}^{(k+1)} = \text{proj}_{\text{simplex}}(Q^{(k)} - U^{(k)})\]
Update memory: \(U^{(k+1)} = \tilde{Q}^{(k+1)} - (Q^{(k)} - U^{(k)})\)
b. Column projection: Project each column to satisfy isotonic and sum constraints
\[Q^{(k+1)} = \text{proj}_{\text{isotonic}}(\tilde{Q}^{(k+1)} - V^{(k)})\]
Update memory: \(V^{(k+1)} = Q^{(k+1)} - (\tilde{Q}^{(k+1)} - V^{(k)})\)
Check convergence: \(\|Q^{(k+1)} - Q^{(k)}\|_F < \text{tol}\)
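The loop above can be sketched directly in NumPy. The helper names, the use of scikit-learn's PAV routine, and the uniform shift used to hit each column's target sum are illustrative choices for this sketch, not necessarily how the package implements the two projections (they are described in more detail below).

```python
import numpy as np
from sklearn.isotonic import isotonic_regression

def project_rows_simplex(X):
    """Project each row of X onto the probability simplex (sort-based thresholding)."""
    N, J = X.shape
    S = np.sort(X, axis=1)[:, ::-1]                # each row sorted in descending order
    css = np.cumsum(S, axis=1) - 1.0
    ks = np.arange(1, J + 1)
    rho = (S - css / ks > 0).sum(axis=1)           # number of entries kept positive
    lam = css[np.arange(N), rho - 1] / rho         # per-row threshold lambda
    return np.maximum(X - lam[:, None], 0.0)

def project_cols_isotonic(X, P, M):
    """Project each column onto {non-decreasing in the order of P[:, j], sum = M_j}."""
    Q = np.empty_like(X)
    for j in range(X.shape[1]):
        order = np.argsort(P[:, j], kind="stable")
        iso = isotonic_regression(X[order, j])     # PAV in the score-sorted order
        iso = iso + (M[j] - iso.sum()) / len(iso)  # uniform shift to match the column sum
        Q[order, j] = iso
    return Q

def dykstra_calibrate(P, M, tol=1e-8, max_iter=1000):
    """Dykstra's alternating projections with the memory terms U and V from the text."""
    Q = np.asarray(P, dtype=float).copy()
    U = np.zeros_like(Q)
    V = np.zeros_like(Q)
    for _ in range(max_iter):
        Q_prev = Q
        T = project_rows_simplex(Q - U)            # row projection
        U = T - (Q - U)                            # update row memory term
        Q = project_cols_isotonic(T - V, P, M)     # column projection
        V = Q - (T - V)                            # update column memory term
        if np.linalg.norm(Q - Q_prev) < tol:       # Frobenius-norm convergence check
            break
    return Q
```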
Simplex Projection:
For a vector \(x \in \mathbb{R}^J\), the projection onto the probability simplex is:
\[[\text{proj}_{\text{simplex}}(x)]_j = \max(0, x_j - \lambda)\]
where \(\lambda\) is chosen so that \(\sum_j \max(0, x_j - \lambda) = 1\).
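A direct single-vector implementation of this thresholding (the helper name is illustrative); sorting makes the search for \(\lambda\) cost \(O(J \log J)\):

```python
import numpy as np

def project_to_simplex(x):
    """Euclidean projection of x onto {q : q >= 0, sum(q) = 1}."""
    u = np.sort(x)[::-1]                      # sort in descending order
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(x) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]  # last index where the gap is positive
    lam = css[rho] / (rho + 1)                # threshold lambda
    return np.maximum(x - lam, 0.0)

# Example: the result is non-negative and sums to exactly 1.
print(project_to_simplex(np.array([0.5, 0.2, -0.1, 0.6])))  # [0.4 0.1 0.  0.5]
```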
Isotonic Projection:
Uses the Pool Adjacent Violators (PAV) algorithm to find the isotonic regression that minimizes squared distance while satisfying the sum constraint.
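A tiny worked example, assuming scikit-learn's `isotonic_regression` as the PAV routine: pooling resolves the violation, and a uniform shift, which cannot break monotonicity, then moves the result onto an illustrative target sum.

```python
import numpy as np
from sklearn.isotonic import isotonic_regression

x = np.array([3.0, 1.0, 2.0])
iso = isotonic_regression(x)              # PAV pools the violators: [2., 2., 2.]
iso = iso + (5.0 - iso.sum()) / len(iso)  # shift to an illustrative column sum of 5.0
print(iso)                                # approx [1.667 1.667 1.667]
```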
ADMM Optimization¶
The Alternating Direction Method of Multipliers (ADMM) formulates the problem with auxiliary variables and Lagrange multipliers.
Augmented Lagrangian:
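The exact splitting is not spelled out here, so the following is one standard consensus formulation, written as an assumption for illustration: \(Z\) is an auxiliary copy of \(Q\) that carries the isotonic constraints, while \(Q\) itself carries the linear row- and column-sum constraints.
\[\mathcal{L}_\rho(Q, Z, \Lambda) = \frac{1}{2}\|Q - P\|_F^2 + \iota_{\{A\,\mathrm{vec}(Q) = b\}}(Q) + \iota_{\text{iso}}(Z) + \langle \Lambda,\, Q - Z \rangle + \frac{\rho}{2}\|Q - Z\|_F^2\]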
where \(A\) encodes the linear constraints and \(b\) the target values.
Algorithm:
\(Q\)-update: Solve the quadratic subproblem
\(Z\)-update: Apply isotonic projections
\(\Lambda\)-update: Dual variable update
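Under the consensus splitting written above (again an illustrative assumption), these three steps take the familiar form:
\[Q^{(k+1)} = \arg\min_{Q \,:\, A\,\mathrm{vec}(Q) = b} \; \frac{1}{2}\|Q - P\|_F^2 + \frac{\rho}{2}\left\|Q - Z^{(k)} + \Lambda^{(k)}/\rho\right\|_F^2\]
\[Z^{(k+1)} = \text{proj}_{\text{isotonic}}\!\left(Q^{(k+1)} + \Lambda^{(k)}/\rho\right)\]
\[\Lambda^{(k+1)} = \Lambda^{(k)} + \rho\left(Q^{(k+1)} - Z^{(k+1)}\right)\]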
Nearly Isotonic Calibration¶
Strict isotonic constraints can be overly restrictive. We provide two relaxation approaches:
Epsilon-Slack Approach¶
Allow violations of the isotonic constraint up to \(\epsilon\):
\[Q_{i_1,j} \leq Q_{i_2,j} + \epsilon \quad \text{if } P_{i_1,j} \leq P_{i_2,j}\]
This maintains convexity and uses Euclidean projection onto the relaxed constraint set.
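A small NumPy check of the relaxed constraint for one column (the helper name is illustrative): every entry may exceed any later entry in the score-sorted order by at most \(\epsilon\), which can be tested against a running suffix minimum.

```python
import numpy as np

def epsilon_relaxed_ok(P_col, Q_col, eps, tol=1e-12):
    """Check Q_{i1} <= Q_{i2} + eps for all pairs with P_{i1} <= P_{i2} in one column."""
    order = np.argsort(P_col, kind="stable")
    q = Q_col[order]
    suffix_min = np.minimum.accumulate(q[::-1])[::-1]  # suffix_min[i] = min(q[i:])
    return bool((q[:-1] <= suffix_min[1:] + eps + tol).all())
```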
Lambda-Penalty Approach¶
Add a penalty term for violations to the objective:
\[\min_{Q} \; \frac{1}{2}\|Q - P\|_F^2 + \lambda \sum_{j=1}^{J} \sum_{(i_1, i_2)} \max\bigl(0,\; Q_{i_1,j} - Q_{i_2,j}\bigr)\]
where the sum is over pairs with \(P_{i_1,j} \leq P_{i_2,j}\).
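For intuition, the per-column penalty can be evaluated directly (an illustrative helper, quadratic in \(N\) and only meant for small examples):

```python
import numpy as np

def nearly_isotonic_penalty(P_col, Q_col, lam):
    """lam * sum of max(0, Q_{i1} - Q_{i2}) over pairs with P_{i1} <= P_{i2} (one column)."""
    order = np.argsort(P_col, kind="stable")
    q = Q_col[order]
    diffs = q[:, None] - q[None, :]                         # diffs[i1, i2] = q[i1] - q[i2]
    pairs = np.triu(np.ones(diffs.shape, dtype=bool), k=1)  # i1 strictly before i2
    return lam * np.clip(diffs[pairs], 0.0, None).sum()
```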
Convergence Properties¶
Dykstra’s Method:
Guaranteed convergence to the Euclidean projection of \(P\) onto the intersection of the constraint sets (when that intersection is non-empty)
Linear convergence rate under regularity conditions
Each iterate is feasible for the constraint set onto which it was most recently projected
ADMM:
Convergence under standard assumptions (constraint qualification)
Convergence rate depends on penalty parameter \(\rho\)
Provides primal and dual residual tracking for monitoring convergence (see the sketch at the end of this section)
Nearly Isotonic:
Epsilon-slack: Maintains theoretical guarantees of Dykstra’s method
Lambda-penalty: Convergence depends on penalty parameter choice
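A minimal residual computation, assuming the consensus constraint \(Q = Z\) from the ADMM illustration above; the function name is not part of the package API.

```python
import numpy as np

def admm_residuals(Q, Z, Z_prev, rho):
    """Primal residual ||Q - Z||_F and dual residual rho * ||Z - Z_prev||_F."""
    primal = np.linalg.norm(Q - Z)
    dual = rho * np.linalg.norm(Z - Z_prev)
    return primal, dual
```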
Computational Complexity¶
Per Iteration:
Row projections: \(O(NJ \log J)\) using efficient simplex projection
Column projections: \(O(NJ)\) using PAV algorithm
Overall: \(O(NJ \log J)\) per iteration
Memory Requirements:
\(O(NJ)\) for probability matrices
Dykstra’s method: Additional \(O(NJ)\) for memory terms
ADMM: Additional storage for auxiliary variables and dual variables
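For example, with \(N = 100{,}000\) samples and \(J = 10\) classes, a single float64 matrix occupies \(N \times J \times 8\) bytes, roughly 8 MB.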
Numerical Considerations¶
Stability:
All computations use double precision (float64)
Isotonic regression includes numerical safeguards
Input validation checks for NaN/infinite values
Feasibility:
The intersection of constraints may be empty if \(\sum_j M_j \neq N\)
Warnings are issued when \(|\sum_j M_j - N|\) is large
Nearly isotonic approaches can help with infeasible problems
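A minimal pre-flight check along these lines (an illustrative sketch, not the package's actual validation code):

```python
import warnings
import numpy as np

def validate_inputs(P, M, gap_tol=1e-6):
    """Screen for non-finite values and for column targets that cannot sum to N."""
    if not (np.isfinite(P).all() and np.isfinite(M).all()):
        raise ValueError("P and M must not contain NaN or infinite values.")
    N = P.shape[0]
    gap = abs(float(np.sum(M)) - N)
    # Each feasible Q has row sums of 1, so its column sums must total N.
    if gap > gap_tol:
        warnings.warn(
            f"Column marginals sum to {float(np.sum(M)):.6f} but N = {N}; "
            "the constraint sets may not intersect."
        )
    return gap
```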