Core Functions

This module contains the main optimization functions that form the core of the library.

Main Optimization Functions

optimal_cutoffs.api.optimize_thresholds(y_true: ArrayLike, y_score: ArrayLike, *, metric: str = 'f1', task: Task = Task.AUTO, average: Average = Average.AUTO, method: str = 'auto', mode: str = 'empirical', sample_weight: ArrayLike | None = None, **kwargs) -> OptimizationResult

Find optimal thresholds for classification problems.

This is THE canonical entry point for threshold optimization. Auto-detects problem type and selects appropriate algorithms.

Parameters:
  • y_true – True labels

  • y_score – Predicted scores or probabilities:
      - Binary: 1D array of scores
      - Multiclass: 2D array (n_samples, n_classes)
      - Multilabel: 2D array (n_samples, n_labels)

  • metric – Metric to optimize (“f1”, “precision”, “recall”, “accuracy”, etc.)

  • task – Problem type. AUTO infers from data shape and probability sums.

  • average – Averaging strategy for multiclass/multilabel. AUTO selects sensible default.

  • method – Optimization algorithm. AUTO selects best method per task+metric.

  • mode – “empirical” (standard) or “expected” (requires calibrated probabilities)

  • sample_weight – Sample weights

  • **kwargs – Additional keyword arguments passed to optimization algorithms. Common options include ‘comparison’ (“>”, “>=”), ‘tolerance’, ‘utility’.

Returns:

Result with .thresholds, .predict(), and explanation of auto-selections

Return type:

OptimizationResult

Raises:
  • TypeError – If ‘bayes’ is passed as a keyword argument (deprecated).

  • ValueError – Raised in any of the following cases:
      - mode='bayes' is used without the required utility parameter.
      - The comparison operator is not '>' or '>='.
      - mode='expected' is combined with an unsupported metric.
      - method is a deprecated value ('dinkelbach', 'smart_brute').
      - An unknown metric name is provided.
      - true_labels is required for empirical mode but not provided.

Examples

>>> # Binary classification - simple case
>>> result = optimize_thresholds(y_true, y_scores, metric="f1")
>>> print(f"Optimal threshold: {result.threshold}")
>>> # Multiclass classification
>>> result = optimize_thresholds(y_true, y_probs, metric="f1")
>>> print(f"Per-class thresholds: {result.thresholds}")
>>> print(f"Task inferred as: {result.task.value}")
>>> # Explicit control when needed
>>> result = optimize_thresholds(
...     y_true, y_probs,
...     metric="precision",
...     task=Task.MULTICLASS,
...     average=Average.MACRO
... )
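
The parameter notes above say that AUTO infers the task from the data shape and probability sums. The sketch below shows one plausible heuristic of that kind. It is not taken from the library: the function name infer_task, the tolerance, and the exact rules are assumptions made for illustration only.

```python
import math

def infer_task(y_score, tol=1e-6):
    """Guess the task type from y_score (hypothetical heuristic, not the library's)."""
    first = y_score[0]
    # A flat sequence of numbers is treated as binary scores.
    if not isinstance(first, (list, tuple)):
        return "binary"
    # Rows that each sum to 1 look like a multiclass probability distribution.
    if all(math.isclose(sum(row), 1.0, abs_tol=tol) for row in y_score):
        return "multiclass"
    # Otherwise assume independent per-label probabilities.
    return "multilabel"

print(infer_task([0.2, 0.9, 0.4]))                      # binary
print(infer_task([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]))   # multiclass
print(infer_task([[0.9, 0.8, 0.1], [0.2, 0.7, 0.6]]))   # multilabel
```

Because a heuristic like this can misread multilabel rows that happen to sum to 1, passing task explicitly remains the safer choice when the problem type is known.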

optimal_cutoffs.api.optimize_decisions(y_prob: ArrayLike, cost_matrix: ArrayLike, **kwargs) -> OptimizationResult

Find optimal decisions using cost matrix (no thresholds).

For problems where thresholds aren’t the right abstraction. Uses Bayes-optimal decision rule: argmin_action E[cost | probabilities].

Parameters:
  • y_prob – Predicted probabilities (n_samples, n_classes)

  • cost_matrix – Cost matrix of shape (n_classes, n_actions) or (n_classes, n_classes), where cost_matrix[i, j] is the cost of taking action j when the true class is i

  • **kwargs – Additional keyword arguments passed to the Bayes optimal decision function.

Returns:

Result with .predict() function (no .thresholds)

Return type:

OptimizationResult

Examples

>>> # Cost matrix: rows=true class, cols=predicted class
>>> costs = [[0, 1, 10], [5, 0, 1], [50, 10, 0]]  # e.g. predicting class 0 when the truth is class 2 costs 50
>>> result = optimize_decisions(y_probs, costs)
>>> y_pred = result.predict(y_probs_test)
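
The rule argmin_action E[cost | probabilities] can be written out in a few lines of plain Python. This is a minimal sketch of the decision rule itself, not the library's implementation; the helper name bayes_decisions is hypothetical.

```python
def bayes_decisions(y_prob, cost_matrix):
    """Pick, for each sample, the action with the lowest expected cost."""
    n_actions = len(cost_matrix[0])
    decisions = []
    for probs in y_prob:
        # Expected cost of action j: sum_i P(class i) * cost_matrix[i][j]
        expected = [
            sum(p * cost_matrix[i][j] for i, p in enumerate(probs))
            for j in range(n_actions)
        ]
        decisions.append(min(range(n_actions), key=expected.__getitem__))
    return decisions

costs = [[0, 1, 10], [5, 0, 1], [50, 10, 0]]
y_prob = [[0.8, 0.15, 0.05], [0.2, 0.2, 0.6]]
print(bayes_decisions(y_prob, costs))  # [1, 2]
```

For the first sample the most probable class is 0, yet action 1 has the lower expected cost (1.3 versus 3.25) because mistaking class 2 for class 0 is so expensive. This is the situation in which a cost matrix is a better abstraction than a threshold.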

Binary Classification

optimal_cutoffs.binary.optimize_f1_binary(true_labels: ArrayLike, pred_proba: ArrayLike, *, beta: float = 1.0, sample_weight: ArrayLike | None = None, comparison: str = '>') -> OptimizationResult

Optimize F-beta score for binary classification using sort-and-scan.

Uses the O(n log n) sort-and-scan algorithm exploiting the piecewise structure of F-beta metrics. This finds the exact optimal threshold.

Parameters:
  • true_labels – True binary labels in {0, 1}. Shape: (n_samples,)

  • pred_proba – Predicted probabilities for positive class in [0, 1]. Shape: (n_samples,)

  • beta – F-beta parameter. beta=1 gives F1 score

  • sample_weight – Sample weights. Shape: (n_samples,)

  • comparison – Comparison operator for threshold. Must be “>” or “>=”

Returns:

Result with optimal threshold, F-beta score, and predict function

Return type:

OptimizationResult

Examples

>>> y_true = [0, 1, 1, 0, 1]
>>> y_prob = [0.2, 0.8, 0.7, 0.3, 0.9]
>>> result = optimize_f1_binary(y_true, y_prob)
>>> result.threshold
0.5
>>> result.score  # F1 score at optimal threshold (the classes are perfectly separated)
1.0
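
The sort-and-scan idea can be sketched without the library: sort the scores once, move samples to the predicted-positive side one at a time, and evaluate F-beta at every realizable cut. The helper below is an illustrative, unweighted version with the ">" comparison. Its name, tie handling, and midpoint placement are assumptions, not the library's exact behavior.

```python
import math

def best_fbeta_threshold(y_true, y_prob, beta=1.0):
    """Return (threshold, score) maximizing F-beta, predicting 1 when p > threshold."""
    order = sorted(range(len(y_prob)), key=lambda i: y_prob[i], reverse=True)
    total_pos = sum(y_true)
    b2 = beta * beta
    tp = fp = 0
    # Cut 0: a threshold at the maximum score predicts no positives.
    best_threshold, best_score = max(y_prob), 0.0
    for rank, i in enumerate(order):
        # Move sample i to the predicted-positive side.
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        nxt = y_prob[order[rank + 1]] if rank + 1 < len(order) else None
        # A cut is only realizable between distinct scores.
        if nxt is not None and nxt == y_prob[i]:
            continue
        fn = total_pos - tp
        denom = (1 + b2) * tp + b2 * fn + fp
        score = (1 + b2) * tp / denom if denom > 0 else 0.0
        if score > best_score:
            if nxt is not None:
                threshold = (y_prob[i] + nxt) / 2   # midpoint between neighbours
            else:
                threshold = math.nextafter(y_prob[i], -math.inf)  # just below the minimum
            best_threshold, best_score = threshold, score
    return best_threshold, best_score

threshold, score = best_fbeta_threshold([0, 1, 1, 0, 1], [0.2, 0.8, 0.7, 0.3, 0.9])
print(round(threshold, 3), score)  # 0.5 1.0
```

The sort costs O(n log n) and the scan is a single O(n) pass, which is where the stated complexity comes from.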

optimal_cutoffs.binary.optimize_metric_binary(true_labels: ArrayLike, pred_proba: ArrayLike, *, metric: str = 'f1', method: str = 'auto', sample_weight: ArrayLike | None = None, comparison: str = '>', tolerance: float = 1e-10) -> OptimizationResult

General binary metric optimization with automatic method selection.

Automatically selects the best optimization algorithm based on metric properties and data characteristics.

Parameters:
  • true_labels – True binary labels in {0, 1}. Shape: (n_samples,)

  • pred_proba – Predicted probabilities for positive class in [0, 1]. Shape: (n_samples,)

  • metric – Metric to optimize (“f1”, “precision”, “recall”, “accuracy”, etc.)

  • method – Optimization method:
      - “auto”: Automatically select best method
      - “sort_scan”: O(n log n) sort-and-scan (exact for piecewise metrics)
      - “minimize”: Scipy optimization
      - “gradient”: Simple gradient ascent

  • sample_weight – Sample weights. Shape: (n_samples,)

  • comparison – Comparison operator for threshold. Must be “>” or “>=”

  • tolerance – Numerical tolerance for optimization

Returns:

Result with optimal threshold, metric score, and predict function

Return type:

OptimizationResult

Raises:

ValueError – If method is unknown or not supported.

Examples

>>> result = optimize_metric_binary(y_true, y_prob, metric="precision")
>>> result = optimize_metric_binary(y_true, y_prob, metric="f1", method="sort_scan")

optimal_cutoffs.binary.optimize_utility_binary(true_labels: ArrayLike | None, pred_proba: ArrayLike, *, utility: dict[str, float], sample_weight: ArrayLike | None = None) -> OptimizationResult

Optimize binary classification using utility/cost specification.

Computes the Bayes-optimal threshold using the closed-form formula: τ* = (u_tn - u_fp) / [(u_tp - u_fn) + (u_tn - u_fp)]

This is exact and runs in O(1) time.

Parameters:
  • true_labels – True binary labels. Can be None for pure Bayes optimization. Shape: (n_samples,)

  • pred_proba – Predicted probabilities for positive class in [0, 1]. Shape: (n_samples,)

  • utility – Utility specification with keys “tp”, “tn”, “fp”, “fn”

  • sample_weight – Sample weights (affects expected utility computation). Shape: (n_samples,)

Returns:

Result with optimal threshold, expected utility, and predict function

Return type:

OptimizationResult

Raises:

ValueError – If probabilities are not in the range [0, 1] for utility optimization.

Examples

>>> # FN costs 5x more than FP
>>> utility = {"tp": 10, "tn": 1, "fp": -1, "fn": -5}
>>> result = optimize_utility_binary(None, y_prob, utility=utility)
>>> round(result.threshold, 3)  # Closed-form optimal: (1 + 1) / [(10 + 5) + (1 + 1)] = 2 / 17
0.118
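
The closed-form expression follows from comparing expected utilities: predicting positive is better when p·u_tp + (1 - p)·u_fp ≥ p·u_fn + (1 - p)·u_tn, which rearranges to p ≥ τ*. The sketch below evaluates that formula directly. The helper name and the guard against a non-positive denominator are assumptions added for illustration.

```python
def bayes_utility_threshold(utility):
    """Closed-form threshold from a utility dict with keys tp, tn, fp, fn."""
    gain_neg = utility["tn"] - utility["fp"]   # benefit of correctly calling a negative
    gain_pos = utility["tp"] - utility["fn"]   # benefit of correctly calling a positive
    denom = gain_pos + gain_neg
    if denom <= 0:
        # Assumption: the formula is only meaningful when correct calls pay off overall.
        raise ValueError("utilities must reward correct decisions overall")
    return gain_neg / denom

# Utilities with rewards for correct calls.
print(round(bayes_utility_threshold({"tp": 10, "tn": 1, "fp": -1, "fn": -5}), 3))  # 0.118
# Pure costs (no reward for correct calls), FN 5x as costly as FP.
print(round(bayes_utility_threshold({"tp": 0, "tn": 0, "fp": -1, "fn": -5}), 3))   # 0.167
```

No data is touched, which is why the computation is O(1) and why true_labels may be None.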

Multiclass Classification

optimal_cutoffs.multiclass.optimize_multiclass(true_labels: ArrayLike, pred_proba: ArrayLike, *, metric: str = 'f1', average: str = 'macro', method: str = 'auto', sample_weight: ArrayLike | None = None, comparison: str = '>', tolerance: float = 1e-10) -> OptimizationResult

General multiclass threshold optimization with automatic method selection.

Routes to appropriate algorithm based on averaging strategy and method:

  • Macro + auto/coord_ascent: Margin rule with coordinate ascent (single-label)

  • Macro + independent: Independent OvR optimization (can predict multiple)

  • Micro: Single threshold optimization (single-label)

Parameters:
  • true_labels (array-like of shape (n_samples,)) – True class labels in {0, 1, …, K-1}

  • pred_proba (array-like of shape (n_samples, n_classes)) – Predicted probabilities for each class

  • metric (str, default="f1") – Metric to optimize

  • average ({"macro", "micro"}, default="macro") – Averaging strategy

  • method ({"auto", "coord_ascent", "independent"}, default="auto") – Optimization method:
      - “auto”: For macro, uses coord_ascent (margin rule)
      - “coord_ascent”: Margin rule with coordinate ascent
      - “independent”: Independent per-class optimization (OvR)

  • sample_weight (array-like of shape (n_samples,), optional) – Sample weights

  • comparison (str, default=">") – Comparison operator

  • tolerance (float, default=1e-10) – Numerical tolerance

Returns:

Result with optimal thresholds and prediction function

Return type:

OptimizationResult

Examples

>>> # Margin rule (single-label, coordinate ascent)
>>> result = optimize_multiclass(y_true, y_prob, method="coord_ascent")
>>>
>>> # Independent optimization (can predict multiple classes)
>>> result = optimize_multiclass(y_true, y_prob, method="independent")
>>>
>>> # Micro averaging (single threshold)
>>> result = optimize_multiclass(y_true, y_prob, average="micro")

optimal_cutoffs.multiclass.optimize_ovr_independent(true_labels: ArrayLike, pred_proba: ArrayLike, *, metric: str = 'f1', method: str = 'auto', sample_weight: ArrayLike | None = None, comparison: str = '>', tolerance: float = 1e-10) -> OptimizationResult

Optimize multiclass metrics using independent per-class thresholds (OvR).

Treats each class as an independent binary problem (class vs rest). This does NOT enforce single-label predictions - can predict 0, 1, or multiple classes. Use this for macro-averaged metrics when you want exact optimization per class.

Decision rule: ŷ_j = 1 if p_j ≥ τ_j (independent for each class)

Parameters:
  • true_labels (array-like of shape (n_samples,)) – True class labels in {0, 1, …, K-1}

  • pred_proba (array-like of shape (n_samples, n_classes)) – Predicted probabilities for each class

  • metric (str, default="f1") – Metric to optimize per class

  • method (str, default="auto") – Binary optimization method

  • sample_weight (array-like of shape (n_samples,), optional) – Sample weights

  • comparison (str, default=">") – Comparison operator

  • tolerance (float, default=1e-10) – Numerical tolerance

Returns:

Result with per-class thresholds optimized independently

Return type:

OptimizationResult

Examples

>>> y_true = [0, 1, 2, 0, 1]
>>> y_prob = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8], ...]
>>> result = optimize_ovr_independent(y_true, y_prob, metric="f1")
>>> predictions = result.predict(y_prob)  # Can predict multiple classes
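
To make the "0, 1, or multiple classes" behavior concrete, the sketch below applies the decision rule ŷ_j = 1 if p_j ≥ τ_j to a few rows. The helper name and the threshold values are hypothetical; the library's predict function may return a different data structure.

```python
def predict_ovr_independent(y_prob, thresholds):
    """Return, per sample, the list of classes whose probability clears its threshold."""
    return [
        [j for j, (p, tau) in enumerate(zip(row, thresholds)) if p >= tau]
        for row in y_prob
    ]

thresholds = [0.5, 0.3, 0.4]          # illustrative per-class thresholds
y_prob = [
    [0.70, 0.20, 0.10],               # only class 0 clears its threshold
    [0.50, 0.40, 0.10],               # classes 0 and 1 both clear
    [0.45, 0.20, 0.35],               # no class clears
]
print(predict_ovr_independent(y_prob, thresholds))  # [[0], [0, 1], []]
```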

optimal_cutoffs.multiclass.optimize_ovr_margin(true_labels: ArrayLike, pred_proba: ArrayLike, *, metric: str = 'f1', max_iter: int = 30, sample_weight: ArrayLike | None = None, comparison: str = '>', tolerance: float = 1e-12) -> OptimizationResult

Optimize multiclass metrics using margin rule with coordinate ascent.

Uses margin-based prediction: ŷ = argmax_j (p_j - τ_j). This ensures exactly one class is predicted per sample (single-label).

Thresholds are coupled because changing τ_j affects which samples are assigned to class j, which affects confusion matrices for all classes. Uses coordinate ascent to find local optimum.

Parameters:
  • true_labels (array-like of shape (n_samples,)) – True class labels in {0, 1, …, K-1}

  • pred_proba (array-like of shape (n_samples, n_classes)) – Predicted probabilities for each class

  • metric (str, default="f1") – Metric to optimize (currently supports “f1” only)

  • max_iter (int, default=30) – Maximum coordinate ascent iterations

  • sample_weight (array-like of shape (n_samples,), optional) – Sample weights

  • comparison (str, default=">") – Comparison operator (only “>” supported for margin rule)

  • tolerance (float, default=1e-12) – Convergence tolerance

Returns:

Result with per-class thresholds optimized via coordinate ascent

Return type:

OptimizationResult

Examples

>>> result = optimize_ovr_margin(y_true, y_prob, metric="f1")
>>> predictions = result.predict(y_prob)  # Exactly one class per sample

Notes

The margin rule is Bayes-optimal when costs have OvR structure: C(i,j) = -r_j if i=j, else c_j

In this case, optimal thresholds are: τ_j = c_j/(c_j + r_j) (closed form!)
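
A small sketch ties the two formulas in these notes together: thresholds computed as τ_j = c_j/(c_j + r_j), then fed into the margin rule ŷ = argmax_j (p_j - τ_j). The helper names and the reward and cost values are illustrative assumptions.

```python
def thresholds_from_ovr_costs(rewards, costs):
    """tau_j = c_j / (c_j + r_j) for per-class rewards r_j and error costs c_j."""
    return [c / (c + r) for r, c in zip(rewards, costs)]

def predict_margin(y_prob, thresholds):
    """Assign each sample to the class with the largest margin p_j - tau_j."""
    return [
        max(range(len(row)), key=lambda j: row[j] - thresholds[j])
        for row in y_prob
    ]

# Class 1 carries a larger reward, so its threshold is lower.
taus = thresholds_from_ovr_costs(rewards=[1.0, 4.0, 1.0], costs=[1.0, 1.0, 1.0])
print(taus)                                                      # [0.5, 0.2, 0.5]
print(predict_margin([[0.5, 0.3, 0.2], [0.7, 0.2, 0.1]], taus))  # [1, 0]
```

In the first row class 0 has the highest probability, but class 1 wins on margin (about 0.1 versus 0.0). Every row still receives exactly one class.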

optimal_cutoffs.multiclass.optimize_micro_multiclass(true_labels: ArrayLike, pred_proba: ArrayLike, *, metric: str = 'f1', method: str = 'auto', sample_weight: ArrayLike | None = None, comparison: str = '>', tolerance: float = 1e-10) -> OptimizationResult

Optimize micro-averaged multiclass metrics using single threshold.

For micro averaging, we use a single threshold applied to all classes, then predict the class with highest valid probability. This reduces to a single binary optimization problem on flattened data.

Decision rule: ŷ = argmax{j: p_j ≥ τ} p_j (or argmax p_j if none valid)

Parameters:
  • true_labels (array-like of shape (n_samples,)) – True class labels in {0, 1, …, K-1}

  • pred_proba (array-like of shape (n_samples, n_classes)) – Predicted probabilities for each class

  • metric (str, default="f1") – Metric to optimize

  • method (str, default="auto") – Binary optimization method

  • sample_weight (array-like of shape (n_samples,), optional) – Sample weights

  • comparison (str, default=">") – Comparison operator

  • tolerance (float, default=1e-10) – Numerical tolerance

Returns:

Result with single threshold applied to all classes

Return type:

OptimizationResult

Examples

>>> result = optimize_micro_multiclass(y_true, y_prob, metric="f1")
>>> result.thresholds  # Same threshold for all classes
[0.3, 0.3, 0.3]

Multilabel Classification

optimal_cutoffs.multilabel.optimize_multilabel(true_labels: ArrayLike, pred_proba: ArrayLike, *, metric: str = 'f1', average: str = 'macro', method: str = 'auto', sample_weight: ArrayLike | None = None, comparison: str = '>', tolerance: float = 1e-10) -> OptimizationResult

General multi-label threshold optimization with automatic method selection.

Routes to appropriate algorithm based on averaging strategy:

  • Macro: Independent optimization per label (exact, O(K·n log n))

  • Micro: Coordinate ascent for coupled thresholds (local optimum)

Parameters:
  • true_labels (array-like of shape (n_samples, n_labels)) – True multi-label binary matrix

  • pred_proba (array-like of shape (n_samples, n_labels)) – Predicted probabilities for each label

  • metric (str, default="f1") – Metric to optimize

  • average ({"macro", "micro"}, default="macro") – Averaging strategy

  • method (str, default="auto") – Optimization method (passed to binary optimizer for macro)

  • sample_weight (array-like of shape (n_samples,), optional) – Sample weights

  • comparison (str, default=">") – Comparison operator

  • tolerance (float, default=1e-10) – Numerical tolerance

Returns:

Result with optimal thresholds and metric score

Return type:

OptimizationResult

Examples

>>> # Independent per-label optimization
>>> result = optimize_multilabel(y_true, y_prob, average="macro")
>>>
>>> # Coupled optimization for global metric
>>> result = optimize_multilabel(y_true, y_prob, average="micro")

optimal_cutoffs.multilabel.optimize_macro_multilabel(true_labels: ArrayLike, pred_proba: ArrayLike, *, metric: str = 'f1', method: str = 'auto', sample_weight: ArrayLike | None = None, comparison: str = '>', tolerance: float = 1e-10) -> OptimizationResult

Optimize macro-averaged metrics for multi-label classification.

For macro averaging, each label is optimized independently: Macro-F1 = (1/K) Σ_j F1_j(τ_j)

Since each F1_j depends only on τ_j, we can optimize each threshold independently using binary optimization. This is exact and efficient.

Parameters:
  • true_labels (array-like of shape (n_samples, n_labels)) – True multi-label binary matrix

  • pred_proba (array-like of shape (n_samples, n_labels)) – Predicted probabilities for each label

  • metric (str, default="f1") – Metric to optimize per label (“f1”, “precision”, “recall”)

  • method (str, default="auto") – Binary optimization method for each label

  • sample_weight (array-like of shape (n_samples,), optional) – Sample weights

  • comparison (str, default=">") – Comparison operator

  • tolerance (float, default=1e-10) – Numerical tolerance

Returns:

Result with per-label thresholds and macro-averaged score

Return type:

OptimizationResult

Examples

>>> # 3 independent labels
>>> y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 1]]
>>> y_prob = [[0.8, 0.2, 0.9], [0.1, 0.7, 0.3], [0.9, 0.8, 0.7]]
>>> result = optimize_macro_multilabel(y_true, y_prob, metric="f1")
>>> len(result.thresholds)  # One per label
3
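
The independence claim can be checked numerically. The sketch below computes Macro-F1 = (1/K) Σ_j F1_j(τ_j) on the example data and shows that moving one threshold changes only that label's term. The helper names, the ">" comparison, and the zero-denominator convention are assumptions.

```python
def f1_from_counts(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0   # assumed convention: F1 = 0 when undefined

def per_label_f1(y_true, y_prob, thresholds):
    """F1 of each label j, which depends only on thresholds[j]."""
    scores = []
    for j, tau in enumerate(thresholds):
        tp = fp = fn = 0
        for truth, probs in zip(y_true, y_prob):
            pred = probs[j] > tau
            if pred and truth[j]:
                tp += 1
            elif pred:
                fp += 1
            elif truth[j]:
                fn += 1
        scores.append(f1_from_counts(tp, fp, fn))
    return scores

y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 1]]
y_prob = [[0.8, 0.2, 0.9], [0.1, 0.7, 0.3], [0.9, 0.8, 0.7]]

base = per_label_f1(y_true, y_prob, [0.5, 0.5, 0.5])
moved = per_label_f1(y_true, y_prob, [0.85, 0.5, 0.5])   # change label 0 only
print(base)                      # [1.0, 1.0, 1.0]
print(moved[1:] == base[1:])     # True: labels 1 and 2 are unaffected
print(sum(base) / len(base))     # 1.0 (macro-F1)
```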

optimal_cutoffs.multilabel.optimize_micro_multilabel(true_labels: ArrayLike, pred_proba: ArrayLike, *, metric: str = 'f1', max_iter: int = 30, sample_weight: ArrayLike | None = None, comparison: str = '>', tolerance: float = 1e-12) -> OptimizationResult

Optimize micro-averaged metrics for multi-label classification.

For micro averaging, thresholds are coupled through global TP/FP/FN: Micro-F1 = 2·TP_total / (2·TP_total + FP_total + FN_total)

where TP_total = Σ_j TP_j(τ_j). Changing any τ_j affects the global metric, so we use coordinate ascent to optimize the coupled problem.

Parameters:
  • true_labels (array-like of shape (n_samples, n_labels)) – True multi-label binary matrix

  • pred_proba (array-like of shape (n_samples, n_labels)) – Predicted probabilities for each label

  • metric (str, default="f1") – Metric to optimize (“f1”, “precision”, “recall”)

  • max_iter (int, default=30) – Maximum coordinate ascent iterations

  • sample_weight (array-like of shape (n_samples,), optional) – Sample weights

  • comparison (str, default=">") – Comparison operator

  • tolerance (float, default=1e-12) – Convergence tolerance

Returns:

Result with per-label thresholds optimized for micro averaging

Return type:

OptimizationResult

Examples

>>> result = optimize_micro_multilabel(y_true, y_prob, metric="f1")
>>> # Thresholds are coupled - changing one affects global metric
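
A toy coordinate ascent makes the coupling visible: each threshold is updated in turn while the others stay fixed, and the pooled micro-F1 is re-evaluated after every trial. This is a simplified sketch, not the library's kernel. The helper names, the starting point of 0.5, the candidate grid, and the data are all assumptions.

```python
def micro_f1(y_true, y_prob, thresholds):
    """Pool TP/FP/FN over all labels, then compute a single F1."""
    tp = fp = fn = 0
    for truth, probs in zip(y_true, y_prob):
        for t, p, tau in zip(truth, probs, thresholds):
            pred = p > tau
            if pred and t:
                tp += 1
            elif pred:
                fp += 1
            elif t:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def coordinate_ascent(y_true, y_prob, max_iter=30, tol=1e-12):
    n_labels = len(y_prob[0])
    thresholds = [0.5] * n_labels
    best = micro_f1(y_true, y_prob, thresholds)
    for _ in range(max_iter):
        previous = best
        for j in range(n_labels):
            # Candidate cuts for label j: its observed scores, plus 0.0.
            for c in sorted({row[j] for row in y_prob} | {0.0}):
                trial = thresholds[:j] + [c] + thresholds[j + 1:]
                score = micro_f1(y_true, y_prob, trial)
                if score > best:
                    best, thresholds = score, trial
        if best - previous <= tol:   # a full sweep brought no improvement
            break
    return thresholds, best

y_true = [[1, 0], [1, 1], [0, 1], [0, 0]]
y_prob = [[0.4, 0.2], [0.9, 0.6], [0.3, 0.45], [0.1, 0.7]]
thresholds, score = coordinate_ascent(y_true, y_prob)
print(thresholds, round(score, 3))  # [0.3, 0.2] 0.889
```

Coordinate ascent only guarantees a local optimum, so a different starting point or candidate grid can end at a different solution.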

Bayes-Optimal Decisions

optimal_cutoffs.bayes.threshold(cost_fp: float, cost_fn: float, *, prior: float | None = None) -> float

Compute binary Bayes-optimal threshold from costs.

Parameters:
  • cost_fp (float) – Cost of false positive (predicting positive when actually negative)

  • cost_fn (float) – Cost of false negative (predicting negative when actually positive)

  • prior (float, optional) – Prior probability of positive class. If None, assumes 0.5.

Returns:

Optimal threshold

Return type:

float

Examples

>>> # FN costs 5x more than FP
>>> t = threshold(cost_fp=1.0, cost_fn=5.0)
>>> # Will be < 0.5: a lower threshold flags more positives, avoiding costly false negatives
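
For calibrated probabilities and zero cost for correct decisions, the standard cost-ratio result is that predicting positive is cheaper when (1 - p)·cost_fp ≤ p·cost_fn, which gives p ≥ cost_fp / (cost_fp + cost_fn). The sketch below evaluates that textbook formula. It ignores the prior argument, whose exact handling is not described here, and the helper name is hypothetical.

```python
def cost_ratio_threshold(cost_fp, cost_fn):
    """Textbook threshold on P(positive) that minimizes expected cost."""
    if cost_fp < 0 or cost_fn < 0 or cost_fp + cost_fn == 0:
        raise ValueError("costs must be non-negative and not both zero")
    return cost_fp / (cost_fp + cost_fn)

print(round(cost_ratio_threshold(1.0, 5.0), 3))  # 0.167
print(cost_ratio_threshold(1.0, 1.0))            # 0.5 (symmetric costs)
```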

optimal_cutoffs.bayes.thresholds_from_costs(fp_costs: ndarray[tuple[Any, ...], dtype[_ScalarT]] | list[float], fn_costs: ndarray[tuple[Any, ...], dtype[_ScalarT]] | list[float], **kwargs) -> ndarray

Compute per-class Bayes-optimal thresholds from OvR costs.

Parameters:
  • fp_costs (array-like) – False positive costs per class

  • fn_costs (array-like) – False negative costs per class

Returns:

Per-class optimal thresholds

Return type:

np.ndarray

Examples

>>> # Different costs per class
>>> fp_costs = [1.0, 2.0, 0.5]  # Class 1 FP costs 2x more
>>> fn_costs = [5.0, 1.0, 10.0] # Class 2 FN costs 10x more
>>> thresholds = thresholds_from_costs(fp_costs, fn_costs)

optimal_cutoffs.bayes.policy(cost_matrix: ndarray[tuple[Any, ...], dtype[_ScalarT]]) -> OptimizationResult

Create Bayes-optimal decision policy from cost matrix.

This is for general decision making where thresholds aren’t the right abstraction.

Parameters:

cost_matrix (array-like) – Cost matrix of shape (n_classes, n_actions), where cost_matrix[i, j] is the cost of taking action j when the true class is i

Returns:

Policy with .predict() method (no .thresholds)

Return type:

OptimizationResult

Examples

>>> costs = [[0, 1, 10], [5, 0, 1], [50, 10, 0]]
>>> decision_policy = policy(costs)  # avoid shadowing the policy function
>>> decisions = decision_policy.predict(probabilities)

Internal Functions

These functions are used internally but may be useful for advanced users:

Optimized O(n log n) sort-and-scan kernel for piecewise-constant metrics.

This module provides an exact optimizer for binary classification metrics that are piecewise-constant with respect to the decision threshold. The algorithm sorts predictions once and scans all n cuts in a single pass, achieving true O(n log n) complexity with vectorized operations.

Notes on require_proba:
  • If require_proba=True, inputs are validated to lie in [0, 1].

  • The returned threshold is usually in [0, 1]; however, in boundary or tie cases, we may nudge it by one floating-point ULP beyond the range to correctly realize strict inclusivity/exclusivity (e.g., to ensure “predict none” with ‘>=’ when max p == 1.0).
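
The one-ULP nudge is easiest to see with math.nextafter (Python 3.9+). The sketch below reproduces the “predict none” case with '>=' when max p == 1.0, and the mirror-image “predict all” case with '>' when min p == 0.0. The predict helper is a hypothetical stand-in, not the library's function.

```python
import math

def predict(scores, threshold, inclusive):
    """Apply '>=' when inclusive is True, '>' otherwise."""
    if inclusive:
        return [int(s >= threshold) for s in scores]
    return [int(s > threshold) for s in scores]

scores = [0.2, 0.7, 1.0]
# With '>=', a threshold of exactly 1.0 still flags the sample scored 1.0.
print(predict(scores, 1.0, inclusive=True))         # [0, 0, 1]
# One ULP above 1.0 realizes "predict none", at the price of leaving [0, 1].
above_one = math.nextafter(1.0, math.inf)
print(predict(scores, above_one, inclusive=True))   # [0, 0, 0]

# With '>', a threshold of 0.0 misses a sample scored exactly 0.0.
below_zero = math.nextafter(0.0, -math.inf)
print(predict([0.0, 0.5], 0.0, inclusive=False))         # [0, 1]
print(predict([0.0, 0.5], below_zero, inclusive=False))  # [1, 1]
```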

optimal_cutoffs.piecewise.optimal_threshold_sortscan(y_true: ndarray[Any, Any], pred_prob: ndarray[Any, Any], metric: str | Callable[[ndarray[Any, Any], ndarray[Any, Any], ndarray[Any, Any], ndarray[Any, Any]], ndarray[Any, Any]], *, sample_weight: ndarray[Any, Any] | None = None, inclusive: bool = False, require_proba: bool = True, tolerance: float = 1e-10) -> OptimizationResult

Exact optimizer for piecewise-constant metrics using O(n log n) sort-and-scan.

Parameters:
  • y_true (array-like of shape (n_samples,)) – Binary labels in {0, 1}.

  • pred_prob (array-like of shape (n_samples,)) – Predicted probabilities in [0, 1] or arbitrary scores if require_proba=False.

  • metric (str or callable) – Metric name (e.g., “f1”, “precision”) or vectorized function. If string, automatically resolves to vectorized implementation. If callable: (tp_vec, tn_vec, fp_vec, fn_vec) -> score_vec.

  • sample_weight (array-like, optional) – Non-negative sample weights of shape (n_samples,).

  • inclusive (bool, default=False) – If True, use “>=”; if False, use “>”.

  • require_proba (bool, default=True) – Validate inputs in [0, 1]. Threshold may be nudged by ±1 ULP outside [0,1] to exactly realize inclusivity/exclusivity in boundary/tie cases.

  • tolerance (float, default=1e-10) – Numerical tolerance for floating-point comparisons when computing threshold midpoints and handling ties between scores.

Returns:

Result whose fields are populated as follows:

  • thresholds: array([optimal_threshold])

  • scores: array([achieved_score])

  • predict: callable(probs) -> {0,1}^n

  • metric: str, set to “piecewise_metric”

  • n_classes: 2

  • diagnostics: dict with keys:

      - k_argmax: theoretical best cut index (0..n) from the sweep
      - k_realized: positives realized by the returned threshold
      - score_theoretical: score at k_argmax
      - score_actual: score achieved by the returned threshold
      - tie_discrepancy: abs(theoretical - actual)
      - inclusive: bool
      - require_proba: bool

Return type:

OptimizationResult

Unified threshold optimization for binary and multiclass classification.

This module consolidates all threshold optimization functionality into a single, streamlined interface. It includes high-performance Numba kernels, multiple optimization algorithms, and support for both binary and multiclass problems.

Key features:

  • Fast Numba kernels with Python fallbacks

  • Binary and multiclass threshold optimization

  • Multiple algorithms: sort-scan, scipy, gradient, coordinate ascent

  • Sample weight support (including in coordinate ascent)

  • Direct functional API without over-engineered abstractions

optimal_cutoffs.optimize.fast_f1_score(tp: float, tn: float, fp: float, fn: float) -> float

Compute F1 score from confusion matrix.

optimal_cutoffs.optimize.compute_confusion_matrix_weighted(labels: ndarray, predictions: ndarray, weights: ndarray | None) -> tuple[float, float, float, float]

Compute weighted confusion matrix elements (serial, race-free).
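
As a rough illustration of what these two helpers compute, the sketch below accumulates weighted TP/TN/FP/FN in a plain loop and derives F1 as 2·TP / (2·TP + FP + FN). It is not the Numba kernel. The function names, the (tp, tn, fp, fn) return order, and the zero-denominator convention are assumptions.

```python
def weighted_confusion(labels, predictions, weights=None):
    """Return (tp, tn, fp, fn) as weighted sums; weights default to 1 per sample."""
    if weights is None:
        weights = [1.0] * len(labels)
    tp = tn = fp = fn = 0.0
    for y, y_hat, w in zip(labels, predictions, weights):
        if y == 1 and y_hat == 1:
            tp += w
        elif y == 0 and y_hat == 0:
            tn += w
        elif y == 0 and y_hat == 1:
            fp += w
        else:
            fn += w
    return tp, tn, fp, fn

def f1_from_confusion(tp, tn, fp, fn):
    """F1 ignores true negatives; tn is accepted only to mirror the tuple above."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0

counts = weighted_confusion([1, 0, 1, 1, 0], [1, 1, 0, 1, 0], [2.0, 1.0, 0.5, 1.0, 3.0])
print(counts)                                 # (3.0, 3.0, 1.0, 0.5)
print(round(f1_from_confusion(*counts), 3))   # 0.8
```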

optimal_cutoffs.optimize.sort_scan_kernel(labels: ndarray, scores: ndarray, weights: ndarray, inclusive: bool) -> tuple[float, float]

Python fallback for sort_scan_kernel.

Note: weights must be a valid array (use np.ones for uniform weights).

optimal_cutoffs.optimize.coordinate_ascent_kernel(y_true: ndarray, probs: ndarray, weights: ndarray | None, max_iter: int, tol: float) -> tuple[ndarray, float, ndarray]

optimal_cutoffs.optimize.optimize_sort_scan(labels: ndarray, scores: ndarray, metric: str, weights: ndarray | None = None, operator: str = '>=') -> OptimizationResult

Sort-and-scan optimization for piecewise-constant metrics.

optimal_cutoffs.optimize.optimize_scipy(labels: ndarray, scores: ndarray, metric: str, weights: ndarray | None = None, operator: str = '>=', method: str = 'bounded', tol: float = 1e-06) -> OptimizationResult

Scipy-based optimization for smooth metrics.

optimal_cutoffs.optimize.optimize_gradient(labels: ndarray, scores: ndarray, metric: str, weights: ndarray | None = None, operator: str = '>=', learning_rate: float = 0.01, max_iter: int = 100, tol: float = 1e-06) -> OptimizationResult

Simple gradient ascent optimization (use for smooth metrics).

optimal_cutoffs.optimize.find_optimal_threshold_multiclass(true_labs: ndarray, pred_prob: ndarray, metric: str = 'f1', method: str = 'auto', average: str = 'macro', sample_weight: ndarray | None = None, comparison: str = '>', tolerance: float = 1e-10) -> OptimizationResult

Find optimal per-class thresholds for multiclass classification.

optimal_cutoffs.optimize.find_optimal_threshold(labels: ndarray, scores: ndarray, metric: str = 'f1', weights: ndarray | None = None, strategy: str = 'auto', operator: str = '>=', require_probability: bool = True, tolerance: float = 1e-10) -> OptimizationResult

Simple functional interface for binary threshold optimization.

optimal_cutoffs.optimize.get_performance_info() -> dict[str, Any]

Get information about performance optimizations available.