Optimal Classification Cutoffs
A Python library for computing optimal classification thresholds for binary and multiclass classification problems.
Features
Automatic detection of binary vs multiclass problems
Multiple optimization methods (brute force, scipy minimize, gradient ascent)
Support for custom metrics
Cross-validation utilities
Scikit-learn compatible API
One-vs-Rest strategy for multiclass problems
Installation
pip install optimal-classification-cutoffs
Quick Start
Binary Classification
from optimal_cutoffs import get_optimal_threshold
import numpy as np
# Binary classification example
y_true = np.array([0, 0, 1, 1])
y_prob = np.array([0.1, 0.4, 0.35, 0.8])
threshold = get_optimal_threshold(y_true, y_prob, metric='f1')
print(f"Optimal threshold: {threshold}")
Multiclass Classification
from optimal_cutoffs import get_optimal_threshold
import numpy as np
# Multiclass classification example
y_true = np.array([0, 1, 2, 0, 1, 2])
y_prob = np.array([
[0.7, 0.2, 0.1],
[0.1, 0.8, 0.1],
[0.1, 0.1, 0.8],
[0.6, 0.3, 0.1],
[0.2, 0.7, 0.1],
[0.1, 0.2, 0.7]
])
thresholds = get_optimal_threshold(y_true, y_prob, metric='f1')
print(f"Optimal thresholds per class: {thresholds}")
Using the Scikit-learn Interface
from optimal_cutoffs import ThresholdOptimizer
from sklearn.model_selection import train_test_split
import numpy as np
# Synthetic example: true labels and predicted probabilities from a classifier
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_prob = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.3, 0.7])
y_train, y_test, y_prob_train, y_prob_test = train_test_split(y_true, y_prob, random_state=0)
# Initialize optimizer
optimizer = ThresholdOptimizer(objective='f1', method='smart_brute')
# Fit on training data
optimizer.fit(y_train, y_prob_train)
# Predict on test data
y_pred = optimizer.predict(y_prob_test)
Theory and Background
Understanding why standard optimization methods can fail for classification metrics: metrics such as F1 and accuracy are piecewise-constant functions of the decision threshold, so their gradient is zero almost everywhere and smooth optimizers (e.g. scipy.optimize.minimize_scalar) can stall on a flat region far from the best cutoff. The sort_scan and smart_brute methods avoid this by evaluating the metric only at thresholds where it can actually change.
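The sketch below illustrates this step-function behavior using the library's documented get_confusion_matrix and f1_score helpers; the labels and probabilities are made up for illustration.
import numpy as np
from optimal_cutoffs.metrics import get_confusion_matrix, f1_score
# F1 only changes when the threshold crosses one of the predicted probabilities,
# so it traces out a step function rather than a smooth curve.
y_true = np.array([0, 0, 1, 1, 1])
y_prob = np.array([0.2, 0.45, 0.4, 0.6, 0.9])
for t in np.linspace(0.1, 0.9, 9):
    tp, tn, fp, fn = get_confusion_matrix(y_true, y_prob, prob=t)
    print(f"threshold={t:.1f}  F1={f1_score(tp, tn, fp, fn):.3f}")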
API Reference
Core Functions
Threshold search strategies for optimizing classification metrics.
- optimal_cutoffs.optimizers.get_probability(true_labs: ndarray | list[float] | list[int], pred_prob: ndarray | list[float] | list[int], objective: Literal['accuracy', 'f1'] = 'accuracy', verbose: bool = False) float [source]
Brute-force search for a simple metric’s best threshold.
Deprecated since version 1.0.0: get_probability() is deprecated and will be removed in a future version. Use get_optimal_threshold() instead, which provides a unified API for both binary and multiclass classification with more optimization methods and additional features such as sample weights.
- Parameters:
true_labs – Array of true binary labels.
pred_prob – Predicted probabilities from a classifier.
objective – Metric to optimize. Supported values are "accuracy" and "f1".
verbose – If True, print intermediate metric values during the search.
- Returns:
Threshold that maximizes the specified metric.
- Return type:
float
- optimal_cutoffs.optimizers.get_optimal_threshold(true_labs: ndarray | list[float] | list[int], pred_prob: ndarray | list[float] | list[int], metric: str = 'f1', method: Literal['auto', 'smart_brute', 'sort_scan', 'minimize', 'gradient', 'coord_ascent', 'dinkelbach'] = 'auto', sample_weight: ndarray | list[float] | list[int] | None = None, comparison: Literal['>', '>='] = '>') float | ndarray [source]
Find the threshold that optimizes a metric.
- Parameters:
true_labs – Array of true binary labels or multiclass labels (0, 1, 2, …, n_classes-1).
pred_prob – Predicted probabilities from a classifier. For binary: 1D array (n_samples,). For multiclass: 2D array (n_samples, n_classes).
metric – Name of a metric registered in METRIC_REGISTRY.
method – Strategy used for optimization:
  - "auto": Automatically selects the best method (default)
  - "sort_scan": O(n log n) algorithm for piecewise metrics with vectorized implementation
  - "smart_brute": Evaluates all unique probabilities
  - "minimize": Uses scipy.optimize.minimize_scalar
  - "gradient": Simple gradient ascent
  - "dinkelbach": Exact expected F-beta optimization (F1 only)
sample_weight – Optional array of sample weights for handling imbalanced datasets.
comparison – Comparison operator for thresholding: “>” (exclusive) or “>=” (inclusive).
- Returns:
For binary: The threshold that maximizes the chosen metric. For multiclass: Array of per-class thresholds.
- Return type:
float | np.ndarray
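A short hedged sketch of the sample_weight and comparison options; the data and weights are made up for illustration.
from optimal_cutoffs import get_optimal_threshold
import numpy as np
y_true = np.array([0, 0, 0, 0, 1, 1])
y_prob = np.array([0.1, 0.2, 0.3, 0.45, 0.55, 0.9])
weights = np.where(y_true == 1, 2.0, 1.0)  # up-weight the minority class (illustrative)
threshold = get_optimal_threshold(
    y_true, y_prob, metric="f1", method="sort_scan",
    sample_weight=weights, comparison=">=",
)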
- optimal_cutoffs.optimizers.get_optimal_multiclass_thresholds(true_labs: ndarray | list[float] | list[int], pred_prob: ndarray | list[float] | list[int], metric: str = 'f1', method: Literal['auto', 'smart_brute', 'sort_scan', 'minimize', 'gradient', 'coord_ascent', 'dinkelbach'] = 'auto', average: Literal['macro', 'micro', 'weighted', 'none'] = 'macro', sample_weight: ndarray | list[float] | list[int] | None = None, vectorized: bool = False, comparison: Literal['>', '>='] = '>') ndarray | float [source]
Find optimal per-class thresholds for multiclass classification using One-vs-Rest.
- Parameters:
true_labs – Array of true class labels (0, 1, 2, …, n_classes-1).
pred_prob – Array of predicted probabilities with shape (n_samples, n_classes).
metric – Name of a metric registered in METRIC_REGISTRY.
method – Strategy used for optimization:
  - "auto": Automatically selects the best method (default)
  - "sort_scan": O(n log n) algorithm for piecewise metrics with vectorized implementation
  - "smart_brute": Evaluates all unique probabilities
  - "minimize": Uses scipy.optimize.minimize_scalar
  - "gradient": Simple gradient ascent
  - "coord_ascent": Coordinate ascent for coupled multiclass optimization (single-label consistent)
average – Averaging strategy that affects optimization:
  - "macro"/"none": Optimize each class independently (default behavior)
  - "micro": Optimize to maximize the micro-averaged metric across all classes
  - "weighted": Optimize each class independently, same as "macro"
sample_weight – Optional array of sample weights for handling imbalanced datasets.
vectorized – If True, use vectorized implementation for better performance when possible.
comparison – Comparison operator for thresholding: “>” (exclusive) or “>=” (inclusive).
- Returns:
For “macro”/”weighted”/”none”: Array of optimal thresholds, one per class. For “micro” with single threshold strategy: Single optimal threshold.
- Return type:
np.ndarray | float
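A hedged sketch contrasting macro and micro averaging, reusing the Quick Start data; output shapes follow the return description above.
from optimal_cutoffs.optimizers import get_optimal_multiclass_thresholds
import numpy as np
y_true = np.array([0, 1, 2, 0, 1, 2])
y_prob = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.1, 0.1, 0.8],
                   [0.6, 0.3, 0.1],
                   [0.2, 0.7, 0.1],
                   [0.1, 0.2, 0.7]])
per_class = get_optimal_multiclass_thresholds(y_true, y_prob, metric="f1", average="macro")
micro = get_optimal_multiclass_thresholds(y_true, y_prob, metric="f1", average="micro")
print(per_class)  # one threshold per class
print(micro)      # a single threshold when the micro single-threshold strategy applies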
Threshold Optimizer Class
High-level wrapper for threshold optimization.
- class optimal_cutoffs.wrapper.ThresholdOptimizer(objective: str = 'accuracy', verbose: bool = False, method: Literal['auto', 'smart_brute', 'sort_scan', 'minimize', 'gradient', 'coord_ascent', 'dinkelbach'] = 'auto', comparison: Literal['>', '>='] = '>')[source]
Optimizer for classification thresholds supporting both binary and multiclass.
The class wraps threshold optimization functions and exposes a scikit-learn style fit/predict API. For multiclass problems it uses a One-vs-Rest strategy.
- __init__(objective: str = 'accuracy', verbose: bool = False, method: Literal['auto', 'smart_brute', 'sort_scan', 'minimize', 'gradient', 'coord_ascent', 'dinkelbach'] = 'auto', comparison: Literal['>', '>='] = '>') None [source]
Create a new optimizer.
- Parameters:
objective – Metric to optimize, e.g. "accuracy", "f1", "precision", "recall".
verbose – If True, print progress during threshold search.
method – Optimization method:
  - "auto": Automatically selects the best method (default)
  - "sort_scan": O(n log n) algorithm for piecewise metrics with vectorized implementation
  - "smart_brute": Evaluates all unique probabilities
  - "minimize": Uses scipy.optimize.minimize_scalar
  - "gradient": Simple gradient ascent
  - "coord_ascent": Coordinate ascent for coupled multiclass optimization (single-label consistent)
comparison – Comparison operator for thresholding: “>” (exclusive) or “>=” (inclusive).
- fit(true_labs: ndarray | list[float] | list[int], pred_prob: ndarray | list[float] | list[int], sample_weight: ndarray | list[float] | list[int] | None = None) Self [source]
Estimate the optimal threshold(s) from labeled data.
- Parameters:
true_labs – Array of true labels. For binary: (0, 1). For multiclass: (0, 1, 2, …, n_classes-1).
pred_prob – Predicted probabilities from a classifier. For binary: 1D array (n_samples,). For multiclass: 2D array (n_samples, n_classes).
sample_weight – Optional array of sample weights for handling imbalanced datasets.
- Returns:
Fitted instance with the threshold_ attribute set.
- Return type:
Self
- predict(pred_prob: ndarray | list[float] | list[int]) ndarray [source]
Convert probabilities to class predictions using the learned threshold(s).
- Parameters:
pred_prob – Array of predicted probabilities to be thresholded.
- Returns:
For binary: Boolean array of predicted class labels. For multiclass: Integer array of predicted class labels.
- Return type:
np.ndarray
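A minimal sketch of the fit/predict cycle, assuming the threshold_ attribute documented under fit(); the labels and probabilities are made up.
from optimal_cutoffs import ThresholdOptimizer
import numpy as np
y_true = np.array([0, 0, 1, 1])
y_prob = np.array([0.1, 0.4, 0.35, 0.8])
opt = ThresholdOptimizer(objective="f1").fit(y_true, y_prob)  # fit returns self
print(opt.threshold_)       # learned cutoff(s)
print(opt.predict(y_prob))  # boolean predictions for binary input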
Metrics
Metric registry, confusion matrix utilities, and built-in metrics.
- optimal_cutoffs.metrics.register_metric(name: str | None = None, func: Callable[[int | float, int | float, int | float, int | float], float] | None = None, vectorized_func: Callable | None = None, is_piecewise: bool = True, maximize: bool = True, needs_proba: bool = False) Callable[[int | float, int | float, int | float, int | float], float] | Callable[[Callable[[int | float, int | float, int | float, int | float], float]], Callable[[int | float, int | float, int | float, int | float], float]] [source]
Register a metric function with optional vectorized version.
- Parameters:
name – Optional key under which to store the metric. If not provided, the function's __name__ is used.
func – Metric callable accepting tp, tn, fp, fn scalars and returning a float. When supplied, the function is registered immediately. If omitted, the returned decorator can be used to annotate a metric function.
vectorized_func – Optional vectorized version of the metric that accepts tp, tn, fp, fn as arrays and returns an array of scores. Used for O(n log n) optimization.
is_piecewise – Whether the metric is piecewise-constant with respect to threshold changes. Piecewise metrics can be optimized using O(n log n) algorithms.
maximize – Whether the metric should be maximized (True) or minimized (False).
needs_proba – Whether the metric requires probability scores rather than just thresholds. Used for metrics like log-loss or Brier score.
- Returns:
The registered function or decorator.
- Return type:
MetricFunc | Callable[[MetricFunc], MetricFunc]
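A minimal sketch of the decorator form described above, registering a hypothetical custom metric (Youden's J statistic, not a built-in):
from optimal_cutoffs.metrics import register_metric
@register_metric(name="youden_j")
def youden_j(tp, tn, fp, fn):
    # Sensitivity + specificity - 1, computed from confusion-matrix counts
    sensitivity = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    specificity = tn / (tn + fp) if (tn + fp) > 0 else 0.0
    return sensitivity + specificity - 1.0
# Once registered, the metric can be optimized by name:
# get_optimal_threshold(y_true, y_prob, metric="youden_j")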
- optimal_cutoffs.metrics.register_metrics(metrics: dict[str, Callable[[int | float, int | float, int | float, int | float], float]], is_piecewise: bool = True, maximize: bool = True, needs_proba: bool = False) None [source]
Register multiple metric functions.
- Parameters:
metrics – Mapping of metric names to callables that accept tp, tn, fp, fn.
is_piecewise – Whether the metrics are piecewise-constant with respect to threshold changes.
maximize – Whether the metrics should be maximized (True) or minimized (False).
needs_proba – Whether the metrics require probability scores rather than just thresholds.
- Returns:
This function mutates the global METRIC_REGISTRY in place.
- Return type:
None
- optimal_cutoffs.metrics.is_piecewise_metric(metric_name: str) bool [source]
Check if a metric is piecewise-constant.
- Parameters:
metric_name – Name of the metric to check.
- Returns:
True if the metric is piecewise-constant, False otherwise. Defaults to True for unknown metrics.
- Return type:
bool
- optimal_cutoffs.metrics.should_maximize_metric(metric_name: str) bool [source]
Check if a metric should be maximized.
- Parameters:
metric_name – Name of the metric to check.
- Returns:
True if the metric should be maximized, False if minimized. Defaults to True for unknown metrics.
- Return type:
bool
- optimal_cutoffs.metrics.needs_probability_scores(metric_name: str) bool [source]
Check if a metric needs probability scores rather than just thresholds.
- Parameters:
metric_name – Name of the metric to check.
- Returns:
True if the metric needs probability scores, False otherwise. Defaults to False for unknown metrics.
- Return type:
bool
- optimal_cutoffs.metrics.has_vectorized_implementation(metric_name: str) bool [source]
Check if a metric has a vectorized implementation available.
- Parameters:
metric_name – Name of the metric to check.
- Returns:
True if the metric has a vectorized implementation, False otherwise.
- Return type:
bool
- optimal_cutoffs.metrics.get_vectorized_metric(metric_name: str) Callable [source]
Get vectorized version of a metric function.
- Parameters:
metric_name – Name of the metric.
- Returns:
Vectorized metric function that accepts arrays.
- Return type:
Callable
- Raises:
ValueError – If metric is not available in vectorized form.
- optimal_cutoffs.metrics.f1_score(tp: int | float, tn: int | float, fp: int | float, fn: int | float) float [source]
Compute the F1 score.
- Parameters:
tp – Number of true positives.
tn – Number of true negatives.
fp – Number of false positives.
fn – Number of false negatives.
- Returns:
The harmonic mean of precision and recall.
- Return type:
float
- optimal_cutoffs.metrics.accuracy_score(tp: int | float, tn: int | float, fp: int | float, fn: int | float) float [source]
Compute classification accuracy.
- Parameters:
tp – Number of true positives.
tn – Number of true negatives.
fp – Number of false positives.
fn – Number of false negatives.
- Returns:
Ratio of correct predictions to total samples.
- Return type:
float
- optimal_cutoffs.metrics.precision_score(tp: int | float, tn: int | float, fp: int | float, fn: int | float) float [source]
Compute precision (positive predictive value).
- Parameters:
tp – Number of true positives.
tn – Number of true negatives.
fp – Number of false positives.
fn – Number of false negatives.
- Returns:
Ratio of true positives to predicted positives.
- Return type:
float
- optimal_cutoffs.metrics.recall_score(tp: int | float, tn: int | float, fp: int | float, fn: int | float) float [source]
Compute recall (sensitivity, true positive rate).
- Parameters:
tp – Number of true positives.
tn – Number of true negatives.
fp – Number of false positives.
fn – Number of false negatives.
- Returns:
Ratio of true positives to actual positives.
- Return type:
float
- optimal_cutoffs.metrics.multiclass_metric_exclusive(true_labs: ndarray | list[float] | list[int], pred_prob: ndarray | list[float] | list[int], thresholds: ndarray | list[float] | list[int], metric_name: str, comparison: str = '>', sample_weight: ndarray | list[float] | list[int] | None = None) float [source]
Compute exclusive single-label multiclass metrics.
Uses margin-based decision rule: predict class with highest margin (p_j - tau_j). Computes sample-level accuracy or macro-averaged precision/recall/F1.
- Parameters:
true_labs (ArrayLike) – True class labels (n_samples,)
pred_prob (ArrayLike) – Predicted probabilities (n_samples, n_classes)
thresholds (ArrayLike) – Per-class thresholds (n_classes,)
metric_name (str) – Metric to compute (“accuracy”, “f1”, “precision”, “recall”)
comparison (str) – Comparison operator (“>” or “>=”)
sample_weight (ArrayLike | None) – Optional sample weights
- Returns:
Computed metric value
- Return type:
float
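The decision rule itself is simple to state; the following numpy sketch (illustrative only, not the library's internal code) shows the margin-based assignment described above.
import numpy as np
pred_prob = np.array([[0.7, 0.2, 0.1],
                      [0.3, 0.4, 0.3]])
thresholds = np.array([0.5, 0.35, 0.4])  # hypothetical per-class thresholds
margins = pred_prob - thresholds         # p_j - tau_j, shape (n_samples, n_classes)
predictions = margins.argmax(axis=1)     # exclusive single-label assignment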
- optimal_cutoffs.metrics.multiclass_metric(confusion_matrices: list[tuple[int | float, int | float, int | float, int | float]], metric_name: str, average: str = 'macro') float | ndarray [source]
Compute multiclass metrics from per-class confusion matrices.
- Parameters:
confusion_matrices – List of per-class confusion matrix tuples (tp, tn, fp, fn).
metric_name – Name of the metric to compute (must be in METRIC_REGISTRY).
average – Averaging strategy: "macro", "micro", "weighted", or "none".
  - "macro": Unweighted mean of per-class metrics (treats all classes equally)
  - "micro": Global metric computed on the pooled confusion matrix (treats all samples equally, OvR multilabel)
  - "weighted": Weighted mean by support (number of true instances per class)
  - "none": No averaging; returns an array of per-class metrics
Note: For exclusive single-label accuracy, use multiclass_metric_exclusive().
- Returns:
Aggregated metric score (float) or per-class scores (array) if average=”none”.
- Return type:
float | np.ndarray
- optimal_cutoffs.metrics.get_confusion_matrix(true_labs: ndarray | list[float] | list[int], pred_prob: ndarray | list[float] | list[int], prob: float, sample_weight: ndarray | list[float] | list[int] | None = None, comparison: Literal['>', '>='] = '>') tuple[int | float, int | float, int | float, int | float] [source]
Compute confusion-matrix counts for a given threshold.
- Parameters:
true_labs – Array of true binary labels in {0, 1}.
pred_prob – Array of predicted probabilities in [0, 1].
prob – Decision threshold applied to pred_prob.
sample_weight – Optional array of sample weights. If None, all samples have equal weight.
comparison – Comparison operator for thresholding: ">" (exclusive) or ">=" (inclusive).
  - ">": pred_prob > threshold (default, excludes ties)
  - ">=": pred_prob >= threshold (includes ties)
- Returns:
Counts (tp, tn, fp, fn). Returns int counts when sample_weight is None and float counts when sample_weight is provided, to preserve fractional weighted counts.
- Return type:
tuple[int | float, int | float, int | float, int | float]
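A small sketch of the documented signature, showing that weighted counts come back as floats; the data and weights are made up.
from optimal_cutoffs.metrics import get_confusion_matrix
import numpy as np
y_true = np.array([0, 1, 1, 0])
y_prob = np.array([0.2, 0.7, 0.4, 0.6])
w = np.array([1.0, 2.0, 0.5, 1.0])
tp, tn, fp, fn = get_confusion_matrix(y_true, y_prob, prob=0.5, sample_weight=w)
print(tp, tn, fp, fn)  # float counts because sample_weight was provided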
- optimal_cutoffs.metrics.get_multiclass_confusion_matrix(true_labs: ndarray | list[float] | list[int], pred_prob: ndarray | list[float] | list[int], thresholds: ndarray | list[float] | list[int], sample_weight: ndarray | list[float] | list[int] | None = None, comparison: Literal['>', '>='] = '>') list[tuple[int | float, int | float, int | float, int | float]] [source]
Compute per-class confusion-matrix counts for multiclass classification using One-vs-Rest.
- Parameters:
true_labs – Array of true class labels (0, 1, 2, …, n_classes-1).
pred_prob – Array of predicted probabilities with shape (n_samples, n_classes).
thresholds – Array of decision thresholds, one per class.
sample_weight – Optional array of sample weights. If None, all samples have equal weight.
comparison – Comparison operator for thresholding: “>” (exclusive) or “>=” (inclusive).
- Returns:
List of per-class counts (tp, tn, fp, fn), one tuple per class. Returns int counts when sample_weight is None and float counts when sample_weight is provided.
list[tuple[int | float, int | float, int | float, int | float]]
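A hedged sketch combining this helper with multiclass_metric; the thresholds here are arbitrary and would in practice come from get_optimal_multiclass_thresholds.
from optimal_cutoffs.metrics import get_multiclass_confusion_matrix, multiclass_metric
import numpy as np
y_true = np.array([0, 1, 2, 0, 1, 2])
y_prob = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.1, 0.1, 0.8],
                   [0.6, 0.3, 0.1],
                   [0.2, 0.7, 0.1],
                   [0.1, 0.2, 0.7]])
thresholds = np.array([0.5, 0.5, 0.5])  # illustrative per-class cutoffs
cms = get_multiclass_confusion_matrix(y_true, y_prob, thresholds)
print(multiclass_metric(cms, "f1", average="macro"))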
Cross-Validation
Cross-validation helpers for threshold optimization.
- optimal_cutoffs.cv.cv_threshold_optimization(true_labs: ndarray | list[float] | list[int], pred_prob: ndarray | list[float] | list[int], metric: str = 'f1', method: Literal['auto', 'smart_brute', 'sort_scan', 'minimize', 'gradient', 'coord_ascent', 'dinkelbach'] = 'smart_brute', cv: int = 5, random_state: int | None = None, sample_weight: ndarray | list[float] | list[int] | None = None) tuple[ndarray, ndarray] [source]
Estimate an optimal threshold using cross-validation.
- Parameters:
true_labs – Array of true binary labels.
pred_prob – Predicted probabilities from a classifier.
metric – Metric name to optimize; must exist in the metric registry.
method – Optimization strategy passed to get_optimal_threshold().
cv – Number of folds for KFold cross-validation.
random_state – Seed for the cross-validator shuffling.
sample_weight – Optional array of sample weights for handling imbalanced datasets.
- Returns:
Arrays of per-fold thresholds and scores.
- Return type:
tuple[np.ndarray, np.ndarray]
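A minimal sketch of per-fold threshold estimation on synthetic data; the data-generating code is illustrative, not part of the library.
from optimal_cutoffs.cv import cv_threshold_optimization
import numpy as np
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
# Probabilities loosely correlated with the labels, just to have something to fit
y_prob = np.clip(y_true * 0.4 + rng.uniform(0.0, 0.6, size=200), 0.0, 1.0)
thresholds, scores = cv_threshold_optimization(y_true, y_prob, metric="f1", cv=5, random_state=0)
print(thresholds.mean(), scores.mean())  # average per-fold threshold and F1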
- optimal_cutoffs.cv.nested_cv_threshold_optimization(true_labs: ndarray | list[float] | list[int], pred_prob: ndarray | list[float] | list[int], metric: str = 'f1', method: Literal['auto', 'smart_brute', 'sort_scan', 'minimize', 'gradient', 'coord_ascent', 'dinkelbach'] = 'smart_brute', inner_cv: int = 5, outer_cv: int = 5, random_state: int | None = None, sample_weight: ndarray | list[float] | list[int] | None = None) tuple[ndarray, ndarray] [source]
Nested cross-validation for threshold optimization.
- Parameters:
true_labs – Array of true binary labels.
pred_prob – Predicted probabilities from a classifier.
metric – Metric name to optimize.
method – Optimization strategy passed to get_optimal_threshold().
inner_cv – Number of folds in the inner loop used to estimate thresholds.
outer_cv – Number of outer folds for unbiased performance assessment.
random_state – Seed for the cross-validators.
sample_weight – Optional array of sample weights for handling imbalanced datasets.
- Returns:
Arrays of outer-fold thresholds and scores.
- Return type:
tuple[np.ndarray, np.ndarray]