Optimal Classification Cutoffs
A Python library for computing optimal classification thresholds for binary and multiclass classification problems.
Features
Automatic detection of binary vs multiclass problems
Multiple optimization methods (brute force, scipy minimize, gradient ascent)
Support for custom metrics
Cross-validation utilities
Scikit-learn compatible API
One-vs-Rest strategy for multiclass problems
Installation
pip install optimal-classification-cutoffs
Quick Start
Binary Classification
from optimal_cutoffs import get_optimal_threshold
import numpy as np
# Binary classification example
y_true = np.array([0, 0, 1, 1])
y_prob = np.array([0.1, 0.4, 0.35, 0.8])
threshold = get_optimal_threshold(y_true, y_prob, metric='f1')
print(f"Optimal threshold: {threshold}")
Multiclass Classification
from optimal_cutoffs import get_optimal_threshold
import numpy as np
# Multiclass classification example
y_true = np.array([0, 1, 2, 0, 1, 2])
y_prob = np.array([
[0.7, 0.2, 0.1],
[0.1, 0.8, 0.1],
[0.1, 0.1, 0.8],
[0.6, 0.3, 0.1],
[0.2, 0.7, 0.1],
[0.1, 0.2, 0.7]
])
thresholds = get_optimal_threshold(y_true, y_prob, metric='f1')
print(f"Optimal thresholds per class: {thresholds}")
Using the Scikit-learn Interface
from optimal_cutoffs import ThresholdOptimizer
from sklearn.model_selection import train_test_split
import numpy as np
# Synthetic example: true labels and predicted probabilities from a classifier
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_prob = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.3, 0.7])
y_train, y_test, y_prob_train, y_prob_test = train_test_split(y_true, y_prob, random_state=0)
# Initialize optimizer
optimizer = ThresholdOptimizer(objective='f1', method='smart_brute')
# Fit on training data
optimizer.fit(y_train, y_prob_train)
# Predict on test data
y_pred = optimizer.predict(y_prob_test)
Theory and Background
Understanding why standard optimization methods can fail for classification metrics: metrics such as F1 and accuracy are piecewise-constant functions of the decision threshold, so their gradient is zero almost everywhere and smooth optimizers (e.g. scipy.optimize.minimize_scalar) can stall on a flat region far from the best cutoff. The sort_scan and smart_brute methods avoid this by evaluating the metric only at thresholds where it can actually change.
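The sketch below illustrates this step-function behavior using the library's documented get_confusion_matrix and f1_score helpers; the labels and probabilities are made up for illustration.
import numpy as np
from optimal_cutoffs.metrics import get_confusion_matrix, f1_score
# F1 only changes when the threshold crosses one of the predicted probabilities,
# so it traces out a step function rather than a smooth curve.
y_true = np.array([0, 0, 1, 1, 1])
y_prob = np.array([0.2, 0.45, 0.4, 0.6, 0.9])
for t in np.linspace(0.1, 0.9, 9):
    tp, tn, fp, fn = get_confusion_matrix(y_true, y_prob, prob=t)
    print(f"threshold={t:.1f}  F1={f1_score(tp, tn, fp, fn):.3f}")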
API Reference
Core Functions
Threshold search strategies for optimizing classification metrics.
- optimal_cutoffs.optimizers.get_probability(true_labs: ndarray | list[float] | list[int], pred_prob: ndarray | list[float] | list[int], objective: Literal['accuracy', 'f1'] = 'accuracy', verbose: bool = False) float [source]
Brute-force search for a simple metric’s best threshold.
Deprecated since version 1.0.0: get_probability() is deprecated and will be removed in a future version. Use get_optimal_threshold() instead, which provides a unified API for both binary and multiclass classification with more optimization methods and additional features such as sample weights.
- Parameters:
true_labs – Array of true binary labels.
pred_prob – Predicted probabilities from a classifier.
objective – Metric to optimize. Supported values are "accuracy" and "f1".
verbose – If True, print intermediate metric values during the search.
- Returns:
Threshold that maximizes the specified metric.
- Return type:
float
- optimal_cutoffs.optimizers.get_optimal_threshold(true_labs: ndarray | list[float] | list[int], pred_prob: ndarray | list[float] | list[int], metric: str = 'f1', method: Literal['auto', 'smart_brute', 'sort_scan', 'minimize', 'gradient', 'coord_ascent', 'dinkelbach'] = 'auto', sample_weight: ndarray | list[float] | list[int] | None = None, comparison: Literal['>', '>='] = '>') float | ndarray [source]
Find the threshold that optimizes a metric.
- Parameters:
true_labs – Array of true binary labels or multiclass labels (0, 1, 2, …, n_classes-1).
pred_prob – Predicted probabilities from a classifier. For binary: 1D array (n_samples,). For multiclass: 2D array (n_samples, n_classes).
metric – Name of a metric registered in METRIC_REGISTRY.
method – Strategy used for optimization:
  - "auto": Automatically selects the best method (default)
  - "sort_scan": O(n log n) algorithm for piecewise metrics with vectorized implementation
  - "smart_brute": Evaluates all unique probabilities
  - "minimize": Uses scipy.optimize.minimize_scalar
  - "gradient": Simple gradient ascent
  - "dinkelbach": Exact expected F-beta optimization (F1 only)
sample_weight – Optional array of sample weights for handling imbalanced datasets.
comparison – Comparison operator for thresholding: “>” (exclusive) or “>=” (inclusive).
- Returns:
For binary: The threshold that maximizes the chosen metric. For multiclass: Array of per-class thresholds.
- Return type:
float | np.ndarray
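A short hedged sketch of the sample_weight and comparison options; the data and weights are made up for illustration.
from optimal_cutoffs import get_optimal_threshold
import numpy as np
y_true = np.array([0, 0, 0, 0, 1, 1])
y_prob = np.array([0.1, 0.2, 0.3, 0.45, 0.55, 0.9])
weights = np.where(y_true == 1, 2.0, 1.0)  # up-weight the minority class (illustrative)
threshold = get_optimal_threshold(
    y_true, y_prob, metric="f1", method="sort_scan",
    sample_weight=weights, comparison=">=",
)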
- optimal_cutoffs.optimizers.get_optimal_multiclass_thresholds(true_labs: ndarray | list[float] | list[int], pred_prob: ndarray | list[float] | list[int], metric: str = 'f1', method: Literal['auto', 'smart_brute', 'sort_scan', 'minimize', 'gradient', 'coord_ascent', 'dinkelbach'] = 'auto', average: Literal['macro', 'micro', 'weighted', 'none'] = 'macro', sample_weight: ndarray | list[float] | list[int] | None = None, vectorized: bool = False, comparison: Literal['>', '>='] = '>') ndarray | float [source]
Find optimal per-class thresholds for multiclass classification using One-vs-Rest.
- Parameters:
true_labs – Array of true class labels (0, 1, 2, …, n_classes-1).
pred_prob – Array of predicted probabilities with shape (n_samples, n_classes).
metric – Name of a metric registered in METRIC_REGISTRY.
method – Strategy used for optimization:
  - "auto": Automatically selects the best method (default)
  - "sort_scan": O(n log n) algorithm for piecewise metrics with vectorized implementation
  - "smart_brute": Evaluates all unique probabilities
  - "minimize": Uses scipy.optimize.minimize_scalar
  - "gradient": Simple gradient ascent
  - "coord_ascent": Coordinate ascent for coupled multiclass optimization (single-label consistent)
average – Averaging strategy that affects optimization:
  - "macro"/"none": Optimize each class independently (default behavior)
  - "micro": Optimize to maximize the micro-averaged metric across all classes
  - "weighted": Optimize each class independently, same as "macro"
sample_weight – Optional array of sample weights for handling imbalanced datasets.
vectorized – If True, use vectorized implementation for better performance when possible.
comparison – Comparison operator for thresholding: “>” (exclusive) or “>=” (inclusive).
- Returns:
For “macro”/”weighted”/”none”: Array of optimal thresholds, one per class. For “micro” with single threshold strategy: Single optimal threshold.
- Return type:
np.ndarray | float
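A hedged sketch contrasting macro and micro averaging, reusing the Quick Start data; output shapes follow the return description above.
from optimal_cutoffs.optimizers import get_optimal_multiclass_thresholds
import numpy as np
y_true = np.array([0, 1, 2, 0, 1, 2])
y_prob = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.1, 0.1, 0.8],
                   [0.6, 0.3, 0.1],
                   [0.2, 0.7, 0.1],
                   [0.1, 0.2, 0.7]])
per_class = get_optimal_multiclass_thresholds(y_true, y_prob, metric="f1", average="macro")
micro = get_optimal_multiclass_thresholds(y_true, y_prob, metric="f1", average="micro")
print(per_class)  # one threshold per class
print(micro)      # a single threshold when the micro single-threshold strategy applies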
Threshold Optimizer Class
High-level wrapper for threshold optimization.
- class optimal_cutoffs.wrapper.ThresholdOptimizer(objective: str = 'accuracy', verbose: bool = False, method: Literal['auto', 'smart_brute', 'sort_scan', 'minimize', 'gradient', 'coord_ascent', 'dinkelbach'] = 'auto', comparison: Literal['>', '>='] = '>')[source]
Optimizer for classification thresholds supporting both binary and multiclass.
The class wraps threshold optimization functions and exposes a scikit-learn style fit/predict API. For multiclass problems it uses a One-vs-Rest strategy.
- __init__(objective: str = 'accuracy', verbose: bool = False, method: Literal['auto', 'smart_brute', 'sort_scan', 'minimize', 'gradient', 'coord_ascent', 'dinkelbach'] = 'auto', comparison: Literal['>', '>='] = '>') None [source]
Create a new optimizer.
- Parameters:
objective – Metric to optimize, e.g. "accuracy", "f1", "precision", "recall".
verbose – If True, print progress during threshold search.
method – Optimization method:
  - "auto": Automatically selects the best method (default)
  - "sort_scan": O(n log n) algorithm for piecewise metrics with vectorized implementation
  - "smart_brute": Evaluates all unique probabilities
  - "minimize": Uses scipy.optimize.minimize_scalar
  - "gradient": Simple gradient ascent
  - "coord_ascent": Coordinate ascent for coupled multiclass optimization (single-label consistent)
comparison – Comparison operator for thresholding: “>” (exclusive) or “>=” (inclusive).
- fit(true_labs: ndarray | list[float] | list[int], pred_prob: ndarray | list[float] | list[int], sample_weight: ndarray | list[float] | list[int] | None = None) Self [source]
Estimate the optimal threshold(s) from labeled data.
- Parameters:
true_labs – Array of true labels. For binary: (0, 1). For multiclass: (0, 1, 2, …, n_classes-1).
pred_prob – Predicted probabilities from a classifier. For binary: 1D array (n_samples,). For multiclass: 2D array (n_samples, n_classes).
sample_weight – Optional array of sample weights for handling imbalanced datasets.
- Returns:
Fitted instance with the threshold_ attribute set.
- Return type:
Self
- predict(pred_prob: ndarray | list[float] | list[int]) ndarray [source]
Convert probabilities to class predictions using the learned threshold(s).
- Parameters:
pred_prob – Array of predicted probabilities to be thresholded.
- Returns:
For binary: Boolean array of predicted class labels. For multiclass: Integer array of predicted class labels.
- Return type:
np.ndarray
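A minimal sketch of the fit/predict cycle, assuming the threshold_ attribute documented under fit(); the labels and probabilities are made up.
from optimal_cutoffs import ThresholdOptimizer
import numpy as np
y_true = np.array([0, 0, 1, 1])
y_prob = np.array([0.1, 0.4, 0.35, 0.8])
opt = ThresholdOptimizer(objective="f1").fit(y_true, y_prob)  # fit returns self
print(opt.threshold_)       # learned cutoff(s)
print(opt.predict(y_prob))  # boolean predictions for binary input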
Metrics
Metric registry, confusion matrix utilities, and built-in metrics.
- optimal_cutoffs.metrics.register_metric(name: str | None = None, func: Callable[[int | float, int | float, int | float, int | float], float] | None = None, vectorized_func: Callable | None = None, is_piecewise: bool = True, maximize: bool = True, needs_proba: bool = False) Callable[[int | float, int | float, int | float, int | float], float] | Callable[[Callable[[int | float, int | float, int | float, int | float], float]], Callable[[int | float, int | float, int | float, int | float], float]] [source]
Register a metric function with optional vectorized version.
- Parameters:
name – Optional key under which to store the metric. If not provided, the function's __name__ is used.
func – Metric callable accepting tp, tn, fp, fn scalars and returning a float. When supplied, the function is registered immediately. If omitted, the returned decorator can be used to annotate a metric function.
vectorized_func – Optional vectorized version of the metric that accepts tp, tn, fp, fn as arrays and returns an array of scores. Used for O(n log n) optimization.
is_piecewise – Whether the metric is piecewise-constant with respect to threshold changes. Piecewise metrics can be optimized using O(n log n) algorithms.
maximize – Whether the metric should be maximized (True) or minimized (False).
needs_proba – Whether the metric requires probability scores rather than just thresholds. Used for metrics like log-loss or Brier score.
- Returns:
The registered function or decorator.
- Return type:
MetricFunc | Callable[[MetricFunc], MetricFunc]
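A minimal sketch of the decorator form described above, registering a hypothetical custom metric (Youden's J statistic, not a built-in):
from optimal_cutoffs.metrics import register_metric
@register_metric(name="youden_j")
def youden_j(tp, tn, fp, fn):
    # Sensitivity + specificity - 1, computed from confusion-matrix counts
    sensitivity = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    specificity = tn / (tn + fp) if (tn + fp) > 0 else 0.0
    return sensitivity + specificity - 1.0
# Once registered, the metric can be optimized by name:
# get_optimal_threshold(y_true, y_prob, metric="youden_j")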
- optimal_cutoffs.metrics.register_metrics(metrics: dict[str, Callable[[int | float, int | float, int | float, int | float], float]], is_piecewise: bool = True, maximize: bool = True, needs_proba: bool = False) None [source]
Register multiple metric functions.
- Parameters:
metrics – Mapping of metric names to callables that accept tp, tn, fp, fn.
is_piecewise – Whether the metrics are piecewise-constant with respect to threshold changes.
maximize – Whether the metrics should be maximized (True) or minimized (False).
needs_proba – Whether the metrics require probability scores rather than just thresholds.
- Returns:
This function mutates the global METRIC_REGISTRY in place.
- Return type:
None
- optimal_cutoffs.metrics.is_piecewise_metric(metric_name: str) bool [source]
Check if a metric is piecewise-constant.
- Parameters:
metric_name – Name of the metric to check.
- Returns:
True if the metric is piecewise-constant, False otherwise. Defaults to True for unknown metrics.
- Return type:
bool
- optimal_cutoffs.metrics.should_maximize_metric(metric_name: str) bool [source]
Check if a metric should be maximized.
- Parameters:
metric_name – Name of the metric to check.
- Returns:
True if the metric should be maximized, False if minimized. Defaults to True for unknown metrics.
- Return type:
bool
- optimal_cutoffs.metrics.needs_probability_scores(metric_name: str) bool [source]
Check if a metric needs probability scores rather than just thresholds.
- Parameters:
metric_name – Name of the metric to check.
- Returns:
True if the metric needs probability scores, False otherwise. Defaults to False for unknown metrics.
- Return type:
bool
- optimal_cutoffs.metrics.has_vectorized_implementation(metric_name: str) bool [source]
Check if a metric has a vectorized implementation available.
- Parameters:
metric_name – Name of the metric to check.
- Returns:
True if the metric has a vectorized implementation, False otherwise.
- Return type:
bool
- optimal_cutoffs.metrics.get_vectorized_metric(metric_name: str) Callable [source]
Get vectorized version of a metric function.
- Parameters:
metric_name – Name of the metric.
- Returns:
Vectorized metric function that accepts arrays.
- Return type:
Callable
- Raises:
ValueError – If metric is not available in vectorized form.
- optimal_cutoffs.metrics.f1_score(tp: int | float, tn: int | float, fp: int | float, fn: int | float) float [source]
Compute the F1 score.
- Parameters:
tp – Number of true positives.
tn – Number of true negatives.
fp – Number of false positives.
fn – Number of false negatives.
- Returns:
The harmonic mean of precision and recall.
- Return type:
float
- optimal_cutoffs.metrics.accuracy_score(tp: int | float, tn: int | float, fp: int | float, fn: int | float) float [source]
Compute classification accuracy.
- Parameters:
tp – Number of true positives.
tn – Number of true negatives.
fp – Number of false positives.
fn – Number of false negatives.
- Returns:
Ratio of correct predictions to total samples.
- Return type:
float
- optimal_cutoffs.metrics.precision_score(tp: int | float, tn: int | float, fp: int | float, fn: int | float) float [source]
Compute precision (positive predictive value).
- Parameters:
tp – Number of true positives.
tn – Number of true negatives.
fp – Number of false positives.
fn – Number of false negatives.
- Returns:
Ratio of true positives to predicted positives.
- Return type:
float
- optimal_cutoffs.metrics.recall_score(tp: int | float, tn: int | float, fp: int | float, fn: int | float) float [source]
Compute recall (sensitivity, true positive rate).
- Parameters:
tp – Number of true positives.
tn – Number of true negatives.
fp – Number of false positives.
fn – Number of false negatives.
- Returns:
Ratio of true positives to actual positives.
- Return type:
float
- optimal_cutoffs.metrics.multiclass_metric_exclusive(true_labs: ndarray | list[float] | list[int], pred_prob: ndarray | list[float] | list[int], thresholds: ndarray | list[float] | list[int], metric_name: str, comparison: str = '>', sample_weight: ndarray | list[float] | list[int] | None = None) float [source]
Compute exclusive single-label multiclass metrics.
Uses margin-based decision rule: predict class with highest margin (p_j - tau_j). Computes sample-level accuracy or macro-averaged precision/recall/F1.
- Parameters:
true_labs (ArrayLike) – True class labels (n_samples,)
pred_prob (ArrayLike) – Predicted probabilities (n_samples, n_classes)
thresholds (ArrayLike) – Per-class thresholds (n_classes,)
metric_name (str) – Metric to compute (“accuracy”, “f1”, “precision”, “recall”)
comparison (str) – Comparison operator (“>” or “>=”)
sample_weight (ArrayLike | None) – Optional sample weights
- Returns:
Computed metric value
- Return type:
float
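The decision rule itself is simple to state; the following numpy sketch (illustrative only, not the library's internal code) shows the margin-based assignment described above.
import numpy as np
pred_prob = np.array([[0.7, 0.2, 0.1],
                      [0.3, 0.4, 0.3]])
thresholds = np.array([0.5, 0.35, 0.4])  # hypothetical per-class thresholds
margins = pred_prob - thresholds         # p_j - tau_j, shape (n_samples, n_classes)
predictions = margins.argmax(axis=1)     # exclusive single-label assignment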
- optimal_cutoffs.metrics.multiclass_metric(confusion_matrices: list[tuple[int | float, int | float, int | float, int | float]], metric_name: str, average: str = 'macro') float | ndarray [source]
Compute multiclass metrics from per-class confusion matrices.
- Parameters:
confusion_matrices – List of per-class confusion matrix tuples (tp, tn, fp, fn).
metric_name – Name of the metric to compute (must be in METRIC_REGISTRY).
average – Averaging strategy: "macro", "micro", "weighted", or "none".
  - "macro": Unweighted mean of per-class metrics (treats all classes equally)
  - "micro": Global metric computed on the pooled confusion matrix (treats all samples equally, OvR multilabel)
  - "weighted": Weighted mean by support (number of true instances per class)
  - "none": No averaging; returns an array of per-class metrics
Note: For exclusive single-label accuracy, use multiclass_metric_exclusive().
- Returns:
Aggregated metric score (float) or per-class scores (array) if average=”none”.
- Return type:
float | np.ndarray
- optimal_cutoffs.metrics.get_confusion_matrix(true_labs: ndarray | list[float] | list[int], pred_prob: ndarray | list[float] | list[int], prob: float, sample_weight: ndarray | list[float] | list[int] | None = None, comparison: Literal['>', '>='] = '>') tuple[int | float, int | float, int | float, int | float] [source]
Compute confusion-matrix counts for a given threshold.
- Parameters:
true_labs – Array of true binary labels in {0, 1}.
pred_prob – Array of predicted probabilities in [0, 1].
prob – Decision threshold applied to pred_prob.
sample_weight – Optional array of sample weights. If None, all samples have equal weight.
comparison – Comparison operator for thresholding: ">" (exclusive) or ">=" (inclusive).
  - ">": pred_prob > threshold (default, excludes ties)
  - ">=": pred_prob >= threshold (includes ties)
- Returns:
Counts (tp, tn, fp, fn). Returns int counts when sample_weight is None and float counts when sample_weight is provided, to preserve fractional weighted counts.
- Return type:
tuple[int | float, int | float, int | float, int | float]
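A small sketch of the documented signature, showing that weighted counts come back as floats; the data and weights are made up.
from optimal_cutoffs.metrics import get_confusion_matrix
import numpy as np
y_true = np.array([0, 1, 1, 0])
y_prob = np.array([0.2, 0.7, 0.4, 0.6])
w = np.array([1.0, 2.0, 0.5, 1.0])
tp, tn, fp, fn = get_confusion_matrix(y_true, y_prob, prob=0.5, sample_weight=w)
print(tp, tn, fp, fn)  # float counts because sample_weight was provided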
- optimal_cutoffs.metrics.get_multiclass_confusion_matrix(true_labs: ndarray | list[float] | list[int], pred_prob: ndarray | list[float] | list[int], thresholds: ndarray | list[float] | list[int], sample_weight: ndarray | list[float] | list[int] | None = None, comparison: Literal['>', '>='] = '>') list[tuple[int | float, int | float, int | float, int | float]] [source]
Compute per-class confusion-matrix counts for multiclass classification using One-vs-Rest.
- Parameters:
true_labs – Array of true class labels (0, 1, 2, …, n_classes-1).
pred_prob – Array of predicted probabilities with shape (n_samples, n_classes).
thresholds – Array of decision thresholds, one per class.
sample_weight – Optional array of sample weights. If None, all samples have equal weight.
comparison – Comparison operator for thresholding: “>” (exclusive) or “>=” (inclusive).
- Returns:
List of per-class counts (tp, tn, fp, fn), one tuple per class. Returns int counts when sample_weight is None and float counts when sample_weight is provided.
list[tuple[int | float, int | float, int | float, int | float]]
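A hedged sketch combining this helper with multiclass_metric; the thresholds here are arbitrary and would in practice come from get_optimal_multiclass_thresholds.
from optimal_cutoffs.metrics import get_multiclass_confusion_matrix, multiclass_metric
import numpy as np
y_true = np.array([0, 1, 2, 0, 1, 2])
y_prob = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.1, 0.1, 0.8],
                   [0.6, 0.3, 0.1],
                   [0.2, 0.7, 0.1],
                   [0.1, 0.2, 0.7]])
thresholds = np.array([0.5, 0.5, 0.5])  # illustrative per-class cutoffs
cms = get_multiclass_confusion_matrix(y_true, y_prob, thresholds)
print(multiclass_metric(cms, "f1", average="macro"))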
Cross-Validation
Cross-validation helpers for threshold optimization.
- optimal_cutoffs.cv.cv_threshold_optimization(true_labs: ndarray | list[float] | list[int], pred_prob: ndarray | list[float] | list[int], metric: str = 'f1', method: Literal['auto', 'smart_brute', 'sort_scan', 'minimize', 'gradient', 'coord_ascent', 'dinkelbach'] = 'smart_brute', cv: int = 5, random_state: int | None = None, sample_weight: ndarray | list[float] | list[int] | None = None) tuple[ndarray, ndarray] [source]
Estimate an optimal threshold using cross-validation.
- Parameters:
true_labs – Array of true binary labels.
pred_prob – Predicted probabilities from a classifier.
metric – Metric name to optimize; must exist in the metric registry.
method – Optimization strategy passed to get_optimal_threshold().
cv – Number of folds for KFold cross-validation.
random_state – Seed for the cross-validator shuffling.
sample_weight – Optional array of sample weights for handling imbalanced datasets.
- Returns:
Arrays of per-fold thresholds and scores.
- Return type:
tuple[np.ndarray, np.ndarray]
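A minimal sketch of per-fold threshold estimation on synthetic data; the data-generating code is illustrative, not part of the library.
from optimal_cutoffs.cv import cv_threshold_optimization
import numpy as np
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
# Probabilities loosely correlated with the labels, just to have something to fit
y_prob = np.clip(y_true * 0.4 + rng.uniform(0.0, 0.6, size=200), 0.0, 1.0)
thresholds, scores = cv_threshold_optimization(y_true, y_prob, metric="f1", cv=5, random_state=0)
print(thresholds.mean(), scores.mean())  # average per-fold threshold and F1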
- optimal_cutoffs.cv.nested_cv_threshold_optimization(true_labs: ndarray | list[float] | list[int], pred_prob: ndarray | list[float] | list[int], metric: str = 'f1', method: Literal['auto', 'smart_brute', 'sort_scan', 'minimize', 'gradient', 'coord_ascent', 'dinkelbach'] = 'smart_brute', inner_cv: int = 5, outer_cv: int = 5, random_state: int | None = None, sample_weight: ndarray | list[float] | list[int] | None = None) tuple[ndarray, ndarray] [source]
Nested cross-validation for threshold optimization.
- Parameters:
true_labs – Array of true binary labels.
pred_prob – Predicted probabilities from a classifier.
metric – Metric name to optimize.
method – Optimization strategy passed to get_optimal_threshold().
inner_cv – Number of folds in the inner loop used to estimate thresholds.
outer_cv – Number of outer folds for unbiased performance assessment.
random_state – Seed for the cross-validators.
sample_weight – Optional array of sample weights for handling imbalanced datasets.
- Returns:
Arrays of outer-fold thresholds and scores.
- Return type:
tuple[np.ndarray, np.ndarray]