Optimal Classification Cutoffs

A Python library for computing optimal classification thresholds for binary and multiclass classification problems.

Features

  • Automatic detection of binary vs multiclass problems

  • Multiple optimization methods (brute force, scipy minimize, gradient ascent)

  • Support for custom metrics

  • Cross-validation utilities

  • Scikit-learn compatible API

  • One-vs-Rest strategy for multiclass problems

Installation

pip install optimal-classification-cutoffs

Quick Start

Binary Classification

from optimal_cutoffs import get_optimal_threshold
import numpy as np

# Binary classification example
y_true = np.array([0, 0, 1, 1])
y_prob = np.array([0.1, 0.4, 0.35, 0.8])

threshold = get_optimal_threshold(y_true, y_prob, metric='f1')
print(f"Optimal threshold: {threshold}")

Multiclass Classification

from optimal_cutoffs import get_optimal_threshold
import numpy as np

# Multiclass classification example
y_true = np.array([0, 1, 2, 0, 1, 2])
y_prob = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7]
])

thresholds = get_optimal_threshold(y_true, y_prob, metric='f1')
print(f"Optimal thresholds per class: {thresholds}")

Using the Scikit-learn Interface

from optimal_cutoffs import ThresholdOptimizer
import numpy as np

# Probabilities would normally come from a classifier trained upstream
y_train = np.array([0, 0, 1, 1])
y_prob_train = np.array([0.1, 0.4, 0.35, 0.8])
y_prob_test = np.array([0.2, 0.6, 0.9])

# Initialize optimizer
optimizer = ThresholdOptimizer(objective='f1', method='smart_brute')

# Fit on training data
optimizer.fit(y_train, y_prob_train)

# Predict on test data
y_pred = optimizer.predict(y_prob_test)

Theory and Background

Understanding why standard optimization methods can fail for classification metrics: most classification metrics (accuracy, F1, precision, recall) are piecewise-constant functions of the decision threshold. Their value only changes when the threshold crosses one of the predicted probabilities, so the gradient is zero almost everywhere and smooth optimizers such as scipy.optimize.minimize_scalar can stall on a flat plateau. This is why the exact search strategies ("smart_brute" and "sort_scan") exist alongside the smooth methods.
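
The snippet below is a minimal, library-independent sketch of that behavior using NumPy only: F1 stays constant between the observed probabilities and jumps only when the threshold crosses one of them.

import numpy as np

y_true = np.array([0, 0, 1, 1])
y_prob = np.array([0.1, 0.4, 0.35, 0.8])

def f1_at(threshold):
    # Hard predictions at this threshold, then F1 from the confusion counts
    y_pred = (y_prob > threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# F1 is flat between the observed probabilities and jumps at them
for t in np.linspace(0.0, 1.0, 11):
    print(f"threshold={t:.1f}  F1={f1_at(t):.3f}")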

API Reference

Core Functions

Threshold search strategies for optimizing classification metrics.

optimal_cutoffs.optimizers.get_probability(true_labs: ndarray | list[float] | list[int], pred_prob: ndarray | list[float] | list[int], objective: Literal['accuracy', 'f1'] = 'accuracy', verbose: bool = False) → float

Brute-force search for a simple metric’s best threshold.

Deprecated since version 1.0.0: get_probability() is deprecated and will be removed in a future version. Use get_optimal_threshold() instead, which provides a unified API for both binary and multiclass classification with more optimization methods and additional features like sample weights.

Parameters:
  • true_labs – Array of true binary labels.

  • pred_prob – Predicted probabilities from a classifier.

  • objective – Metric to optimize. Supported values are "accuracy" and "f1".

  • verbose – If True, print intermediate metric values during the search.

Returns:

Threshold that maximizes the specified metric.

Return type:

float

optimal_cutoffs.optimizers.get_optimal_threshold(true_labs: ndarray | list[float] | list[int], pred_prob: ndarray | list[float] | list[int], metric: str = 'f1', method: Literal['auto', 'smart_brute', 'sort_scan', 'minimize', 'gradient', 'coord_ascent'] = 'auto', sample_weight: ndarray | list[float] | list[int] | None = None, comparison: Literal['>', '>='] = '>') → float | ndarray

Find the threshold that optimizes a metric.

Parameters:
  • true_labs – Array of true binary labels or multiclass labels (0, 1, 2, …, n_classes-1).

  • pred_prob – Predicted probabilities from a classifier. For binary: 1D array (n_samples,). For multiclass: 2D array (n_samples, n_classes).

  • metric – Name of a metric registered in METRIC_REGISTRY.

  • method – Strategy used for optimization:

    • "auto": Automatically selects best method (default)

    • "sort_scan": O(n log n) algorithm for piecewise metrics with vectorized implementation

    • "smart_brute": Evaluates all unique probabilities

    • "minimize": Uses scipy.optimize.minimize_scalar

    • "gradient": Simple gradient ascent

  • sample_weight – Optional array of sample weights for handling imbalanced datasets.

  • comparison – Comparison operator for thresholding: “>” (exclusive) or “>=” (inclusive).

Returns:

For binary: The threshold that maximizes the chosen metric. For multiclass: Array of per-class thresholds.

Return type:

float | np.ndarray
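
A short usage sketch with the optional arguments documented above (the toy labels, probabilities, and weights are illustrative):

from optimal_cutoffs import get_optimal_threshold
import numpy as np

y_true = np.array([0, 0, 0, 1, 1])
y_prob = np.array([0.2, 0.3, 0.45, 0.5, 0.9])

# Up-weight positives and use the inclusive comparison so ties count as positive
weights = np.where(y_true == 1, 2.0, 1.0)
threshold = get_optimal_threshold(
    y_true, y_prob, metric='f1', method='auto',
    sample_weight=weights, comparison='>=',
)
print(threshold)  # a single float for 1D (binary) probabilities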

optimal_cutoffs.optimizers.get_optimal_multiclass_thresholds(true_labs: ndarray | list[float] | list[int], pred_prob: ndarray | list[float] | list[int], metric: str = 'f1', method: Literal['auto', 'smart_brute', 'sort_scan', 'minimize', 'gradient', 'coord_ascent'] = 'auto', average: Literal['macro', 'micro', 'weighted', 'none'] = 'macro', sample_weight: ndarray | list[float] | list[int] | None = None, vectorized: bool = False, comparison: Literal['>', '>='] = '>') → ndarray | float

Find optimal per-class thresholds for multiclass classification using One-vs-Rest.

Parameters:
  • true_labs – Array of true class labels (0, 1, 2, …, n_classes-1).

  • pred_prob – Array of predicted probabilities with shape (n_samples, n_classes).

  • metric – Name of a metric registered in METRIC_REGISTRY.

  • method – Strategy used for optimization:

    • "auto": Automatically selects best method (default)

    • "sort_scan": O(n log n) algorithm for piecewise metrics with vectorized implementation

    • "smart_brute": Evaluates all unique probabilities

    • "minimize": Uses scipy.optimize.minimize_scalar

    • "gradient": Simple gradient ascent

    • "coord_ascent": Coordinate ascent for coupled multiclass optimization (single-label consistent)

  • average – Averaging strategy that affects optimization:

    • “macro”/“none”: Optimize each class independently (default behavior)

    • “micro”: Optimize to maximize the micro-averaged metric across all classes

    • “weighted”: Optimize each class independently, same as “macro”

  • sample_weight – Optional array of sample weights for handling imbalanced datasets.

  • vectorized – If True, use vectorized implementation for better performance when possible.

  • comparison – Comparison operator for thresholding: “>” (exclusive) or “>=” (inclusive).

Returns:

For “macro”/”weighted”/”none”: Array of optimal thresholds, one per class. For “micro” with single threshold strategy: Single optimal threshold.

Return type:

np.ndarray | float
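
A sketch of how the average argument changes what is returned (toy data; the call pattern follows the signature above):

from optimal_cutoffs.optimizers import get_optimal_multiclass_thresholds
import numpy as np

y_true = np.array([0, 1, 2, 0, 1, 2])
y_prob = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
    [0.5, 0.3, 0.2],
    [0.3, 0.6, 0.1],
    [0.2, 0.3, 0.5],
])

# "macro" (default): each class optimized independently -> one threshold per class
per_class = get_optimal_multiclass_thresholds(y_true, y_prob, metric='f1', average='macro')

# "micro": optimizes the pooled metric and may return a single shared threshold
pooled = get_optimal_multiclass_thresholds(y_true, y_prob, metric='f1', average='micro')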

Threshold Optimizer Class

High-level wrapper for threshold optimization.

class optimal_cutoffs.wrapper.ThresholdOptimizer(objective: str = 'accuracy', verbose: bool = False, method: Literal['auto', 'smart_brute', 'sort_scan', 'minimize', 'gradient', 'coord_ascent'] = 'auto', comparison: Literal['>', '>='] = '>')

Optimizer for classification thresholds supporting both binary and multiclass.

The class wraps threshold optimization functions and exposes a scikit-learn style fit/predict API. For multiclass, uses One-vs-Rest strategy.

__init__(objective: str = 'accuracy', verbose: bool = False, method: Literal['auto', 'smart_brute', 'sort_scan', 'minimize', 'gradient', 'coord_ascent'] = 'auto', comparison: Literal['>', '>='] = '>') → None

Create a new optimizer.

Parameters:
  • objective – Metric to optimize, e.g. "accuracy", "f1", "precision", "recall".

  • verbose – If True, print progress during threshold search.

  • method – Optimization method:

    • "auto": Automatically selects best method (default)

    • "sort_scan": O(n log n) algorithm for piecewise metrics with vectorized implementation

    • "smart_brute": Evaluates all unique probabilities

    • "minimize": Uses scipy.optimize.minimize_scalar

    • "gradient": Simple gradient ascent

    • "coord_ascent": Coordinate ascent for coupled multiclass optimization (single-label consistent)

  • comparison – Comparison operator for thresholding: “>” (exclusive) or “>=” (inclusive).

fit(true_labs: ndarray | list[float] | list[int], pred_prob: ndarray | list[float] | list[int], sample_weight: ndarray | list[float] | list[int] | None = None) → Self

Estimate the optimal threshold(s) from labeled data.

Parameters:
  • true_labs – Array of true labels. For binary: (0, 1). For multiclass: (0, 1, 2, …, n_classes-1).

  • pred_prob – Predicted probabilities from a classifier. For binary: 1D array (n_samples,). For multiclass: 2D array (n_samples, n_classes).

  • sample_weight – Optional array of sample weights for handling imbalanced datasets.

Returns:

Fitted instance with threshold_ attribute set.

Return type:

Self

predict(pred_prob: ndarray | list[float] | list[int]) → ndarray

Convert probabilities to class predictions using the learned threshold(s).

Parameters:

pred_prob – Array of predicted probabilities to be thresholded.

Returns:

For binary: Boolean array of predicted class labels. For multiclass: Integer array of predicted class labels.

Return type:

np.ndarray
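
A compact multiclass sketch of the fit/predict cycle (toy probabilities; objective and the threshold_ attribute are as documented above):

from optimal_cutoffs import ThresholdOptimizer
import numpy as np

y_true = np.array([0, 1, 2, 0, 1, 2])
y_prob = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
    [0.5, 0.3, 0.2],
    [0.3, 0.6, 0.1],
    [0.2, 0.3, 0.5],
])

opt = ThresholdOptimizer(objective='f1', method='auto')
opt.fit(y_true, y_prob)      # One-vs-Rest: learns one threshold per class
print(opt.threshold_)        # fitted per-class thresholds
print(opt.predict(y_prob))   # integer class labels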

Metrics

Metric registry, confusion matrix utilities, and built-in metrics.

optimal_cutoffs.metrics.register_metric(name: str | None = None, func: Callable[[int | float, int | float, int | float, int | float], float] | None = None, vectorized_func: Callable | None = None, is_piecewise: bool = True, maximize: bool = True, needs_proba: bool = False) → Callable[[int | float, int | float, int | float, int | float], float] | Callable[[Callable[[int | float, int | float, int | float, int | float], float]], Callable[[int | float, int | float, int | float, int | float], float]]

Register a metric function with optional vectorized version.

Parameters:
  • name – Optional key under which to store the metric. If not provided the function’s __name__ is used.

  • func – Metric callable accepting tp, tn, fp, fn scalars and returning a float. When supplied the function is registered immediately. If omitted, the returned decorator can be used to annotate a metric function.

  • vectorized_func – Optional vectorized version of the metric that accepts tp, tn, fp, fn as arrays and returns an array of scores. Used for O(n log n) optimization.

  • is_piecewise – Whether the metric is piecewise-constant with respect to threshold changes. Piecewise metrics can be optimized using O(n log n) algorithms.

  • maximize – Whether the metric should be maximized (True) or minimized (False).

  • needs_proba – Whether the metric requires probability scores rather than just thresholds. Used for metrics like log-loss or Brier score.

Returns:

The registered function or decorator.

Return type:

MetricFunc | Callable[[MetricFunc], MetricFunc]
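
As a sketch, the decorator form can register a custom confusion-matrix metric under a name of your choosing ("youden_j" below is our own example, not a built-in):

from optimal_cutoffs.metrics import register_metric
from optimal_cutoffs import get_optimal_threshold
import numpy as np

@register_metric(name='youden_j')
def youden_j(tp, tn, fp, fn):
    # Sensitivity + specificity - 1, guarding against empty classes
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens + spec - 1.0

# Once registered, the metric can be referenced by name
y_true = np.array([0, 0, 1, 1])
y_prob = np.array([0.1, 0.4, 0.35, 0.8])
threshold = get_optimal_threshold(y_true, y_prob, metric='youden_j')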

optimal_cutoffs.metrics.register_metrics(metrics: dict[str, Callable[[int | float, int | float, int | float, int | float], float]], is_piecewise: bool = True, maximize: bool = True, needs_proba: bool = False) → None

Register multiple metric functions.

Parameters:
  • metrics – Mapping of metric names to callables that accept tp, tn, fp, fn.

  • is_piecewise – Whether the metrics are piecewise-constant with respect to threshold changes.

  • maximize – Whether the metrics should be maximized (True) or minimized (False).

  • needs_proba – Whether the metrics require probability scores rather than just thresholds.

Returns:

None; the function mutates the global METRIC_REGISTRY in place.

Return type:

None
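
A minimal sketch of batch registration; the metric names below are our own, not built-ins:

from optimal_cutoffs.metrics import register_metrics

register_metrics({
    'specificity': lambda tp, tn, fp, fn: tn / (tn + fp) if (tn + fp) else 0.0,
    'npv': lambda tp, tn, fp, fn: tn / (tn + fn) if (tn + fn) else 0.0,
})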

optimal_cutoffs.metrics.is_piecewise_metric(metric_name: str) → bool

Check if a metric is piecewise-constant.

Parameters:

metric_name – Name of the metric to check.

Returns:

True if the metric is piecewise-constant, False otherwise. Defaults to True for unknown metrics.

Return type:

bool

optimal_cutoffs.metrics.should_maximize_metric(metric_name: str) → bool

Check if a metric should be maximized.

Parameters:

metric_name – Name of the metric to check.

Returns:

True if the metric should be maximized, False if minimized. Defaults to True for unknown metrics.

Return type:

bool

optimal_cutoffs.metrics.needs_probability_scores(metric_name: str) → bool

Check if a metric needs probability scores rather than just thresholds.

Parameters:

metric_name – Name of the metric to check.

Returns:

True if the metric needs probability scores, False otherwise. Defaults to False for unknown metrics.

Return type:

bool

optimal_cutoffs.metrics.has_vectorized_implementation(metric_name: str) → bool

Check if a metric has a vectorized implementation available.

Parameters:

metric_name – Name of the metric to check.

Returns:

True if the metric has a vectorized implementation, False otherwise.

Return type:

bool

optimal_cutoffs.metrics.get_vectorized_metric(metric_name: str) → Callable

Get vectorized version of a metric function.

Parameters:

metric_name – Name of the metric.

Returns:

Vectorized metric function that accepts arrays.

Return type:

Callable

Raises:

ValueError – If metric is not available in vectorized form.
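
The query helpers above can be combined to inspect how a registered metric will be treated; a small sketch for the built-in "f1":

from optimal_cutoffs.metrics import (
    is_piecewise_metric,
    should_maximize_metric,
    needs_probability_scores,
    has_vectorized_implementation,
    get_vectorized_metric,
)

name = 'f1'
print(is_piecewise_metric(name))            # piecewise-constant -> eligible for sort_scan
print(should_maximize_metric(name))         # True: higher is better
print(needs_probability_scores(name))       # False: computed from confusion counts
if has_vectorized_implementation(name):
    f1_vec = get_vectorized_metric(name)    # array-based version for O(n log n) search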

optimal_cutoffs.metrics.f1_score(tp: int | float, tn: int | float, fp: int | float, fn: int | float) → float

Compute the F1 score.

Parameters:
  • tp – Number of true positives.

  • tn – Number of true negatives.

  • fp – Number of false positives.

  • fn – Number of false negatives.

Returns:

The harmonic mean of precision and recall.

Return type:

float

optimal_cutoffs.metrics.accuracy_score(tp: int | float, tn: int | float, fp: int | float, fn: int | float) → float

Compute classification accuracy.

Parameters:
  • tp – Number of true positives.

  • tn – Number of true negatives.

  • fp – Number of false positives.

  • fn – Number of false negatives.

Returns:

Ratio of correct predictions to total samples.

Return type:

float

optimal_cutoffs.metrics.precision_score(tp: int | float, tn: int | float, fp: int | float, fn: int | float) → float

Compute precision (positive predictive value).

Parameters:
  • tp – Number of true positives.

  • tn – Number of true negatives.

  • fp – Number of false positives.

  • fn – Number of false negatives.

Returns:

Ratio of true positives to predicted positives.

Return type:

float

optimal_cutoffs.metrics.recall_score(tp: int | float, tn: int | float, fp: int | float, fn: int | float) → float

Compute recall (sensitivity, true positive rate).

Parameters:
  • tp – Number of true positives.

  • tn – Number of true negatives.

  • fp – Number of false positives.

  • fn – Number of false negatives.

Returns:

Ratio of true positives to actual positives.

Return type:

float
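
All four built-in metrics share the same count-based call convention; a quick sketch with made-up counts:

from optimal_cutoffs.metrics import f1_score, precision_score, recall_score, accuracy_score

tp, tn, fp, fn = 8, 85, 5, 2
print(accuracy_score(tp, tn, fp, fn))   # (8 + 85) / 100
print(precision_score(tp, tn, fp, fn))  # 8 / (8 + 5)
print(recall_score(tp, tn, fp, fn))     # 8 / (8 + 2)
print(f1_score(tp, tn, fp, fn))         # harmonic mean of precision and recall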

optimal_cutoffs.metrics.multiclass_metric(confusion_matrices: list[tuple[int | float, int | float, int | float, int | float]], metric_name: str, average: str = 'macro') → float | ndarray

Compute multiclass metrics from per-class confusion matrices.

Parameters:
  • confusion_matrices – List of per-class confusion matrix tuples (tp, tn, fp, fn).

  • metric_name – Name of the metric to compute (must be in METRIC_REGISTRY).

  • average – Averaging strategy: “macro”, “micro”, “weighted”, or “none”.

    • “macro”: Unweighted mean of per-class metrics (treats all classes equally)

    • “micro”: Global metric computed on the pooled confusion matrix (treats all samples equally)

    • “weighted”: Weighted mean by support (number of true instances per class)

    • “none”: No averaging; returns an array of per-class metrics

Returns:

Aggregated metric score (float) or per-class scores (array) if average=”none”.

Return type:

float | np.ndarray
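
A sketch of the averaging options using hand-written per-class counts (in practice these would come from get_multiclass_confusion_matrix):

from optimal_cutoffs.metrics import multiclass_metric

# One (tp, tn, fp, fn) tuple per class
cms = [(8, 85, 5, 2), (10, 80, 4, 6), (3, 90, 2, 5)]

macro = multiclass_metric(cms, 'f1', average='macro')      # unweighted mean of per-class F1
micro = multiclass_metric(cms, 'f1', average='micro')      # F1 on the pooled counts
per_class = multiclass_metric(cms, 'f1', average='none')   # array of per-class F1 scores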

optimal_cutoffs.metrics.get_confusion_matrix(true_labs: ndarray | list[float] | list[int], pred_prob: ndarray | list[float] | list[int], prob: float, sample_weight: ndarray | list[float] | list[int] | None = None, comparison: Literal['>', '>='] = '>') → tuple[int | float, int | float, int | float, int | float]

Compute confusion-matrix counts for a given threshold.

Parameters:
  • true_labs – Array of true binary labels in {0, 1}.

  • pred_prob – Array of predicted probabilities in [0, 1].

  • prob – Decision threshold applied to pred_prob.

  • sample_weight – Optional array of sample weights. If None, all samples have equal weight.

  • comparison – Comparison operator for thresholding: “>” (exclusive) or “>=” (inclusive).

    • “>”: pred_prob > threshold (default, excludes ties)

    • “>=”: pred_prob >= threshold (includes ties)

Returns:

Counts (tp, tn, fp, fn). Returns int when sample_weight is None, float when sample_weight is provided to preserve fractional weighted counts.

Return type:

tuple[int | float, int | float, int | float, int | float]
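
A small sketch showing how the comparison operator handles samples whose probability equals the threshold:

from optimal_cutoffs.metrics import get_confusion_matrix
import numpy as np

y_true = np.array([0, 0, 1, 1])
y_prob = np.array([0.3, 0.5, 0.5, 0.9])

# ">" leaves the two samples at exactly 0.5 as negatives
print(get_confusion_matrix(y_true, y_prob, 0.5, comparison='>'))   # (tp, tn, fp, fn)

# ">=" counts them as positives, changing tp and fp
print(get_confusion_matrix(y_true, y_prob, 0.5, comparison='>='))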

optimal_cutoffs.metrics.get_multiclass_confusion_matrix(true_labs: ndarray | list[float] | list[int], pred_prob: ndarray | list[float] | list[int], thresholds: ndarray | list[float] | list[int], sample_weight: ndarray | list[float] | list[int] | None = None, comparison: Literal['>', '>='] = '>') → list[tuple[int | float, int | float, int | float, int | float]]

Compute per-class confusion-matrix counts for multiclass classification using One-vs-Rest.

Parameters:
  • true_labs – Array of true class labels (0, 1, 2, …, n_classes-1).

  • pred_prob – Array of predicted probabilities with shape (n_samples, n_classes).

  • thresholds – Array of decision thresholds, one per class.

  • sample_weight – Optional array of sample weights. If None, all samples have equal weight.

  • comparison – Comparison operator for thresholding: “>” (exclusive) or “>=” (inclusive).

Returns:

List of per-class counts (tp, tn, fp, fn) for each class. Returns int when sample_weight is None, float when sample_weight is provided.

Return type:

list[tuple[int | float, int | float, int | float, int | float]]

Cross-Validation

Cross-validation helpers for threshold optimization.

optimal_cutoffs.cv.cv_threshold_optimization(true_labs: ndarray | list[float] | list[int], pred_prob: ndarray | list[float] | list[int], metric: str = 'f1', method: Literal['auto', 'smart_brute', 'sort_scan', 'minimize', 'gradient', 'coord_ascent'] = 'smart_brute', cv: int = 5, random_state: int | None = None, sample_weight: ndarray | list[float] | list[int] | None = None) → tuple[ndarray, ndarray]

Estimate an optimal threshold using cross-validation.

Parameters:
  • true_labs – Array of true binary labels.

  • pred_prob – Predicted probabilities from a classifier.

  • metric – Metric name to optimize; must exist in the metric registry.

  • method – Optimization strategy passed to get_optimal_threshold().

  • cv – Number of folds for KFold cross-validation.

  • random_state – Seed for the cross-validator shuffling.

  • sample_weight – Optional array of sample weights for handling imbalanced datasets.

Returns:

Arrays of per-fold thresholds and scores.

Return type:

tuple[np.ndarray, np.ndarray]
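
A sketch of per-fold threshold estimation on synthetic probabilities (the data generation below is illustrative only):

from optimal_cutoffs.cv import cv_threshold_optimization
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
# Probabilities loosely correlated with the labels
y_prob = np.clip(0.6 * y_true + rng.normal(0.2, 0.2, size=200), 0.0, 1.0)

thresholds, scores = cv_threshold_optimization(
    y_true, y_prob, metric='f1', cv=5, random_state=42
)
print(thresholds.mean(), scores.mean())  # summarize the per-fold results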

optimal_cutoffs.cv.nested_cv_threshold_optimization(true_labs: ndarray | list[float] | list[int], pred_prob: ndarray | list[float] | list[int], metric: str = 'f1', method: Literal['auto', 'smart_brute', 'sort_scan', 'minimize', 'gradient', 'coord_ascent'] = 'smart_brute', inner_cv: int = 5, outer_cv: int = 5, random_state: int | None = None, sample_weight: ndarray | list[float] | list[int] | None = None) → tuple[ndarray, ndarray]

Nested cross-validation for threshold optimization.

Parameters:
  • true_labs – Array of true binary labels.

  • pred_prob – Predicted probabilities from a classifier.

  • metric – Metric name to optimize.

  • method – Optimization strategy passed to get_optimal_threshold().

  • inner_cv – Number of folds in the inner loop used to estimate thresholds.

  • outer_cv – Number of outer folds for unbiased performance assessment.

  • random_state – Seed for the cross-validators.

  • sample_weight – Optional array of sample weights for handling imbalanced datasets.

Returns:

Arrays of outer-fold thresholds and scores.

Return type:

tuple[np.ndarray, np.ndarray]
