Cross-Validation Utilities

The cv module provides functions for robust threshold estimation using cross-validation techniques.

Cross-Validation Functions

optimal_cutoffs.cv.cv_threshold_optimization(true_labs: ArrayLike, pred_prob: ArrayLike, metric: str = 'f1', method: str = 'auto', cv: int | Any = 5, random_state: int | None = None, sample_weight: ArrayLike | None = None, *, comparison: str = '>', average: str = 'macro', **opt_kwargs: Any) → tuple[np.ndarray, np.ndarray]

Estimate optimal threshold(s) using cross-validation.

Supports both binary and multiclass classification, handling every threshold return format (scalar, array, or the dict returned by expected mode). Uses StratifiedKFold by default to preserve class balance across folds.

Parameters:
  • true_labs (ArrayLike) – Array of true labels (binary or multiclass).

  • pred_prob (ArrayLike) – Predicted probabilities. For binary: 1D array. For multiclass: 2D array.

  • metric (str, default="f1") – Metric name to optimize; must exist in the metric registry.

  • method (OptimizationMethod, default="auto") – Optimization strategy passed to get_optimal_threshold.

  • cv (int or cross-validator, default=5) – Number of folds or custom cross-validator object.

  • random_state (int, optional) – Seed for the cross-validator shuffling.

  • sample_weight (ArrayLike, optional) – Sample weights for handling imbalanced datasets.

  • comparison (ComparisonOperator, default=">") – Comparison operator for threshold application.

  • average (str, default="macro") – Averaging strategy for multiclass metrics.

  • **opt_kwargs (Any) – Additional arguments passed to get_optimal_threshold.

Returns:

Arrays of per-fold thresholds and scores.

Return type:

tuple[np.ndarray[Any, Any], np.ndarray[Any, Any]]
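
Conceptually, the per-fold procedure resembles the sketch below. This is a hypothetical illustration, not the actual implementation: it assumes get_optimal_threshold (importable from optimal_cutoffs) returns a scalar threshold for binary input, and it skips the multiclass and dict-format handling described above.

# Minimal sketch of the per-fold loop (binary case); assumptions noted above
import numpy as np
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold
from optimal_cutoffs import get_optimal_threshold

def cv_thresholds_sketch(y_true, y_prob, n_splits=5, random_state=None):
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    thresholds, scores = [], []
    for train_idx, test_idx in skf.split(y_prob.reshape(-1, 1), y_true):
        # Fit the threshold on the training fold ...
        t = get_optimal_threshold(y_true[train_idx], y_prob[train_idx], metric="f1")
        # ... and score it on the held-out fold using the default ">" comparison
        scores.append(f1_score(y_true[test_idx], (y_prob[test_idx] > t).astype(int)))
        thresholds.append(t)
    return np.array(thresholds), np.array(scores)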

optimal_cutoffs.cv.nested_cv_threshold_optimization(true_labs: ArrayLike, pred_prob: ArrayLike, metric: str = 'f1', method: str = 'auto', inner_cv: int = 5, outer_cv: int = 5, random_state: int | None = None, sample_weight: ArrayLike | None = None, *, comparison: str = '>', average: str = 'macro', **opt_kwargs: Any) → tuple[np.ndarray, np.ndarray]

Nested cross-validation for unbiased threshold optimization.

The inner CV estimates a robust threshold by averaging across folds; the outer CV evaluates its performance on held-out data. Uses StratifiedKFold by default for better class balance. Threshold selection relies on statistically sound averaging rather than cherry-picking the best-performing fold.

Parameters:
  • true_labs (ArrayLike) – Array of true labels (binary or multiclass).

  • pred_prob (ArrayLike) – Predicted probabilities. For binary: 1D array. For multiclass: 2D array.

  • metric (str, default="f1") – Metric name to optimize.

  • method (OptimizationMethod, default="auto") – Optimization strategy passed to get_optimal_threshold.

  • inner_cv (int, default=5) – Number of folds in the inner loop used to estimate thresholds.

  • outer_cv (int, default=5) – Number of outer folds for unbiased performance assessment.

  • random_state (int, optional) – Seed for the cross-validators.

  • sample_weight (ArrayLike, optional) – Sample weights for handling imbalanced datasets.

  • comparison (ComparisonOperator, default=">") – Comparison operator for threshold application.

  • average (str, default="macro") – Averaging strategy for multiclass metrics.

  • **opt_kwargs (Any) – Additional arguments passed to get_optimal_threshold.

Returns:

Arrays of outer-fold thresholds and scores.

Return type:

tuple[np.ndarray[Any, Any], np.ndarray[Any, Any]]
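
The inner/outer structure can be pictured with the following sketch (again hypothetical, with the same assumed scalar-returning get_optimal_threshold as above):

# Sketch of nested CV: inner folds pick a threshold by averaging,
# outer folds give an unbiased estimate of its performance
import numpy as np
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold
from optimal_cutoffs import get_optimal_threshold

def nested_cv_sketch(y_true, y_prob, inner_cv=5, outer_cv=5, random_state=None):
    outer = StratifiedKFold(n_splits=outer_cv, shuffle=True, random_state=random_state)
    inner = StratifiedKFold(n_splits=inner_cv, shuffle=True, random_state=random_state)
    thresholds, scores = [], []
    for train_idx, test_idx in outer.split(y_prob.reshape(-1, 1), y_true):
        y_tr, p_tr = y_true[train_idx], y_prob[train_idx]
        # Inner loop: average per-fold thresholds rather than picking the best fold
        inner_ts = [
            get_optimal_threshold(y_tr[tr], p_tr[tr], metric="f1")
            for tr, _ in inner.split(p_tr.reshape(-1, 1), y_tr)
        ]
        t = float(np.mean(inner_ts))
        # Outer evaluation on data never used for threshold selection
        thresholds.append(t)
        scores.append(f1_score(y_true[test_idx], (y_prob[test_idx] > t).astype(int)))
    return np.array(thresholds), np.array(scores)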

Usage Examples

Basic Cross-Validation

from optimal_cutoffs import cv_threshold_optimization
import numpy as np

# Synthetic example data (replace with your own labels and predicted probabilities)
y_true = np.random.randint(0, 2, 1000)
y_prob = np.random.uniform(0, 1, 1000)

# 5-fold cross-validation
thresholds, scores = cv_threshold_optimization(
    y_true, y_prob,
    metric='f1',
    cv=5,
    method='auto'
)

print(f"CV thresholds: {thresholds}")
print(f"CV scores: {scores}")
print(f"Mean threshold: {np.mean(thresholds):.3f} ± {np.std(thresholds):.3f}")

Stratified Cross-Validation

from sklearn.model_selection import StratifiedKFold

# StratifiedKFold is already the default for integer cv; pass an explicit
# splitter to control shuffling and the random seed
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

thresholds, scores = cv_threshold_optimization(
    y_true, y_prob,
    metric='f1',
    cv=cv,  # Pass custom CV splitter
    method='auto'
)

Nested Cross-Validation

from optimal_cutoffs import nested_cv_threshold_optimization

# Nested CV for unbiased performance estimation
outer_thresholds, outer_scores = nested_cv_threshold_optimization(
    y_true, y_prob,
    metric='f1',
    outer_cv=5,
    inner_cv=3,
    method='auto'
)

print(f"Outer CV scores: {outer_scores}")
print(f"Mean performance: {np.mean(outer_scores):.3f} ± {np.std(outer_scores):.3f}")

Custom Cross-Validation

from sklearn.model_selection import TimeSeriesSplit

# Time series cross-validation
tscv = TimeSeriesSplit(n_splits=5)

thresholds, scores = cv_threshold_optimization(
    y_true, y_prob,
    metric='precision',
    cv=tscv,
    method='smart_brute'
)

With Sample Weights

# Sample weights for imbalanced data
sample_weights = np.where(y_true == 1, 2.0, 0.5)  # Upweight the positive (minority) class

thresholds, scores = cv_threshold_optimization(
    y_true, y_prob,
    metric='f1',
    cv=5,
    sample_weight=sample_weights
)

Multiclass Cross-Validation

# Multiclass data
y_true_mc = np.random.randint(0, 3, 1000)
y_prob_mc = np.random.dirichlet([1, 1, 1], 1000)  # 3 classes

# For multiclass input, each fold yields an array of per-class thresholds
thresholds_list, scores = cv_threshold_optimization(
    y_true_mc, y_prob_mc,
    metric='f1',
    cv=5,
    average='macro'  # Macro-averaged F1
)

# Average thresholds across folds
mean_thresholds = np.mean(thresholds_list, axis=0)
print(f"Mean per-class thresholds: {mean_thresholds}")

Best Practices

Choosing CV Strategy

  • Balanced data: The integer default (cv=5, stratified under the hood) is fine; a plain KFold splitter also works

  • Imbalanced data: Use StratifiedKFold to preserve class ratios

  • Time series: Use TimeSeriesSplit to respect temporal order

  • Small datasets: Use a higher k in k-fold or repeated splitting (see the example below); LeaveOneOut is possible, but single-sample test folds make per-fold metrics degenerate
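
Any sklearn-compatible splitter can be passed as cv. For example, repeated stratified splitting (a standard scikit-learn splitter, shown here as one way to stabilize estimates on smaller datasets):

from sklearn.model_selection import RepeatedStratifiedKFold

# 5 folds repeated 3 times -> 15 (threshold, score) pairs
rskf = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=42)
thresholds, scores = cv_threshold_optimization(y_true, y_prob, metric='f1', cv=rskf)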

Threshold Aggregation

# Multiple strategies for combining CV thresholds
thresholds, scores = cv_threshold_optimization(y_true, y_prob, metric='f1', cv=10)

# Different aggregation methods
mean_threshold = np.mean(thresholds)
median_threshold = np.median(thresholds)

# Weighted by CV scores
weights = scores / np.sum(scores)
weighted_threshold = np.average(thresholds, weights=weights)

# Choose best single fold (caution: cherry-picking the best fold is
# optimistically biased; prefer the mean or median above)
best_idx = np.argmax(scores)
best_threshold = thresholds[best_idx]

Uncertainty Quantification

# t-distribution confidence interval across CV folds
from scipy import stats

thresholds, scores = cv_threshold_optimization(y_true, y_prob, metric='f1', cv=10)

# 95% confidence interval for threshold
threshold_mean = np.mean(thresholds)
threshold_se = stats.sem(thresholds)
ci_lower, ci_upper = stats.t.interval(0.95, len(thresholds) - 1,
                                      loc=threshold_mean, scale=threshold_se)

print(f"Threshold: {threshold_mean:.3f} [{ci_lower:.3f}, {ci_upper:.3f}]")