Quick Start¶
This guide will get you up and running with optimal-classification-cutoffs in just a few minutes.
Basic Binary Classification¶
The simplest use case is finding the optimal threshold for binary classification:
from optimal_cutoffs import optimize_thresholds
import numpy as np
# Your binary classification data
y_true = np.array([0, 0, 1, 1, 0, 1])
y_prob = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9])
# Find optimal threshold for F1 score
result = optimize_thresholds(y_true, y_prob, metric='f1')
print(f"Optimal threshold: {result.threshold:.3f}")
# Make predictions
predictions = result.predict(y_prob)
print(f"Predictions: {predictions}")
Other Metrics¶
You can optimize for different metrics:
# Optimize for accuracy
result_acc = optimize_thresholds(y_true, y_prob, metric='accuracy')
# Optimize for precision
result_prec = optimize_thresholds(y_true, y_prob, metric='precision')
# Optimize for recall
result_rec = optimize_thresholds(y_true, y_prob, metric='recall')
print(f"Accuracy threshold: {result_acc.threshold:.3f}")
print(f"Precision threshold: {result_prec.threshold:.3f}")
print(f"Recall threshold: {result_rec.threshold:.3f}")
Multiclass Classification¶
For multiclass problems, the library automatically detects the problem type and returns per-class thresholds:
# Multiclass example with 3 classes
y_true = np.array([0, 1, 2, 0, 1, 2])
y_prob = np.array([
[0.7, 0.2, 0.1], # Sample 1: likely class 0
[0.1, 0.8, 0.1], # Sample 2: likely class 1
[0.1, 0.1, 0.8], # Sample 3: likely class 2
[0.6, 0.3, 0.1], # Sample 4: likely class 0
[0.2, 0.7, 0.1], # Sample 5: likely class 1
[0.1, 0.2, 0.7] # Sample 6: likely class 2
])
# Get per-class optimal thresholds
result = optimize_thresholds(y_true, y_prob, metric='f1')
print(f"Optimal thresholds per class: {result.thresholds}")
print(f"Task detected: {result.task.value}")
# Make predictions
predictions = result.predict(y_prob)
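Conceptually, per-class thresholds act One-vs-Rest: each class's probability column is compared against its own cutoff. The snippet below is a hand-rolled illustration of that idea, resolving ties by the largest margin over the threshold; it is not the library's exact decision rule:
def ovr_predict(probs, thresholds):
    # Margin of each class probability over its per-class cutoff
    margins = probs - np.asarray(thresholds)
    # Assign each sample to the class with the largest margin
    return np.argmax(margins, axis=1)
manual_preds = ovr_predict(y_prob, result.thresholds)
print(f"Hand-rolled OvR predictions: {manual_preds}")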
Progressive Disclosure: Power Tools¶
The 2.0.0 API uses progressive disclosure: the basics stay simple, and more powerful tools are available when you need them:
from optimal_cutoffs import optimize_thresholds, cv, metrics
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
# Generate sample data
X = np.random.randn(1000, 5)
y = (X[:, 0] + X[:, 1] > 0).astype(int)
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Train a classifier
clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)
y_prob_train = clf.predict_proba(X_train)[:, 1]
y_prob_test = clf.predict_proba(X_test)[:, 1]
# Simple: Optimize threshold
result = optimize_thresholds(y_train, y_prob_train, metric='f1')
y_pred = result.predict(y_prob_test)
print(f"Optimal threshold: {result.threshold:.3f}")
print(f"Test accuracy: {np.mean(y_pred == y_test):.3f}")
print(f"Method used: {result.method}")
# Advanced: Cross-validation with threshold tuning
cv_scores = cv.cross_validate(clf, X, y, metric='f1')
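To see what per-fold threshold tuning involves, here is a hand-rolled version using scikit-learn's StratifiedKFold. It relies only on optimize_thresholds as shown earlier; the fold handling is illustrative and is not a description of how cv.cross_validate works internally:
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import f1_score
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
fold_scores = []
for train_idx, test_idx in skf.split(X, y):
    fold_clf = RandomForestClassifier(random_state=42)
    fold_clf.fit(X[train_idx], y[train_idx])
    # Tune the threshold on the training fold only
    fold_result = optimize_thresholds(
        y[train_idx], fold_clf.predict_proba(X[train_idx])[:, 1], metric='f1'
    )
    # Evaluate that threshold on the held-out fold
    fold_preds = fold_result.predict(fold_clf.predict_proba(X[test_idx])[:, 1])
    fold_scores.append(f1_score(y[test_idx], fold_preds))
print(f"Per-fold F1: {np.round(fold_scores, 3)}")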
Optimization Methods¶
The library provides several optimization methods:
# Auto method selection (recommended)
result = optimize_thresholds(y_true, y_prob, metric='f1', method='auto')
# Fast O(n log n) sort-and-scan algorithm for piecewise-constant metrics
result = optimize_thresholds(y_true, y_prob, metric='f1', method='sort_scan')
# Scipy-based continuous optimization
result = optimize_thresholds(y_true, y_prob, metric='f1', method='minimize')
# Explainable auto-selection
print(f"Method selected: {result.method}")
print(f"Reasoning: {result.notes}")
Cost-Sensitive Optimization¶
For applications where different types of errors have different costs:
from optimal_cutoffs import optimize_decisions, bayes
# Option 1: Use cost matrix (no thresholds needed)
cost_matrix = [[0, 1], [5, 0]] # FN costs 5x more than FP
result = optimize_decisions(y_prob, cost_matrix)
predictions = result.predict(y_prob)
# Option 2: Bayes-optimal threshold calculation
threshold = bayes.threshold(cost_fp=1.0, cost_fn=5.0)
print(f"Bayes-optimal threshold: {threshold:.3f}") # = 1/(1+5) = 0.167
Next Steps¶
Read the User Guide for detailed explanations and advanced features
Check out Examples for more comprehensive examples
Explore Advanced Topics such as cross-validation and custom metrics
Understand the Theory and Background behind why optimizing thresholds works better than a fixed default cutoff