API Reference¶
This project provides unified tree estimators that follow the familiar scikit-learn API.
All classes support both regression and classification through a unified task parameter.
Unified Tree Estimators¶
These are the main estimators that work for both regression and classification.
All classes inherit from BaseStableTree and support both task='regression' and task='classification':
- class stable_cart.LessGreedyHybridTree(*args, **kwargs)[source]¶
Bases: BaseStableTree
LessGreedyHybridTree with unified stability primitives.
Enhanced with cross-method learning:
- Winsorization (from RobustPrefix)
- Bootstrap consensus for ambiguous splits (from RobustPrefix)
- Stratified sampling (from RobustPrefix)
- Explicit variance tracking (from Bootstrap)
Core Features:
- Honest data partitioning with lookahead beam search
- Optional oblique root splits using regularized linear models
- Leaf smoothing (shrinkage for regression, m-estimate for classification)
- Advanced split selection with multiple strategies
A usage sketch follows the method list below.
- __init__(task='regression', max_depth=5, min_samples_split=40, min_samples_leaf=20, split_frac=0.6, val_frac=0.2, est_frac=0.2, enable_stratified_sampling=True, enable_oblique_splits=True, oblique_strategy='root_only', oblique_regularization='lasso', enable_correlation_gating=True, min_correlation_threshold=0.3, enable_lookahead=True, lookahead_depth=2, beam_width=12, enable_ambiguity_gating=True, ambiguity_threshold=0.05, min_samples_for_lookahead=600, enable_robust_consensus_for_ambiguous=True, consensus_samples=12, consensus_threshold=0.5, enable_threshold_binning=True, max_threshold_bins=24, enable_winsorization=True, winsor_quantiles=(0.01, 0.99), enable_bootstrap_variance_tracking=True, variance_tracking_samples=10, enable_explicit_variance_penalty=False, variance_penalty_weight=0.1, leaf_smoothing=0.0, leaf_smoothing_strategy='shrink_to_parent', enable_gain_margin_logic=True, margin_threshold=0.03, classification_criterion='gini', random_state=None)[source]¶
- property splits_scanned_¶
Backwards-compatibility property exposing the approximate number of splits scanned.
- property oblique_info_¶
Backwards-compatibility property exposing oblique split information.
- count_leaves()¶
Count the number of leaf nodes in the tree.
- fit(X, y)¶
Fit the stable tree to the training data.
- predict(X)¶
Predict targets for samples in X.
- predict_proba(X)¶
Predict class probabilities for classification tasks.
- score(X, y)¶
Return the mean accuracy (classification) or R² (regression).
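As referenced above, a minimal usage sketch assuming the scikit-learn-style API documented for this class; the data and settings are illustrative, not defaults:

from sklearn.datasets import make_classification
from stable_cart import LessGreedyHybridTree

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# task switches the estimator between regression and classification
tree = LessGreedyHybridTree(task='classification', max_depth=4, random_state=0)
tree.fit(X, y)

proba = tree.predict_proba(X)  # class probabilities for classification
acc = tree.score(X, y)         # mean accuracy for classification tasks
n_leaves = tree.count_leaves()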
- class stable_cart.BootstrapVariancePenalizedTree(*args, **kwargs)[source]¶
Bases: BaseStableTree
Bootstrap variance penalized tree with unified stability primitives.
Enhanced with cross-method learning:
- Stratified bootstraps (from RobustPrefix)
- Winsorization (from RobustPrefix)
- Threshold binning/bucketing (from RobustPrefix)
- Robust consensus mechanism (from RobustPrefix)
- Oblique splits (from LessGreedy)
- Lookahead (from LessGreedy)
- Beam search (from LessGreedy)
Core Features:
- Explicit bootstrap variance penalty during split selection
- Honest data partitioning for unbiased estimation
- Advanced split strategies with variance awareness
A usage sketch follows the method list below.
- __init__(task='regression', max_depth=5, min_samples_split=40, min_samples_leaf=20, variance_penalty=1.0, n_bootstrap=10, bootstrap_max_depth=2, enable_variance_aware_stopping=True, split_frac=0.6, val_frac=0.2, est_frac=0.2, enable_stratified_sampling=True, enable_stratified_bootstraps=True, bootstrap_stratification_bins=5, enable_winsorization=True, winsor_quantiles=(0.01, 0.99), enable_threshold_binning=True, max_threshold_bins=24, enable_robust_consensus=True, consensus_samples=12, consensus_threshold=0.5, enable_oblique_splits=True, oblique_strategy='adaptive', oblique_regularization='lasso', enable_correlation_gating=True, min_correlation_threshold=0.3, enable_lookahead=True, lookahead_depth=1, beam_width=8, enable_ambiguity_gating=True, ambiguity_threshold=0.1, min_samples_for_lookahead=100, leaf_smoothing=0.0, leaf_smoothing_strategy='m_estimate', enable_gain_margin_logic=True, margin_threshold=0.03, classification_criterion='gini', random_state=None)[source]¶
- count_leaves()¶
Count the number of leaf nodes in the tree.
- predict(X)¶
Predict targets for samples in X.
- predict_proba(X)¶
Predict class probabilities for classification tasks.
- score(X, y)¶
Return the mean accuracy (classification) or R² (regression).
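The referenced sketch, for regression; parameter names are taken from the signature above, and the values are illustrative:

from sklearn.datasets import make_regression
from stable_cart import BootstrapVariancePenalizedTree

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

# variance_penalty weights bootstrap variance against split gain;
# n_bootstrap sets how many resamples estimate that variance
tree = BootstrapVariancePenalizedTree(
    task='regression',
    variance_penalty=1.0,
    n_bootstrap=10,
    random_state=0,
)
tree.fit(X, y)
r2 = tree.score(X, y)  # R² for regression tasks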
- class stable_cart.RobustPrefixHonestTree(*args, **kwargs)[source]¶
Bases: BaseStableTree
Robust prefix honest tree with unified stability primitives.
Enhanced with cross-method learning:
- Oblique splits (from LessGreedy): Add Lasso-based oblique splits to the locked prefix
- Lookahead with beam search (from LessGreedy): Replace depth-1 stumps with k-step lookahead
- Ambiguity gating (from LessGreedy): Apply consensus only when splits are ambiguous
- Correlation gating (from LessGreedy): Check feature correlation before oblique splits
- Explicit variance tracking (from Bootstrap): Monitor prediction variance as a diagnostic
Core Features:
- Robust consensus-based prefix splits with honest leaf estimation
- Winsorization for outlier robustness
- Stratified honest data partitioning
- Advanced consensus mechanisms with threshold binning
A usage sketch follows the method list below.
- __init__(task='regression', max_depth=6, min_samples_leaf=2, top_levels=2, consensus_samples=12, consensus_threshold=0.5, consensus_subsample_frac=0.8, val_frac=0.2, est_frac=0.4, enable_stratified_sampling=True, enable_winsorization=True, winsor_quantiles=(0.01, 0.99), enable_threshold_binning=True, max_threshold_bins=24, enable_oblique_splits=True, oblique_strategy='root_only', oblique_regularization='lasso', enable_correlation_gating=True, min_correlation_threshold=0.3, enable_lookahead=True, lookahead_depth=2, beam_width=12, enable_beam_search_for_consensus=True, enable_ambiguity_gating=True, ambiguity_threshold=0.05, enable_gain_margin_logic=True, margin_threshold=0.03, enable_bootstrap_variance_tracking=True, variance_tracking_samples=10, enable_explicit_variance_penalty=False, variance_penalty_weight=0.1, smoothing=1.0, leaf_smoothing_strategy='m_estimate', classification_criterion='gini', random_state=None)[source]¶
- count_leaves()¶
Count the number of leaf nodes in the tree.
- predict(X)¶
Predict targets for samples in X.
- predict_proba(X)¶
Predict class probabilities for classification tasks.
- score(X, y)¶
Return the mean accuracy (classification) or R² (regression).
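The referenced sketch, highlighting the prefix-consensus and winsorization parameters from the signature above (values illustrative):

from sklearn.datasets import make_classification
from stable_cart import RobustPrefixHonestTree

X, y = make_classification(n_samples=800, random_state=0)

# top_levels fixes how many prefix levels are locked by consensus;
# winsor_quantiles clips extreme feature values before split search
tree = RobustPrefixHonestTree(
    task='classification',
    top_levels=2,
    consensus_samples=12,
    enable_winsorization=True,
    winsor_quantiles=(0.01, 0.99),
    random_state=0,
)
tree.fit(X, y)
proba = tree.predict_proba(X)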
Base Classes¶
For advanced users and researchers who want to extend the functionality or understand the underlying architecture:
- class stable_cart.BaseStableTree(*args, **kwargs)[source]¶
Bases: BaseEstimator
Unified base class implementing all 7 stability primitives.
The 7 stability primitives are:
1. Prefix stability (robust consensus on early splits)
2. Validation-checked split selection
3. Honesty (separate data for structure vs estimation)
4. Leaf stabilization (shrinkage/smoothing)
5. Data regularization (winsorization, etc.)
6. Candidate diversity with deterministic resolution
7. Variance-aware stopping
All tree methods inherit from this and configure different defaults to maintain their distinct personalities while sharing the unified stability infrastructure.
- __init__(task='regression', max_depth=5, min_samples_split=40, min_samples_leaf=20, enable_honest_estimation=True, split_frac=0.6, val_frac=0.2, est_frac=0.2, enable_stratified_sampling=True, enable_validation_checking=True, validation_metric='variance_penalized', validation_consistency_weight=1.0, enable_prefix_consensus=False, prefix_levels=2, consensus_samples=12, consensus_threshold=0.5, enable_quantile_grid_thresholds=False, max_threshold_bins=24, leaf_smoothing=0.0, leaf_smoothing_strategy='m_estimate', enable_calibrated_smoothing=False, min_leaf_samples_for_stability=5, enable_winsorization=False, winsor_quantiles=(0.01, 0.99), enable_feature_standardization=False, enable_oblique_splits=False, oblique_strategy='root_only', oblique_regularization='lasso', enable_correlation_gating=True, min_correlation_threshold=0.3, enable_lookahead=False, lookahead_depth=1, beam_width=8, enable_ambiguity_gating=True, ambiguity_threshold=0.05, min_samples_for_lookahead=100, enable_deterministic_preprocessing=False, enable_deterministic_tiebreaks=True, enable_margin_vetoes=False, margin_threshold=0.03, enable_variance_aware_stopping=False, variance_stopping_weight=1.0, variance_stopping_strategy='variance_penalty', enable_bootstrap_variance_tracking=False, variance_tracking_samples=10, enable_explicit_variance_penalty=False, variance_penalty_weight=0.1, split_strategy=None, algorithm_focus='stability', classification_criterion='gini', random_state=None, enable_threshold_binning=False, enable_gain_margin_logic=False, enable_beam_search_for_consensus=False, enable_robust_consensus_for_ambiguous=False)[source]¶
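Since the concrete trees only re-configure these defaults, a custom variant can be sketched the same way. The subclass below is hypothetical (MyConservativeTree is not part of the package) and only overrides defaults that appear in the signature above:

from stable_cart import BaseStableTree

class MyConservativeTree(BaseStableTree):
    """Hypothetical variant favoring data regularization and early stopping."""

    def __init__(self, **kwargs):
        # Override selected defaults; everything else passes through
        # to the unified base class unchanged.
        kwargs.setdefault('enable_winsorization', True)
        kwargs.setdefault('winsor_quantiles', (0.05, 0.95))
        kwargs.setdefault('enable_variance_aware_stopping', True)
        super().__init__(**kwargs)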
Evaluation Functions¶
These functions help assess model performance and prediction stability. Use these to compare different tree algorithms or measure the effectiveness of stability features:
- stable_cart.prediction_stability(models, X_oos, task='categorical')[source]¶
Measure how consistent predictions are across models on the same out-of-sample data.
This metric quantifies prediction stability by measuring how much models agree with each other on the same out-of-sample data. Lower values indicate more stable/consistent predictions.
- Parameters:
models (dict[str, fitted_model]) – Mapping of model name -> fitted model (must have .predict() method). Requires at least 2 models.
X_oos (np.ndarray) – Out-of-sample feature matrix to evaluate on.
task ({'categorical', 'continuous'}, default='categorical') – Type of prediction task.
- Returns:
scores – Stability score for each model.
- For ‘categorical’:
Average pairwise DISAGREEMENT rate per model (range: 0-1). Lower is better (more stable). 0 = perfect agreement with all other models.
- For ‘continuous’:
RMSE of each model’s predictions vs the ensemble mean. Lower is better (more stable). 0 = identical to ensemble mean.
- Return type:
dict[str, float]
- Raises:
ValueError – If fewer than 2 models provided, or if task is not ‘categorical’ or ‘continuous’.
Examples
>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.tree import DecisionTreeClassifier
>>> from stable_cart import prediction_stability
>>> X, y = make_classification(n_samples=100, random_state=42)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y)
>>> models = {
...     'tree1': DecisionTreeClassifier(random_state=1).fit(X_train, y_train),
...     'tree2': DecisionTreeClassifier(random_state=2).fit(X_train, y_train),
... }
>>> stability = prediction_stability(models, X_test, task='categorical')
>>> print(stability)  # Lower values = more stable predictions
{'tree1': 0.15, 'tree2': 0.15}
Notes
- Stability is measured relative to the other models in the collection.
- For categorical tasks, uses pairwise agreement rates.
- For continuous tasks, uses RMSE to the ensemble mean as a stability proxy.
- This metric is complementary to predictive accuracy: a model can be accurate but unstable, or stable but inaccurate.
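For task='continuous', usage is analogous; a sketch (scores depend on the data, so none are shown):

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from stable_cart import prediction_stability

X, y = make_regression(n_samples=200, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    'seed1': DecisionTreeRegressor(random_state=1).fit(X_train, y_train),
    'seed2': DecisionTreeRegressor(random_state=2).fit(X_train, y_train),
}

# RMSE of each model's predictions against the ensemble mean;
# lower values mean more stable predictions
scores = prediction_stability(models, X_test, task='continuous')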
- stable_cart.evaluate_models(models, X, y, task='categorical')[source]¶
Evaluate predictive performance of multiple models using standard metrics.
Computes task-appropriate performance metrics for each model. For classification, includes accuracy and AUC (if predict_proba available). For regression, includes MAE, RMSE, and R².
- Parameters:
models (dict[str, fitted_model]) – Model name -> fitted model mapping. Models must have .predict() method.
X (np.ndarray) – Feature matrix for evaluation.
y (np.ndarray) – Ground-truth labels (classification) or targets (regression).
task ({'categorical', 'continuous'}, default='categorical') – Type of prediction task.
- Returns:
metrics – Nested dictionary: {model_name: {metric_name: value}}
- For ‘categorical’:
  - ‘acc’: Classification accuracy (0-1)
  - ‘auc’: ROC AUC score (0-1, if predict_proba is available); standard AUC for binary, one-vs-rest macro AUC for multi-class
- For ‘continuous’:
  - ‘mae’: Mean Absolute Error (lower is better)
  - ‘rmse’: Root Mean Squared Error (lower is better)
  - ‘r2’: R² coefficient of determination (-∞ to 1, higher is better)
- Return type:
dict[str, dict[str, float]]
- Raises:
ValueError – If task is not ‘categorical’ or ‘continuous’.
Examples
>>> from sklearn.datasets import make_regression
>>> from sklearn.tree import DecisionTreeRegressor
>>> from stable_cart import evaluate_models
>>> X, y = make_regression(n_samples=100, random_state=42)
>>> models = {
...     'shallow': DecisionTreeRegressor(max_depth=3, random_state=42).fit(X, y),
...     'deep': DecisionTreeRegressor(max_depth=10, random_state=42).fit(X, y),
... }
>>> performance = evaluate_models(models, X, y, task='continuous')
>>> print(performance['shallow'])
{'mae': 12.3, 'rmse': 15.7, 'r2': 0.85}
Notes
- AUC computation gracefully handles cases where predict_proba is not available.
- For multi-class classification, uses a one-vs-rest strategy for AUC.
- All metrics use standard sklearn implementations.
- Consider using separate train/test sets to avoid overfitting bias.
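A categorical sketch complementing the regression doctest above, evaluated on held-out data per the last note (metric values depend on the split, so none are shown):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from stable_cart import evaluate_models

X, y = make_classification(n_samples=300, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {'tree': DecisionTreeClassifier(random_state=0).fit(X_train, y_train)}

# Evaluate on held-out data to avoid overfitting bias
metrics = evaluate_models(models, X_test, y_test, task='categorical')
acc = metrics['tree']['acc']  # 'auc' is included when predict_proba exists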
Advanced Classes for Researchers¶
Internal classes for advanced customization and research. These provide the building blocks for creating custom stability algorithms:
- class stable_cart.SplitCandidate(feature_idx, threshold, gain, left_indices, right_indices, is_oblique=False, oblique_weights=None, validation_score=None, variance_estimate=None, consensus_support=None)[source]¶
Bases: object
Represents a potential split with all relevant information.
- feature_idx: int¶
- threshold: float¶
- gain: float¶
- left_indices: numpy.ndarray¶
- right_indices: numpy.ndarray¶
- is_oblique: bool = False¶
- oblique_weights: numpy.ndarray | None = None¶
- validation_score: float | None = None¶
- variance_estimate: float | None = None¶
- consensus_support: float | None = None¶
- __init__(feature_idx, threshold, gain, left_indices, right_indices, is_oblique=False, oblique_weights=None, validation_score=None, variance_estimate=None, consensus_support=None)¶
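A toy construction, with field values purely illustrative:

import numpy as np
from stable_cart import SplitCandidate

# Axis-aligned split of ten samples on feature 2 at threshold 0.5
candidate = SplitCandidate(
    feature_idx=2,
    threshold=0.5,
    gain=0.12,
    left_indices=np.array([0, 1, 2, 3]),
    right_indices=np.array([4, 5, 6, 7, 8, 9]),
)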
- class stable_cart.StabilityMetrics(prefix_consensus_scores, validation_consistency, leaf_variance_estimates, split_margins, bootstrap_variance=None)[source]¶
Bases: object
Container for stability diagnostic information.
- prefix_consensus_scores: List[float]¶
- validation_consistency: float¶
- leaf_variance_estimates: List[float]¶
- split_margins: List[float]¶
- bootstrap_variance: float | None = None¶
- __init__(prefix_consensus_scores, validation_consistency, leaf_variance_estimates, split_margins, bootstrap_variance=None)¶
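A toy instantiation; field meanings follow the attribute list above, and the values are made up:

from stable_cart import StabilityMetrics

diag = StabilityMetrics(
    prefix_consensus_scores=[0.9, 0.8],
    validation_consistency=0.95,
    leaf_variance_estimates=[0.02, 0.05, 0.03],
    split_margins=[0.10, 0.04],
)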
- class stable_cart.SplitStrategy[source]¶
Bases: ABC
Abstract base class for split-finding strategies.
- abstractmethod find_best_split(X, y, X_val=None, y_val=None, depth=0, **kwargs)[source]¶
Find the best split for the given data.
- Parameters:
X (np.ndarray) – Training feature matrix for structure learning
y (np.ndarray) – Training targets for structure learning
X_val (np.ndarray, optional) – Validation features for split evaluation
y_val (np.ndarray, optional) – Validation targets for split evaluation
depth (int) – Current depth in the tree
**kwargs – Strategy-specific parameters
- Returns:
best_split – Best split found, or None if no good split exists
- Return type:
SplitCandidate or None
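To illustrate the contract, a hypothetical strategy that always splits the first feature at its median (a placeholder rule, not a shipped strategy):

import numpy as np
from stable_cart import SplitCandidate, SplitStrategy

class MedianSplitStrategy(SplitStrategy):
    """Toy strategy: split feature 0 at its median."""

    def find_best_split(self, X, y, X_val=None, y_val=None, depth=0, **kwargs):
        threshold = float(np.median(X[:, 0]))
        left = np.where(X[:, 0] <= threshold)[0]
        right = np.where(X[:, 0] > threshold)[0]
        if len(left) == 0 or len(right) == 0:
            return None  # no usable split at this node
        return SplitCandidate(
            feature_idx=0,
            threshold=threshold,
            gain=0.0,  # a real strategy would score the candidate here
            left_indices=left,
            right_indices=right,
        )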
- stable_cart.create_split_strategy(strategy_type, task='regression', **kwargs)[source]¶
Factory function to create split strategies by name.
- Parameters:
strategy_type (str) – Type of strategy: ‘axis_aligned’, ‘consensus’, ‘oblique’, ‘lookahead’, ‘variance_penalized’, ‘composite’, ‘hybrid’
task (str) – ‘regression’ or ‘classification’
**kwargs – Strategy-specific parameters
- Returns:
strategy – Configured split strategy
- Return type:
SplitStrategy
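Typical usage might look as follows; the strategy name comes from the list above, and any strategy-specific keyword arguments are omitted:

from stable_cart import create_split_strategy

# Build a configured strategy by name for a classification task
strategy = create_split_strategy('lookahead', task='classification')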