API Reference

This project provides unified tree estimators that follow the familiar scikit-learn API. Every class supports both regression and classification through a single task parameter.
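
For example, the same estimator class can be configured for either task (a minimal sketch; LessGreedyHybridTree is documented below):

>>> from stable_cart import LessGreedyHybridTree
>>> reg_tree = LessGreedyHybridTree(task='regression', random_state=0)
>>> clf_tree = LessGreedyHybridTree(task='classification', random_state=0)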

Unified Tree Estimators

These are the main estimators that work for both regression and classification. All classes inherit from BaseStableTree and support both task='regression' and task='classification':

class stable_cart.LessGreedyHybridTree(*args, **kwargs)[source]

Bases: BaseStableTree

LessGreedyHybridTree with unified stability primitives.

Enhanced with cross-method learning:

  • Winsorization (from RobustPrefix)

  • Bootstrap consensus for ambiguous splits (from RobustPrefix)

  • Stratified sampling (from RobustPrefix)

  • Explicit variance tracking (from Bootstrap)

Core Features:

  • Honest data partitioning with lookahead beam search

  • Optional oblique root splits using regularized linear models

  • Leaf smoothing (shrinkage for regression, m-estimate for classification)

  • Advanced split selection with multiple strategies

__init__(task='regression', max_depth=5, min_samples_split=40, min_samples_leaf=20, split_frac=0.6, val_frac=0.2, est_frac=0.2, enable_stratified_sampling=True, enable_oblique_splits=True, oblique_strategy='root_only', oblique_regularization='lasso', enable_correlation_gating=True, min_correlation_threshold=0.3, enable_lookahead=True, lookahead_depth=2, beam_width=12, enable_ambiguity_gating=True, ambiguity_threshold=0.05, min_samples_for_lookahead=600, enable_robust_consensus_for_ambiguous=True, consensus_samples=12, consensus_threshold=0.5, enable_threshold_binning=True, max_threshold_bins=24, enable_winsorization=True, winsor_quantiles=(0.01, 0.99), enable_bootstrap_variance_tracking=True, variance_tracking_samples=10, enable_explicit_variance_penalty=False, variance_penalty_weight=0.1, leaf_smoothing=0.0, leaf_smoothing_strategy='shrink_to_parent', enable_gain_margin_logic=True, margin_threshold=0.03, classification_criterion='gini', random_state=None)[source]
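
A minimal classification sketch using synthetic data (parameter values are illustrative; anything not set explicitly keeps the defaults shown in the signature above):

>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> from stable_cart import LessGreedyHybridTree
>>> X, y = make_classification(n_samples=500, n_features=10, random_state=0)
>>> X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
>>> tree = LessGreedyHybridTree(task='classification', max_depth=4, random_state=0)
>>> _ = tree.fit(X_tr, y_tr)          # honest partitioning happens inside fit
>>> proba = tree.predict_proba(X_te)  # class probabilities
>>> acc = tree.score(X_te, y_te)      # mean accuracy for classification
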
get_params(deep=True)[source]

Get parameters for sklearn compatibility.

set_params(**params)[source]

Set parameters for sklearn compatibility.

property splits_scanned_

Approximate split count (retained for backwards compatibility).

property oblique_info_

Oblique split information (retained for backwards compatibility).

count_leaves()

Count the number of leaf nodes in the tree.

fit(X, y)

Fit the stable tree to the training data.

predict(X)

Predict targets for samples in X.

predict_proba(X)

Predict class probabilities for classification tasks.

score(X, y)

Return the mean accuracy (classification) or R² (regression).

class stable_cart.BootstrapVariancePenalizedTree(*args, **kwargs)[source]

Bases: BaseStableTree

Bootstrap variance penalized tree with unified stability primitives.

Enhanced with cross-method learning:

  • Stratified bootstraps (from RobustPrefix)

  • Winsorization (from RobustPrefix)

  • Threshold binning/bucketing (from RobustPrefix)

  • Robust consensus mechanism (from RobustPrefix)

  • Oblique splits (from LessGreedy)

  • Lookahead (from LessGreedy)

  • Beam search (from LessGreedy)

Core Features:

  • Explicit bootstrap variance penalty during split selection

  • Honest data partitioning for unbiased estimation

  • Advanced split strategies with variance awareness

__init__(task='regression', max_depth=5, min_samples_split=40, min_samples_leaf=20, variance_penalty=1.0, n_bootstrap=10, bootstrap_max_depth=2, enable_variance_aware_stopping=True, split_frac=0.6, val_frac=0.2, est_frac=0.2, enable_stratified_sampling=True, enable_stratified_bootstraps=True, bootstrap_stratification_bins=5, enable_winsorization=True, winsor_quantiles=(0.01, 0.99), enable_threshold_binning=True, max_threshold_bins=24, enable_robust_consensus=True, consensus_samples=12, consensus_threshold=0.5, enable_oblique_splits=True, oblique_strategy='adaptive', oblique_regularization='lasso', enable_correlation_gating=True, min_correlation_threshold=0.3, enable_lookahead=True, lookahead_depth=1, beam_width=8, enable_ambiguity_gating=True, ambiguity_threshold=0.1, min_samples_for_lookahead=100, leaf_smoothing=0.0, leaf_smoothing_strategy='m_estimate', enable_gain_margin_logic=True, margin_threshold=0.03, classification_criterion='gini', random_state=None)[source]
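
A minimal regression sketch (the variance_penalty and n_bootstrap values are purely illustrative):

>>> from sklearn.datasets import make_regression
>>> from stable_cart import BootstrapVariancePenalizedTree
>>> X, y = make_regression(n_samples=400, n_features=8, noise=10.0, random_state=0)
>>> tree = BootstrapVariancePenalizedTree(task='regression', variance_penalty=2.0,
...                                       n_bootstrap=20, random_state=0)
>>> _ = tree.fit(X, y)             # bootstrap variance is tracked during fitting
>>> r2 = tree.score(X, y)          # R² for regression
>>> n_leaves = tree.count_leaves()
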
fit(X, y)[source]

Fit with bootstrap variance tracking.

get_params(deep=True)[source]

Get parameters for sklearn compatibility.

set_params(**params)[source]

Set parameters for sklearn compatibility.

count_leaves()

Count the number of leaf nodes in the tree.

predict(X)

Predict targets for samples in X.

predict_proba(X)

Predict class probabilities for classification tasks.

score(X, y)

Return the mean accuracy (classification) or R² (regression).

class stable_cart.RobustPrefixHonestTree(*args, **kwargs)[source]

Bases: BaseStableTree

Robust prefix honest tree with unified stability primitives.

Enhanced with cross-method learning:

  • Oblique splits (from LessGreedy): add Lasso-based oblique splits to the locked prefix

  • Lookahead with beam search (from LessGreedy): replace depth-1 stumps with k-step lookahead

  • Ambiguity gating (from LessGreedy): apply consensus only when splits are ambiguous

  • Correlation gating (from LessGreedy): check feature correlation before oblique splits

  • Explicit variance tracking (from Bootstrap): monitor prediction variance as a diagnostic

Core Features:

  • Robust consensus-based prefix splits with honest leaf estimation

  • Winsorization for outlier robustness

  • Stratified honest data partitioning

  • Advanced consensus mechanisms with threshold binning

__init__(task='regression', max_depth=6, min_samples_leaf=2, top_levels=2, consensus_samples=12, consensus_threshold=0.5, consensus_subsample_frac=0.8, val_frac=0.2, est_frac=0.4, enable_stratified_sampling=True, enable_winsorization=True, winsor_quantiles=(0.01, 0.99), enable_threshold_binning=True, max_threshold_bins=24, enable_oblique_splits=True, oblique_strategy='root_only', oblique_regularization='lasso', enable_correlation_gating=True, min_correlation_threshold=0.3, enable_lookahead=True, lookahead_depth=2, beam_width=12, enable_beam_search_for_consensus=True, enable_ambiguity_gating=True, ambiguity_threshold=0.05, enable_gain_margin_logic=True, margin_threshold=0.03, enable_bootstrap_variance_tracking=True, variance_tracking_samples=10, enable_explicit_variance_penalty=False, variance_penalty_weight=0.1, smoothing=1.0, leaf_smoothing_strategy='m_estimate', classification_criterion='gini', random_state=None)[source]
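
A minimal classification sketch showing sklearn-style parameter access (parameter values are illustrative):

>>> from sklearn.datasets import make_classification
>>> from stable_cart import RobustPrefixHonestTree
>>> X, y = make_classification(n_samples=600, n_features=12, random_state=0)
>>> tree = RobustPrefixHonestTree(task='classification', top_levels=2,
...                               consensus_samples=20, random_state=0)
>>> _ = tree.fit(X, y)                             # consensus applies to the top_levels prefix
>>> params = tree.get_params()                     # sklearn-compatible parameter access
>>> _ = tree.set_params(consensus_threshold=0.6)   # update and refit as needed
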
fit(X, y)[source]

Fit with robust prefix consensus.

get_params(deep=True)[source]

Get parameters for sklearn compatibility.

set_params(**params)[source]

Set parameters for sklearn compatibility.

count_leaves()

Count the number of leaf nodes in the tree.

predict(X)

Predict targets for samples in X.

predict_proba(X)

Predict class probabilities for classification tasks.

score(X, y)

Return the mean accuracy (classification) or R² (regression).

Base Classes

For advanced users and researchers who want to extend the functionality or understand the underlying architecture:

class stable_cart.BaseStableTree(*args, **kwargs)[source]

Bases: BaseEstimator

Unified base class implementing all 7 stability primitives.

The 7 stability primitives are:

  1. Prefix stability (robust consensus on early splits)

  2. Validation-checked split selection

  3. Honesty (separate data for structure vs. estimation)

  4. Leaf stabilization (shrinkage/smoothing)

  5. Data regularization (winsorization, etc.)

  6. Candidate diversity with deterministic resolution

  7. Variance-aware stopping

All tree methods inherit from this and configure different defaults to maintain their distinct personalities while sharing the unified stability infrastructure.

__init__(task='regression', max_depth=5, min_samples_split=40, min_samples_leaf=20, enable_honest_estimation=True, split_frac=0.6, val_frac=0.2, est_frac=0.2, enable_stratified_sampling=True, enable_validation_checking=True, validation_metric='variance_penalized', validation_consistency_weight=1.0, enable_prefix_consensus=False, prefix_levels=2, consensus_samples=12, consensus_threshold=0.5, enable_quantile_grid_thresholds=False, max_threshold_bins=24, leaf_smoothing=0.0, leaf_smoothing_strategy='m_estimate', enable_calibrated_smoothing=False, min_leaf_samples_for_stability=5, enable_winsorization=False, winsor_quantiles=(0.01, 0.99), enable_feature_standardization=False, enable_oblique_splits=False, oblique_strategy='root_only', oblique_regularization='lasso', enable_correlation_gating=True, min_correlation_threshold=0.3, enable_lookahead=False, lookahead_depth=1, beam_width=8, enable_ambiguity_gating=True, ambiguity_threshold=0.05, min_samples_for_lookahead=100, enable_deterministic_preprocessing=False, enable_deterministic_tiebreaks=True, enable_margin_vetoes=False, margin_threshold=0.03, enable_variance_aware_stopping=False, variance_stopping_weight=1.0, variance_stopping_strategy='variance_penalty', enable_bootstrap_variance_tracking=False, variance_tracking_samples=10, enable_explicit_variance_penalty=False, variance_penalty_weight=0.1, split_strategy=None, algorithm_focus='stability', classification_criterion='gini', random_state=None, enable_threshold_binning=False, enable_gain_margin_logic=False, enable_beam_search_for_consensus=False, enable_robust_consensus_for_ambiguous=False)[source]
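
Following the pattern described above, a custom variant could subclass BaseStableTree and simply choose its own defaults. This is a hypothetical sketch (MyStableTree is not part of the library); it only forwards keyword arguments that appear in the signature above:

>>> from stable_cart import BaseStableTree
>>> class MyStableTree(BaseStableTree):
...     """Hypothetical variant: lookahead plus winsorization, everything else default."""
...     def __init__(self, task='regression', random_state=None):
...         super().__init__(task=task,
...                          enable_lookahead=True, lookahead_depth=2,
...                          enable_winsorization=True,
...                          random_state=random_state)
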
fit(X, y)[source]

Fit the stable tree to the training data.

predict(X)[source]

Predict targets for samples in X.

predict_proba(X)[source]

Predict class probabilities for classification tasks.

score(X, y)[source]

Return the mean accuracy (classification) or R² (regression).

count_leaves()[source]

Count the number of leaf nodes in the tree.

Evaluation Functions

These functions help assess model performance and prediction stability. Use these to compare different tree algorithms or measure the effectiveness of stability features:
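
For instance, a stable_cart tree can be compared against a standard CART baseline with both functions (a minimal sketch; the models only need a .predict() method):

>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.tree import DecisionTreeClassifier
>>> from stable_cart import LessGreedyHybridTree, evaluate_models, prediction_stability
>>> X, y = make_classification(n_samples=500, random_state=0)
>>> X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
>>> stable = LessGreedyHybridTree(task='classification', random_state=0)
>>> _ = stable.fit(X_tr, y_tr)
>>> models = {
...     'cart': DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr),
...     'stable': stable,
... }
>>> perf = evaluate_models(models, X_te, y_te, task='categorical')
>>> stab = prediction_stability(models, X_te, task='categorical')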

stable_cart.prediction_stability(models, X_oos, task='categorical')[source]

Measure how consistent predictions are across models on the same out-of-sample (OOS) data.

This metric quantifies prediction stability by measuring how much models agree with each other on the same out-of-sample data. Lower values indicate more stable/consistent predictions.

Parameters:
  • models (dict[str, fitted_model]) – Mapping of model name -> fitted model (must have .predict() method). Requires at least 2 models.

  • X_oos (np.ndarray) – Out-of-sample feature matrix to evaluate on.

  • task ({'categorical', 'continuous'}, default='categorical') – Type of prediction task.

Returns:

scores – Stability score for each model.

For ‘categorical’:

Average pairwise disagreement rate per model (range 0-1). Lower is better (more stable); 0 means perfect agreement with all other models.

For ‘continuous’:

RMSE of each model’s predictions vs the ensemble mean. Lower is better (more stable). 0 = identical to ensemble mean.

Return type:

dict[str, float]

Raises:

ValueError – If fewer than 2 models provided, or if task is not ‘categorical’ or ‘continuous’.

Examples

>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.tree import DecisionTreeClassifier
>>> from stable_cart import prediction_stability
>>> X, y = make_classification(n_samples=100, random_state=42)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y)
>>> models = {
...     'tree1': DecisionTreeClassifier(random_state=1).fit(X_train, y_train),
...     'tree2': DecisionTreeClassifier(random_state=2).fit(X_train, y_train),
... }
>>> stability = prediction_stability(models, X_test, task='categorical')
>>> print(stability)  # Lower values = more stable predictions
{'tree1': 0.15, 'tree2': 0.15}

Notes

  • Stability is measured relative to other models in the collection

  • For categorical tasks, uses pairwise agreement rates (reported as a per-model disagreement rate)

  • For continuous tasks, uses RMSE to ensemble mean as stability proxy

  • This metric is complementary to predictive accuracy: a model can be accurate but unstable, or stable but inaccurate

stable_cart.evaluate_models(models, X, y, task='categorical')[source]

Evaluate predictive performance of multiple models using standard metrics.

Computes task-appropriate performance metrics for each model. For classification, includes accuracy and AUC (if predict_proba available). For regression, includes MAE, RMSE, and R².

Parameters:
  • models (dict[str, fitted_model]) – Model name -> fitted model mapping. Models must have .predict() method.

  • X (np.ndarray) – Feature matrix for evaluation.

  • y (np.ndarray) – Ground-truth labels (classification) or targets (regression).

  • task ({'categorical', 'continuous'}, default='categorical') – Type of prediction task.

Returns:

metrics – Nested dictionary: {model_name: {metric_name: value}}

For ‘categorical’:
  • ‘acc’: Classification accuracy (0-1)

  • ‘auc’: ROC AUC score (0-1, if predict_proba available)

    For binary classification: standard AUC. For multi-class classification: one-vs-rest macro AUC.

For ‘continuous’:
  • ‘mae’: Mean Absolute Error (lower is better)

  • ‘rmse’: Root Mean Squared Error (lower is better)

  • ‘r2’: R² coefficient of determination (-∞ to 1, higher is better)

Return type:

dict[str, dict[str, float]]

Raises:

ValueError – If task is not ‘categorical’ or ‘continuous’.

Examples

>>> from sklearn.datasets import make_regression
>>> from sklearn.tree import DecisionTreeRegressor
>>> from stable_cart import evaluate_models
>>> X, y = make_regression(n_samples=100, random_state=42)
>>> models = {
...     'shallow': DecisionTreeRegressor(max_depth=3, random_state=42).fit(X, y),
...     'deep': DecisionTreeRegressor(max_depth=10, random_state=42).fit(X, y),
... }
>>> performance = evaluate_models(models, X, y, task='continuous')
>>> print(performance['shallow'])
{'mae': 12.3, 'rmse': 15.7, 'r2': 0.85}

Notes

  • AUC computation gracefully handles cases where predict_proba is not available

  • For multi-class classification, uses one-vs-rest strategy for AUC

  • All metrics use standard sklearn implementations

  • Consider using separate train/test sets to avoid overfitting bias

Advanced Classes for Researchers

Internal classes for advanced customization and research. These provide the building blocks for creating custom stability algorithms:

class stable_cart.SplitCandidate(feature_idx, threshold, gain, left_indices, right_indices, is_oblique=False, oblique_weights=None, validation_score=None, variance_estimate=None, consensus_support=None)[source]

Bases: object

Represents a potential split with all relevant information.

feature_idx: int
threshold: float
gain: float
left_indices: numpy.ndarray
right_indices: numpy.ndarray
is_oblique: bool = False
oblique_weights: numpy.ndarray | None = None
validation_score: float | None = None
variance_estimate: float | None = None
consensus_support: float | None = None
__init__(feature_idx, threshold, gain, left_indices, right_indices, is_oblique=False, oblique_weights=None, validation_score=None, variance_estimate=None, consensus_support=None)
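
A sketch of constructing a candidate manually (values are illustrative; in normal use these objects are produced by split strategies rather than built by hand):

>>> import numpy as np
>>> from stable_cart import SplitCandidate
>>> candidate = SplitCandidate(
...     feature_idx=2,
...     threshold=0.5,
...     gain=0.12,
...     left_indices=np.array([0, 1, 3]),
...     right_indices=np.array([2, 4, 5]),
...     validation_score=0.10,
... )
>>> candidate.is_oblique
False
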
class stable_cart.StabilityMetrics(prefix_consensus_scores, validation_consistency, leaf_variance_estimates, split_margins, bootstrap_variance=None)[source]

Bases: object

Container for stability diagnostic information.

prefix_consensus_scores: List[float]
validation_consistency: float
leaf_variance_estimates: List[float]
split_margins: List[float]
bootstrap_variance: float | None = None
__init__(prefix_consensus_scores, validation_consistency, leaf_variance_estimates, split_margins, bootstrap_variance=None)
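
Similarly, the diagnostics container can be filled with whatever a custom method tracks (illustrative values):

>>> from stable_cart import StabilityMetrics
>>> metrics = StabilityMetrics(
...     prefix_consensus_scores=[0.9, 0.8],
...     validation_consistency=0.95,
...     leaf_variance_estimates=[0.02, 0.05, 0.03],
...     split_margins=[0.10, 0.04],
... )
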
class stable_cart.SplitStrategy[source]

Bases: ABC

Abstract base class for split finding strategies.

abstractmethod find_best_split(X, y, X_val=None, y_val=None, depth=0, **kwargs)[source]

Find the best split for the given data.

Parameters:
  • X (np.ndarray) – Training features for structure learning

  • y (np.ndarray) – Training targets for structure learning

  • X_val (np.ndarray, optional) – Validation features for split evaluation

  • y_val (np.ndarray, optional) – Validation targets for split evaluation

  • depth (int) – Current depth in the tree

  • **kwargs – Strategy-specific parameters

Returns:

best_split – Best split found, or None if no good split exists

Return type:

SplitCandidate or None

abstractmethod should_stop(X, y, current_gain, depth, **kwargs)[source]

Determine if splitting should stop at this node.
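
A hypothetical custom strategy only needs to implement the two abstract methods. The sketch below splits each feature at its median and keeps the best variance reduction; the gain computation is deliberately simplistic and is not how the built-in strategies work:

>>> import numpy as np
>>> from stable_cart import SplitCandidate, SplitStrategy
>>> class MedianSplitStrategy(SplitStrategy):
...     """Hypothetical strategy: median thresholds, variance-reduction gain."""
...     def find_best_split(self, X, y, X_val=None, y_val=None, depth=0, **kwargs):
...         best = None
...         for j in range(X.shape[1]):
...             thr = np.median(X[:, j])
...             left = np.where(X[:, j] <= thr)[0]
...             right = np.where(X[:, j] > thr)[0]
...             if len(left) == 0 or len(right) == 0:
...                 continue  # degenerate split, skip
...             gain = np.var(y) - (len(left) * np.var(y[left]) +
...                                 len(right) * np.var(y[right])) / len(y)
...             if best is None or gain > best.gain:
...                 best = SplitCandidate(j, thr, gain, left, right)
...         return best  # None when no valid split exists
...     def should_stop(self, X, y, current_gain, depth, **kwargs):
...         return depth >= 3 or len(y) < 10 or current_gain <= 0.0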

stable_cart.create_split_strategy(strategy_type, task='regression', **kwargs)[source]

Factory function to create split strategies by name.

Parameters:
  • strategy_type (str) – Type of strategy: ‘axis_aligned’, ‘consensus’, ‘oblique’, ‘lookahead’, ‘variance_penalized’, ‘composite’, ‘hybrid’

  • task (str) – ‘regression’ or ‘classification’

  • **kwargs – Strategy-specific parameters

Returns:

strategy – Configured split strategy

Return type:

SplitStrategy
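
A minimal sketch of the factory in use (keyword arguments beyond task are strategy-specific and omitted here):

>>> from stable_cart import SplitStrategy, create_split_strategy
>>> strategy = create_split_strategy('axis_aligned', task='classification')
>>> isinstance(strategy, SplitStrategy)
True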