API Reference

This project provides unified tree estimators that follow the familiar scikit-learn API. Every class supports both regression and classification through a single task parameter.
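
For example, the same estimator class can be configured for either task (a minimal sketch; LessGreedyHybridTree is documented below):

>>> from stable_cart import LessGreedyHybridTree
>>> reg_tree = LessGreedyHybridTree(task='regression', random_state=0)
>>> clf_tree = LessGreedyHybridTree(task='classification', random_state=0)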

Unified Tree Estimators

These are the main estimators that work for both regression and classification. All classes inherit from BaseStableTree and support both task='regression' and task='classification':

class stable_cart.LessGreedyHybridTree(*args, **kwargs)[source]

Bases: BaseStableTree

LessGreedyHybridTree with unified stability primitives.

Enhanced with cross-method learning:

  • Winsorization (from RobustPrefix)

  • Bootstrap consensus for ambiguous splits (from RobustPrefix)

  • Stratified sampling (from RobustPrefix)

  • Explicit variance tracking (from Bootstrap)

Core Features:

  • Honest data partitioning with lookahead beam search

  • Optional oblique root splits using regularized linear models

  • Leaf smoothing (shrinkage for regression, m-estimate for classification)

  • Advanced split selection with multiple strategies

__init__(task='regression', max_depth=5, min_samples_split=40, min_samples_leaf=20, split_frac=0.6, val_frac=0.2, est_frac=0.2, enable_stratified_sampling=True, enable_oblique_splits=True, oblique_strategy='root_only', oblique_regularization='lasso', enable_correlation_gating=True, min_correlation_threshold=0.3, enable_lookahead=True, lookahead_depth=2, beam_width=12, enable_ambiguity_gating=True, ambiguity_threshold=0.05, min_samples_for_lookahead=600, enable_robust_consensus_for_ambiguous=True, consensus_samples=12, consensus_threshold=0.5, enable_threshold_binning=True, max_threshold_bins=24, enable_winsorization=True, winsor_quantiles=(0.01, 0.99), enable_bootstrap_variance_tracking=True, variance_tracking_samples=10, enable_explicit_variance_penalty=False, variance_penalty_weight=0.1, leaf_smoothing=0.0, leaf_smoothing_strategy='shrink_to_parent', enable_gain_margin_logic=True, margin_threshold=0.03, classification_criterion='gini', random_state=None)[source]
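
A minimal classification sketch using synthetic data (parameter values are illustrative; anything not set explicitly keeps the defaults shown in the signature above):

>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> from stable_cart import LessGreedyHybridTree
>>> X, y = make_classification(n_samples=500, n_features=10, random_state=0)
>>> X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
>>> tree = LessGreedyHybridTree(task='classification', max_depth=4, random_state=0)
>>> _ = tree.fit(X_tr, y_tr)          # honest partitioning happens inside fit
>>> proba = tree.predict_proba(X_te)  # class probabilities
>>> acc = tree.score(X_te, y_te)      # mean accuracy for classification
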
get_params(deep=True)[source]

Get parameters for sklearn compatibility.

set_params(**params)[source]

Set parameters for sklearn compatibility.

property splits_scanned_

Approximate split count (retained for backwards compatibility).

property oblique_info_

Oblique split information (retained for backwards compatibility).

count_leaves()

Count the number of leaf nodes in the tree.

fit(X, y)

Fit the stable tree to the training data.

predict(X)

Predict targets for samples in X.

predict_proba(X)

Predict class probabilities for classification tasks.

score(X, y)

Return the mean accuracy (classification) or R² (regression).

class stable_cart.BootstrapVariancePenalizedTree(*args, **kwargs)[source]

Bases: BaseStableTree

Bootstrap variance penalized tree with unified stability primitives.

Enhanced with cross-method learning:

  • Stratified bootstraps (from RobustPrefix)

  • Winsorization (from RobustPrefix)

  • Threshold binning/bucketing (from RobustPrefix)

  • Robust consensus mechanism (from RobustPrefix)

  • Oblique splits (from LessGreedy)

  • Lookahead (from LessGreedy)

  • Beam search (from LessGreedy)

Core Features:

  • Explicit bootstrap variance penalty during split selection

  • Honest data partitioning for unbiased estimation

  • Advanced split strategies with variance awareness

__init__(task='regression', max_depth=5, min_samples_split=40, min_samples_leaf=20, variance_penalty=1.0, n_bootstrap=10, bootstrap_max_depth=2, enable_variance_aware_stopping=True, split_frac=0.6, val_frac=0.2, est_frac=0.2, enable_stratified_sampling=True, enable_stratified_bootstraps=True, bootstrap_stratification_bins=5, enable_winsorization=True, winsor_quantiles=(0.01, 0.99), enable_threshold_binning=True, max_threshold_bins=24, enable_robust_consensus=True, consensus_samples=12, consensus_threshold=0.5, enable_oblique_splits=True, oblique_strategy='adaptive', oblique_regularization='lasso', enable_correlation_gating=True, min_correlation_threshold=0.3, enable_lookahead=True, lookahead_depth=1, beam_width=8, enable_ambiguity_gating=True, ambiguity_threshold=0.1, min_samples_for_lookahead=100, leaf_smoothing=0.0, leaf_smoothing_strategy='m_estimate', enable_gain_margin_logic=True, margin_threshold=0.03, classification_criterion='gini', random_state=None)[source]
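
A minimal regression sketch (the variance_penalty and n_bootstrap values are purely illustrative):

>>> from sklearn.datasets import make_regression
>>> from stable_cart import BootstrapVariancePenalizedTree
>>> X, y = make_regression(n_samples=400, n_features=8, noise=10.0, random_state=0)
>>> tree = BootstrapVariancePenalizedTree(task='regression', variance_penalty=2.0,
...                                       n_bootstrap=20, random_state=0)
>>> _ = tree.fit(X, y)             # bootstrap variance is tracked during fitting
>>> r2 = tree.score(X, y)          # R² for regression
>>> n_leaves = tree.count_leaves()
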
fit(X, y)[source]

Fit with bootstrap variance tracking.

get_params(deep=True)[source]

Get parameters for sklearn compatibility.

set_params(**params)[source]

Set parameters for sklearn compatibility.

count_leaves()

Count the number of leaf nodes in the tree.

predict(X)

Predict targets for samples in X.

predict_proba(X)

Predict class probabilities for classification tasks.

score(X, y)

Return the mean accuracy (classification) or R² (regression).

class stable_cart.RobustPrefixHonestTree(*args, **kwargs)[source]

Bases: BaseStableTree

Robust prefix honest tree with unified stability primitives.

Enhanced with cross-method learning:

  • Oblique splits (from LessGreedy): add Lasso-based oblique splits to the locked prefix

  • Lookahead with beam search (from LessGreedy): replace depth-1 stumps with k-step lookahead

  • Ambiguity gating (from LessGreedy): apply consensus only when splits are ambiguous

  • Correlation gating (from LessGreedy): check feature correlation before oblique splits

  • Explicit variance tracking (from Bootstrap): monitor prediction variance as a diagnostic

Core Features:

  • Robust consensus-based prefix splits with honest leaf estimation

  • Winsorization for outlier robustness

  • Stratified honest data partitioning

  • Advanced consensus mechanisms with threshold binning

__init__(task='regression', max_depth=6, min_samples_leaf=2, top_levels=2, consensus_samples=12, consensus_threshold=0.5, consensus_subsample_frac=0.8, val_frac=0.2, est_frac=0.4, enable_stratified_sampling=True, enable_winsorization=True, winsor_quantiles=(0.01, 0.99), enable_threshold_binning=True, max_threshold_bins=24, enable_oblique_splits=True, oblique_strategy='root_only', oblique_regularization='lasso', enable_correlation_gating=True, min_correlation_threshold=0.3, enable_lookahead=True, lookahead_depth=2, beam_width=12, enable_beam_search_for_consensus=True, enable_ambiguity_gating=True, ambiguity_threshold=0.05, enable_gain_margin_logic=True, margin_threshold=0.03, enable_bootstrap_variance_tracking=True, variance_tracking_samples=10, enable_explicit_variance_penalty=False, variance_penalty_weight=0.1, smoothing=1.0, leaf_smoothing_strategy='m_estimate', classification_criterion='gini', random_state=None)[source]
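
A minimal classification sketch showing sklearn-style parameter access (parameter values are illustrative):

>>> from sklearn.datasets import make_classification
>>> from stable_cart import RobustPrefixHonestTree
>>> X, y = make_classification(n_samples=600, n_features=12, random_state=0)
>>> tree = RobustPrefixHonestTree(task='classification', top_levels=2,
...                               consensus_samples=20, random_state=0)
>>> _ = tree.fit(X, y)                             # consensus applies to the top_levels prefix
>>> params = tree.get_params()                     # sklearn-compatible parameter access
>>> _ = tree.set_params(consensus_threshold=0.6)   # update and refit as needed
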
fit(X, y)[source]

Fit with robust prefix consensus.

get_params(deep=True)[source]

Get parameters for sklearn compatibility.

set_params(**params)[source]

Set parameters for sklearn compatibility.

count_leaves()

Count the number of leaf nodes in the tree.

predict(X)

Predict targets for samples in X.

predict_proba(X)

Predict class probabilities for classification tasks.

score(X, y)

Return the mean accuracy (classification) or R² (regression).

Base Classes

For advanced users and researchers who want to extend the functionality or understand the underlying architecture:

class stable_cart.BaseStableTree(*args, **kwargs)[source]

Bases: BaseEstimator

Unified base class implementing all 7 stability primitives.

The 7 stability primitives are:

  1. Prefix stability (robust consensus on early splits)

  2. Validation-checked split selection

  3. Honesty (separate data for structure vs. estimation)

  4. Leaf stabilization (shrinkage/smoothing)

  5. Data regularization (winsorization, etc.)

  6. Candidate diversity with deterministic resolution

  7. Variance-aware stopping

All tree methods inherit from this and configure different defaults to maintain their distinct personalities while sharing the unified stability infrastructure.

__init__(task='regression', max_depth=5, min_samples_split=40, min_samples_leaf=20, enable_honest_estimation=True, split_frac=0.6, val_frac=0.2, est_frac=0.2, enable_stratified_sampling=True, enable_validation_checking=True, validation_metric='variance_penalized', validation_consistency_weight=1.0, enable_prefix_consensus=False, prefix_levels=2, consensus_samples=12, consensus_threshold=0.5, enable_quantile_grid_thresholds=False, max_threshold_bins=24, leaf_smoothing=0.0, leaf_smoothing_strategy='m_estimate', enable_calibrated_smoothing=False, min_leaf_samples_for_stability=5, enable_winsorization=False, winsor_quantiles=(0.01, 0.99), enable_feature_standardization=False, enable_oblique_splits=False, oblique_strategy='root_only', oblique_regularization='lasso', enable_correlation_gating=True, min_correlation_threshold=0.3, enable_lookahead=False, lookahead_depth=1, beam_width=8, enable_ambiguity_gating=True, ambiguity_threshold=0.05, min_samples_for_lookahead=100, enable_deterministic_preprocessing=False, enable_deterministic_tiebreaks=True, enable_margin_vetoes=False, margin_threshold=0.03, enable_variance_aware_stopping=False, variance_stopping_weight=1.0, variance_stopping_strategy='variance_penalty', enable_bootstrap_variance_tracking=False, variance_tracking_samples=10, enable_explicit_variance_penalty=False, variance_penalty_weight=0.1, split_strategy=None, algorithm_focus='stability', classification_criterion='gini', random_state=None, enable_threshold_binning=False, enable_gain_margin_logic=False, enable_beam_search_for_consensus=False, enable_robust_consensus_for_ambiguous=False)[source]
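
Following the pattern described above, a custom variant could subclass BaseStableTree and simply choose its own defaults. This is a hypothetical sketch (MyStableTree is not part of the library); it only forwards keyword arguments that appear in the signature above:

>>> from stable_cart import BaseStableTree
>>> class MyStableTree(BaseStableTree):
...     """Hypothetical variant: lookahead plus winsorization, everything else default."""
...     def __init__(self, task='regression', random_state=None):
...         super().__init__(task=task,
...                          enable_lookahead=True, lookahead_depth=2,
...                          enable_winsorization=True,
...                          random_state=random_state)
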
fit(X, y)[source]

Fit the stable tree to the training data.

predict(X)[source]

Predict targets for samples in X.

predict_proba(X)[source]

Predict class probabilities for classification tasks.

score(X, y)[source]

Return the mean accuracy (classification) or R² (regression).

count_leaves()[source]

Count the number of leaf nodes in the tree.

Evaluation Functions

These functions help assess model performance and prediction stability. Use these to compare different tree algorithms or measure the effectiveness of stability features:
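
For instance, a stable_cart tree can be compared against a standard CART baseline with both functions (a minimal sketch; the models only need a .predict() method):

>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.tree import DecisionTreeClassifier
>>> from stable_cart import LessGreedyHybridTree, evaluate_models, prediction_stability
>>> X, y = make_classification(n_samples=500, random_state=0)
>>> X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
>>> stable = LessGreedyHybridTree(task='classification', random_state=0)
>>> _ = stable.fit(X_tr, y_tr)
>>> models = {
...     'cart': DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr),
...     'stable': stable,
... }
>>> perf = evaluate_models(models, X_te, y_te, task='categorical')
>>> stab = prediction_stability(models, X_te, task='categorical')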

stable_cart.prediction_stability(models, X_oos, task='categorical')[source]

Measure how consistent predictions are across models on the same out-of-sample (OOS) data.

This metric quantifies prediction stability by measuring how much models agree with each other on the same out-of-sample data. Lower values indicate more stable/consistent predictions.

Parameters:
  • models (dict[str, fitted_model]) – Mapping of model name -> fitted model (must have .predict() method). Requires at least 2 models.

  • X_oos (np.ndarray) – Out-of-sample feature matrix to evaluate on.

  • task ({'categorical', 'continuous'}, default='categorical') – Type of prediction task.

Returns:

scores – Stability score for each model.

For ‘categorical’:

Average pairwise disagreement rate per model (range 0-1). Lower is better (more stable); 0 means perfect agreement with all other models.

For ‘continuous’:

RMSE of each model’s predictions vs the ensemble mean. Lower is better (more stable). 0 = identical to ensemble mean.

Return type:

dict[str, float]

Raises:

ValueError – If fewer than 2 models provided, or if task is not ‘categorical’ or ‘continuous’.

Examples

>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.tree import DecisionTreeClassifier
>>> from stable_cart import prediction_stability
>>> X, y = make_classification(n_samples=100, random_state=42)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y)
>>> models = {
...     'tree1': DecisionTreeClassifier(random_state=1).fit(X_train, y_train),
...     'tree2': DecisionTreeClassifier(random_state=2).fit(X_train, y_train),
... }
>>> stability = prediction_stability(models, X_test, task='categorical')
>>> print(stability)  # Lower values = more stable predictions
{'tree1': 0.15, 'tree2': 0.15}

Notes

  • Stability is measured relative to other models in the collection

  • For categorical tasks, uses pairwise agreement rates (reported as a per-model disagreement rate)

  • For continuous tasks, uses RMSE to ensemble mean as stability proxy

  • This metric is complementary to predictive accuracy: a model can be accurate but unstable, or stable but inaccurate

stable_cart.evaluate_models(models, X, y, task='categorical')[source]

Evaluate predictive performance of multiple models using standard metrics.

Computes task-appropriate performance metrics for each model. For classification, includes accuracy and AUC (if predict_proba available). For regression, includes MAE, RMSE, and R².

Parameters:
  • models (dict[str, fitted_model]) – Model name -> fitted model mapping. Models must have .predict() method.

  • X (np.ndarray) – Feature matrix for evaluation.

  • y (np.ndarray) – Ground-truth labels (classification) or targets (regression).

  • task ({'categorical', 'continuous'}, default='categorical') – Type of prediction task.

Returns:

metrics – Nested dictionary: {model_name: {metric_name: value}}

For ‘categorical’:
  • ‘acc’: Classification accuracy (0-1)

  • ‘auc’: ROC AUC score (0-1, if predict_proba available)

    For binary classification: standard AUC. For multi-class classification: one-vs-rest macro AUC.

For ‘continuous’:
  • ‘mae’: Mean Absolute Error (lower is better)

  • ‘rmse’: Root Mean Squared Error (lower is better)

  • ‘r2’: R² coefficient of determination (-∞ to 1, higher is better)

Return type:

dict[str, dict[str, float]]

Raises:

ValueError – If task is not ‘categorical’ or ‘continuous’.

Examples

>>> from sklearn.datasets import make_regression
>>> from sklearn.tree import DecisionTreeRegressor
>>> from stable_cart import evaluate_models
>>> X, y = make_regression(n_samples=100, random_state=42)
>>> models = {
...     'shallow': DecisionTreeRegressor(max_depth=3, random_state=42).fit(X, y),
...     'deep': DecisionTreeRegressor(max_depth=10, random_state=42).fit(X, y),
... }
>>> performance = evaluate_models(models, X, y, task='continuous')
>>> print(performance['shallow'])
{'mae': 12.3, 'rmse': 15.7, 'r2': 0.85}

Notes

  • AUC computation gracefully handles cases where predict_proba is not available

  • For multi-class classification, uses one-vs-rest strategy for AUC

  • All metrics use standard sklearn implementations

  • Consider using separate train/test sets to avoid overfitting bias

Advanced Classes for Researchers

Internal classes for advanced customization and research. These provide the building blocks for creating custom stability algorithms:

class stable_cart.SplitCandidate(feature_idx, threshold, gain, left_indices, right_indices, is_oblique=False, oblique_weights=None, validation_score=None, variance_estimate=None, consensus_support=None)[source]

Bases: object

Represents a potential split with all relevant information.

feature_idx: int
threshold: float
gain: float
left_indices: numpy.ndarray
right_indices: numpy.ndarray
is_oblique: bool = False
oblique_weights: numpy.ndarray | None = None
validation_score: float | None = None
variance_estimate: float | None = None
consensus_support: float | None = None
__init__(feature_idx, threshold, gain, left_indices, right_indices, is_oblique=False, oblique_weights=None, validation_score=None, variance_estimate=None, consensus_support=None)
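
A sketch of constructing a candidate manually (values are illustrative; in normal use these objects are produced by split strategies rather than built by hand):

>>> import numpy as np
>>> from stable_cart import SplitCandidate
>>> candidate = SplitCandidate(
...     feature_idx=2,
...     threshold=0.5,
...     gain=0.12,
...     left_indices=np.array([0, 1, 3]),
...     right_indices=np.array([2, 4, 5]),
...     validation_score=0.10,
... )
>>> candidate.is_oblique
False
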
class stable_cart.StabilityMetrics(prefix_consensus_scores, validation_consistency, leaf_variance_estimates, split_margins, bootstrap_variance=None)[source]

Bases: object

Container for stability diagnostic information.

prefix_consensus_scores: List[float]
validation_consistency: float
leaf_variance_estimates: List[float]
split_margins: List[float]
bootstrap_variance: float | None = None
__init__(prefix_consensus_scores, validation_consistency, leaf_variance_estimates, split_margins, bootstrap_variance=None)
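
Similarly, the diagnostics container can be filled with whatever a custom method tracks (illustrative values):

>>> from stable_cart import StabilityMetrics
>>> metrics = StabilityMetrics(
...     prefix_consensus_scores=[0.9, 0.8],
...     validation_consistency=0.95,
...     leaf_variance_estimates=[0.02, 0.05, 0.03],
...     split_margins=[0.10, 0.04],
... )
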
class stable_cart.SplitStrategy[source]

Bases: ABC

Abstract base class for split finding strategies.

abstractmethod find_best_split(X, y, X_val=None, y_val=None, depth=0, **kwargs)[source]

Find the best split for the given data.

Parameters:
  • X (np.ndarray) – Training features for structure learning

  • y (np.ndarray) – Training targets for structure learning

  • X_val (np.ndarray, optional) – Validation features for split evaluation

  • y_val (np.ndarray, optional) – Validation targets for split evaluation

  • depth (int) – Current depth in the tree

  • **kwargs – Strategy-specific parameters

Returns:

best_split – Best split found, or None if no good split exists

Return type:

SplitCandidate or None

abstractmethod should_stop(X, y, current_gain, depth, **kwargs)[source]

Determine if splitting should stop at this node.
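
A hypothetical custom strategy only needs to implement the two abstract methods. The sketch below splits each feature at its median and keeps the best variance reduction; the gain computation is deliberately simplistic and is not how the built-in strategies work:

>>> import numpy as np
>>> from stable_cart import SplitCandidate, SplitStrategy
>>> class MedianSplitStrategy(SplitStrategy):
...     """Hypothetical strategy: median thresholds, variance-reduction gain."""
...     def find_best_split(self, X, y, X_val=None, y_val=None, depth=0, **kwargs):
...         best = None
...         for j in range(X.shape[1]):
...             thr = np.median(X[:, j])
...             left = np.where(X[:, j] <= thr)[0]
...             right = np.where(X[:, j] > thr)[0]
...             if len(left) == 0 or len(right) == 0:
...                 continue  # degenerate split, skip
...             gain = np.var(y) - (len(left) * np.var(y[left]) +
...                                 len(right) * np.var(y[right])) / len(y)
...             if best is None or gain > best.gain:
...                 best = SplitCandidate(j, thr, gain, left, right)
...         return best  # None when no valid split exists
...     def should_stop(self, X, y, current_gain, depth, **kwargs):
...         return depth >= 3 or len(y) < 10 or current_gain <= 0.0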

stable_cart.create_split_strategy(strategy_type, task='regression', **kwargs)[source]

Factory function to create split strategies by name.

Parameters:
  • strategy_type (str) – Type of strategy: ‘axis_aligned’, ‘consensus’, ‘oblique’, ‘lookahead’, ‘variance_penalized’, ‘composite’, ‘hybrid’

  • task (str) – ‘regression’ or ‘classification’

  • **kwargs – Strategy-specific parameters

Returns:

strategy – Configured split strategy

Return type:

SplitStrategy
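
A minimal sketch of the factory in use (keyword arguments beyond task are strategy-specific and omitted here):

>>> from stable_cart import SplitStrategy, create_split_strategy
>>> strategy = create_split_strategy('axis_aligned', task='classification')
>>> isinstance(strategy, SplitStrategy)
True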