BootstrapVariancePenalizedTree

class BootstrapVariancePenalizedTree(task='regression', max_depth=5, min_samples_split=40, min_samples_leaf=20, variance_penalty=1.0, n_bootstrap=10, bootstrap_max_depth=2, enable_variance_aware_stopping=True, split_frac=0.6, val_frac=0.2, est_frac=0.2, enable_stratified_sampling=True, enable_stratified_bootstraps=True, bootstrap_stratification_bins=5, enable_winsorization=True, winsor_quantiles=(0.01, 0.99), enable_threshold_binning=True, max_threshold_bins=24, enable_robust_consensus=True, consensus_samples=12, consensus_threshold=0.5, enable_oblique_splits=True, oblique_strategy='adaptive', oblique_regularization='lasso', enable_correlation_gating=True, min_correlation_threshold=0.3, enable_lookahead=True, lookahead_depth=1, beam_width=8, enable_ambiguity_gating=True, ambiguity_threshold=0.1, min_samples_for_lookahead=100, leaf_smoothing=0.0, leaf_smoothing_strategy='m_estimate', enable_gain_margin_logic=True, margin_threshold=0.03, classification_criterion='gini', random_state=None)

Bases: BaseStableTree

Bootstrap variance penalized tree with unified stability primitives.

Enhanced with cross-method learning:

  • Stratified bootstraps (from RobustPrefix)

  • Winsorization (from RobustPrefix)

  • Threshold binning/bucketing (from RobustPrefix)

  • Robust consensus mechanism (from RobustPrefix)

  • Oblique splits (from LessGreedy)

  • Lookahead (from LessGreedy)

  • Beam search (from LessGreedy)
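Winsorization, one of the borrowed techniques listed above, clips extreme values to chosen quantiles. The sketch below is a minimal pure-Python illustration, assuming that `winsor_quantiles=(0.01, 0.99)` means clipping a feature to its 1st and 99th percentiles. The helper `winsorize` and its nearest-rank quantile rule are hypothetical and are not taken from the library.

```python
def winsorize(values, lower_q=0.01, upper_q=0.99):
    """Clip values to the [lower_q, upper_q] empirical quantile range.

    Hypothetical helper: uses a simple nearest-rank quantile, which may
    differ from whatever interpolation the library applies.
    """
    ordered = sorted(values)
    last = len(ordered) - 1
    lo = ordered[round(lower_q * last)]
    hi = ordered[round(upper_q * last)]
    # Values below lo are raised to lo; values above hi are lowered to hi.
    return [min(max(v, lo), hi) for v in values]


feature = [float(i) for i in range(100)] + [1e6]  # one extreme outlier
clipped = winsorize(feature)
```

After clipping, the outlier no longer dominates the feature's range, so thresholds chosen on resampled data are less likely to shift with it.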

Core Features:

  • Explicit bootstrap variance penalty during split selection

  • Honest data partitioning for unbiased estimation

  • Advanced split strategies with variance awareness
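One plausible reading of the bootstrap variance penalty is that each candidate split is scored by its mean gain over bootstrap resamples, minus `variance_penalty` times the variance of that gain. The sketch below illustrates that idea for a single numeric feature in a regression setting. The objective and the function names `sse`, `split_gain`, and `penalized_score` are assumptions made for illustration, not the library's implementation.

```python
import random
import statistics


def sse(ys):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def split_gain(xs, ys, threshold):
    """Reduction in SSE obtained by splitting on x <= threshold."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    return sse(ys) - sse(left) - sse(right)


def penalized_score(xs, ys, threshold, n_bootstrap=10, variance_penalty=1.0, seed=0):
    """Mean bootstrap gain minus variance_penalty * variance of the gain."""
    rng = random.Random(seed)
    n = len(xs)
    gains = []
    for _ in range(n_bootstrap):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        gains.append(split_gain([xs[i] for i in idx], [ys[i] for i in idx], threshold))
    return statistics.mean(gains) - variance_penalty * statistics.pvariance(gains)


xs = [float(i) for i in range(20)]
ys = [0.0] * 10 + [1.0] * 10  # a clean step at x = 9.5
score = penalized_score(xs, ys, threshold=9.5)
```

Under this formulation a split whose gain fluctuates strongly across resamples is ranked below a slightly weaker but steadier one. Note that the penalty term is in squared-gain units, so a suitable `variance_penalty` depends on the scale of the target.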

Parameters:
  • task (Literal['regression', 'classification']) – Prediction task type.

  • max_depth (int) – Maximum tree depth.

  • min_samples_split (int) – Minimum samples to split a node.

  • min_samples_leaf (int) – Minimum samples per leaf.

  • variance_penalty (float) – Weight for bootstrap variance penalty.

  • n_bootstrap (int) – Number of bootstrap samples for variance estimation.

  • bootstrap_max_depth (int) – Maximum depth for variance estimation trees.

  • enable_variance_aware_stopping (bool) – Enable variance-aware stopping criteria.

  • split_frac (float) – Fraction of data for structure building.

  • val_frac (float) – Fraction of data for validation.

  • est_frac (float) – Fraction of data for estimation.

  • enable_stratified_sampling (bool) – Enable stratified sampling in data partitioning.

  • enable_stratified_bootstraps (bool) – Enable target-stratified bootstrap sampling.

  • bootstrap_stratification_bins (int) – Number of bins for regression quantile stratification.

  • enable_winsorization (bool) – Enable feature winsorization before bootstrap sampling.

  • winsor_quantiles (tuple) – Quantile bounds for winsorization.

  • enable_threshold_binning (bool) – Enable threshold binning to reduce micro-jitter.

  • max_threshold_bins (int) – Maximum number of threshold bins.

  • enable_robust_consensus (bool) – Enable robust consensus mechanism.

  • consensus_samples (int) – Number of samples for consensus.

  • consensus_threshold (float) – Threshold for consensus decisions.

  • enable_oblique_splits (bool) – Enable oblique split capability.

  • oblique_strategy (Literal['root_only', 'all_levels', 'adaptive']) – Strategy for oblique splits.

  • oblique_regularization (Literal['lasso', 'ridge', 'elastic_net']) – Regularization type for oblique splits.

  • enable_correlation_gating (bool) – Enable correlation-based feature gating.

  • min_correlation_threshold (float) – Minimum correlation for feature selection.

  • enable_lookahead (bool) – Enable lookahead search.

  • lookahead_depth (int) – Depth for lookahead search.

  • beam_width (int) – Width of beam search.

  • enable_ambiguity_gating (bool) – Enable ambiguity-based gating.

  • ambiguity_threshold (float) – Threshold for ambiguity detection.

  • min_samples_for_lookahead (int) – Minimum samples required for lookahead.

  • leaf_smoothing (float) – Smoothing parameter for leaf estimates.

  • leaf_smoothing_strategy (Literal['m_estimate', 'shrink_to_parent']) – Strategy for leaf smoothing.

  • enable_gain_margin_logic (bool) – Enable gain margin logic.

  • margin_threshold (float) – Threshold for margin-based decisions.

  • classification_criterion (Literal['gini', 'entropy']) – Criterion for classification splits.

  • random_state (int | None) – Random state for reproducibility.
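The `split_frac`, `val_frac`, and `est_frac` parameters describe a three-way partition of the training data. The sketch below shows one simple way such a partition could be drawn. It assumes the fractions must sum to 1 and ignores stratification; `honest_partition` is a hypothetical helper, not a function from the library.

```python
import random


def honest_partition(n_samples, split_frac=0.6, val_frac=0.2, est_frac=0.2, seed=0):
    """Shuffle indices and cut them into structure / validation / estimation sets."""
    # Assumption: the three fractions are expected to cover the whole dataset.
    if abs(split_frac + val_frac + est_frac - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_split = int(round(split_frac * n_samples))
    n_val = int(round(val_frac * n_samples))
    return (
        indices[:n_split],                   # used to choose the tree structure
        indices[n_split:n_split + n_val],    # used for validation
        indices[n_split + n_val:],           # used to estimate leaf values
    )


split_idx, val_idx, est_idx = honest_partition(100)
```

Keeping the estimation set disjoint from the structure set is what makes the leaf estimates "honest": the samples that set the leaf values played no part in choosing the splits.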

__init__(task='regression', max_depth=5, min_samples_split=40, min_samples_leaf=20, variance_penalty=1.0, n_bootstrap=10, bootstrap_max_depth=2, enable_variance_aware_stopping=True, split_frac=0.6, val_frac=0.2, est_frac=0.2, enable_stratified_sampling=True, enable_stratified_bootstraps=True, bootstrap_stratification_bins=5, enable_winsorization=True, winsor_quantiles=(0.01, 0.99), enable_threshold_binning=True, max_threshold_bins=24, enable_robust_consensus=True, consensus_samples=12, consensus_threshold=0.5, enable_oblique_splits=True, oblique_strategy='adaptive', oblique_regularization='lasso', enable_correlation_gating=True, min_correlation_threshold=0.3, enable_lookahead=True, lookahead_depth=1, beam_width=8, enable_ambiguity_gating=True, ambiguity_threshold=0.1, min_samples_for_lookahead=100, leaf_smoothing=0.0, leaf_smoothing_strategy='m_estimate', enable_gain_margin_logic=True, margin_threshold=0.03, classification_criterion='gini', random_state=None)

Methods

  • __init__([task, max_depth, ...])

  • fit(X, y) – Fit with bootstrap variance tracking.

  • get_metadata_routing() – Get metadata routing of this object.

  • get_params([deep]) – Get parameters for sklearn compatibility.

  • predict(X) – Predict targets for samples in X.

  • predict_proba(X) – Predict class probabilities for classification tasks.

  • score(X, y) – Return the mean accuracy (classification) or R² (regression).

  • set_params(**params) – Set parameters for sklearn compatibility.

fit(X, y)

Fit with bootstrap variance tracking.

Parameters:
  • X (ndarray) – Training features.

  • y (ndarray) – Training targets.

Returns:

Fitted estimator.

Return type:

BootstrapVariancePenalizedTree

get_params(deep=True)

Get parameters for sklearn compatibility.

Parameters:

deep (bool) – Whether to return a deep copy of the parameters.

Returns:

Parameter dictionary.

Return type:

dict[str, Any]

set_params(**params)

Set parameters for sklearn compatibility.

Parameters:

**params (Any) – Parameter values to set.

Returns:

Self with updated parameters.

Return type:

BootstrapVariancePenalizedTree
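The `get_params` / `set_params` pair follows the scikit-learn estimator convention: the first returns constructor parameters as a dictionary, the second updates them and returns the estimator itself. The sketch below illustrates that convention with a hypothetical stand-in class, `TinyEstimator`, that carries only two of the parameters documented above. It is not the library's class and says nothing about its internal validation.

```python
class TinyEstimator:
    """Hypothetical stand-in that follows the get_params/set_params convention."""

    def __init__(self, max_depth=5, variance_penalty=1.0):
        self.max_depth = max_depth
        self.variance_penalty = variance_penalty

    def get_params(self, deep=True):
        # `deep` is accepted for signature compatibility; this toy class
        # has no nested estimators, so it has no effect here.
        return {"max_depth": self.max_depth, "variance_penalty": self.variance_penalty}

    def set_params(self, **params):
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f"Invalid parameter {name!r}")
            setattr(self, name, value)
        return self  # returning self allows call chaining


est = TinyEstimator().set_params(max_depth=3)
params = est.get_params()
```

Because `set_params` returns the estimator, tools that clone or tune estimators can rebuild an equivalent object from the dictionary that `get_params` produces.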

classmethod __init_subclass__(**kwargs)

Set the set_{method}_request methods.

This uses PEP-487 [1] to set the set_{method}_request methods. It looks for the information available in the set default values which are set using __metadata_request__* class attributes, or inferred from method signatures.

The __metadata_request__* class attributes are used when a method does not explicitly accept a metadata through its arguments or if the developer would like to specify a request value for those metadata which are different from the default None.
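For readers unfamiliar with the PEP 487 hook mentioned above, the sketch below shows `__init_subclass__` in isolation: the base class runs code each time a subclass is defined and attaches an attribute to it. The classes `Base` and `Tree` and the `defined_methods` attribute are hypothetical; this illustrates the language mechanism only, not how scikit-learn generates its `set_{method}_request` methods.

```python
class Base:
    """Base class that records which public methods each subclass defines."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Runs once per subclass, right after its class body has executed.
        cls.defined_methods = sorted(
            name for name, value in vars(cls).items()
            if callable(value) and not name.startswith("_")
        )


class Tree(Base):
    def fit(self, X, y):
        return self

    def predict(self, X):
        return [0.0 for _ in X]


methods = Tree.defined_methods
```

The hook fires for `Tree` but not for `Base` itself, which is why the attribute exists only on subclasses.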

References

get_metadata_routing()

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:

routing – A MetadataRequest encapsulating routing information.

Return type:

MetadataRequest

predict(X)

Predict targets for samples in X.

Parameters:

X (ndarray[tuple[Any, ...], dtype[floating]]) – Feature matrix of shape (n_samples, n_features).

Returns:

Predicted values of shape (n_samples,).

Return type:

NDArray[Any]

Raises:

ValueError – If the tree has not been fitted.

predict_proba(X)

Predict class probabilities for classification tasks.

Parameters:

X (ndarray[tuple[Any, ...], dtype[floating]]) – Feature matrix of shape (n_samples, n_features).

Returns:

Class probabilities of shape (n_samples, n_classes).

Return type:

NDArray[np.floating]

Raises:

ValueError – If called on a regression task or if the tree has not been fitted.

score(X, y)

Return the mean accuracy (classification) or R² (regression).

Parameters:
  • X (ndarray[tuple[Any, ...], dtype[floating]]) – Feature matrix for evaluation.

  • y (ndarray[tuple[Any, ...], dtype[Any]]) – True target values.

Returns:

Accuracy for classification, R² for regression.

Return type:

float
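The two metrics that `score` is documented to return can be written out directly. The sketch below uses the standard definitions of mean accuracy and the coefficient of determination R²; the function names are hypothetical, and the library may compute these through scikit-learn rather than by hand.

```python
def mean_accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Undefined (division by zero) when y_true is constant.
    """
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


acc = mean_accuracy([0, 1, 1, 0], [0, 1, 0, 0])
r2_perfect = r2_score([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
r2_mean = r2_score([1.0, 2.0, 3.0, 4.0], [2.5, 2.5, 2.5, 2.5])
```

A perfect fit gives R² = 1, always predicting the mean of the targets gives R² = 0, and worse predictions can make it negative.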