RobustPrefixHonestTree

class RobustPrefixHonestTree(task='regression', max_depth=6, min_samples_leaf=2, top_levels=2, consensus_samples=12, consensus_threshold=0.5, consensus_subsample_frac=0.8, val_frac=0.2, est_frac=0.4, enable_stratified_sampling=True, enable_winsorization=True, winsor_quantiles=(0.01, 0.99), enable_threshold_binning=True, max_threshold_bins=24, enable_oblique_splits=True, oblique_strategy='root_only', oblique_regularization='lasso', enable_correlation_gating=True, min_correlation_threshold=0.3, enable_lookahead=True, lookahead_depth=2, beam_width=12, enable_beam_search_for_consensus=True, enable_ambiguity_gating=True, ambiguity_threshold=0.05, enable_gain_margin_logic=True, margin_threshold=0.03, enable_bootstrap_variance_tracking=True, variance_tracking_samples=10, enable_explicit_variance_penalty=False, variance_penalty_weight=0.1, smoothing=1.0, leaf_smoothing_strategy='m_estimate', classification_criterion='gini', random_state=None)

Bases: BaseStableTree

Robust prefix honest tree with unified stability primitives.

Enhanced with cross-method learning:

- Oblique splits (from LessGreedy): Add Lasso-based oblique splits to the locked prefix
- Lookahead with beam search (from LessGreedy): Replace depth-1 stumps with k-step lookahead
- Ambiguity gating (from LessGreedy): Apply consensus only when splits are ambiguous
- Correlation gating (from LessGreedy): Check feature correlation before oblique splits
- Explicit variance tracking (from Bootstrap): Monitor prediction variance as a diagnostic
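As an illustration of the ambiguity-gating idea, here is a minimal sketch. It assumes, without confirmation from this page, that a split counts as "ambiguous" when the relative gap between the best and second-best candidate gains falls below `ambiguity_threshold`. The helper name `is_ambiguous` and the relative-gap rule are hypothetical; the library may measure ambiguity differently.

```python
def is_ambiguous(gains, ambiguity_threshold=0.05):
    """Return True when the top two split gains are too close to call.

    Assumption: ambiguity is the relative gap between the best and
    second-best gain. This is an illustrative rule, not the library's.
    """
    if len(gains) < 2:
        return False  # a single candidate cannot be ambiguous
    ranked = sorted(gains, reverse=True)
    best, runner_up = ranked[0], ranked[1]
    if best <= 0:
        return False  # no useful split, so nothing to arbitrate
    relative_gap = (best - runner_up) / best
    return relative_gap < ambiguity_threshold
```

Under this reading, the more expensive consensus step would run only when `is_ambiguous` returns True, and a clear winner would be accepted directly.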

Core Features:

- Robust consensus-based prefix splits with honest leaf estimation
- Winsorization for outlier robustness
- Stratified honest data partitioning
- Advanced consensus mechanisms with threshold binning
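The consensus-based prefix idea can be sketched in plain Python. This is not the library's implementation: it assumes subsampling without replacement, a squared-error stump search, and a simple majority vote over which feature wins. The helper names `best_stump_feature` and `consensus_feature` are hypothetical; only the parameter names are taken from the class signature.

```python
import random
from collections import Counter


def _sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_stump_feature(rows, targets):
    """Return the feature index whose best threshold split has the lowest SSE."""
    best_feature, best_sse = None, float("inf")
    for j in range(len(rows[0])):
        for threshold in sorted({row[j] for row in rows}):
            left = [t for row, t in zip(rows, targets) if row[j] <= threshold]
            right = [t for row, t in zip(rows, targets) if row[j] > threshold]
            if not left or not right:
                continue  # skip splits that leave one side empty
            sse = _sse(left) + _sse(right)
            if sse < best_sse:
                best_feature, best_sse = j, sse
    return best_feature


def consensus_feature(rows, targets, consensus_samples=12,
                      consensus_subsample_frac=0.8,
                      consensus_threshold=0.5, seed=0):
    """Vote over subsamples; return the winning feature or None."""
    rng = random.Random(seed)
    n = len(rows)
    size = max(2, int(consensus_subsample_frac * n))
    votes = Counter()
    for _ in range(consensus_samples):
        idx = rng.sample(range(n), size)  # assumption: no replacement
        feature = best_stump_feature([rows[i] for i in idx],
                                     [targets[i] for i in idx])
        if feature is not None:
            votes[feature] += 1
    if not votes:
        return None
    feature, count = votes.most_common(1)[0]
    # Lock the split only if enough subsamples agree on it.
    return feature if count / consensus_samples >= consensus_threshold else None
```

A split that wins on most subsamples is less likely to change when the training data is perturbed, which is the stability property the locked prefix is meant to provide.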

Parameters:

task (Literal['regression', 'classification']) – Prediction task type.
max_depth (int) – Maximum tree depth.
min_samples_leaf (int) – Minimum samples per leaf.
top_levels (int) – Number of prefix levels to lock using robust consensus.
consensus_samples (int) – Number of bootstrap samples for consensus.
consensus_threshold (float) – Threshold for consensus decisions.
consensus_subsample_frac (float) – Subsample fraction per bootstrap.
val_frac (float) – Fraction of data for validation.
est_frac (float) – Fraction of data for estimation.
enable_stratified_sampling (bool) – Enable stratified sampling in data partitioning.
enable_winsorization (bool) – Enable feature winsorization.
winsor_quantiles (tuple[float, float]) – Quantile bounds for winsorization.
enable_threshold_binning (bool) – Enable threshold binning to reduce micro-jitter.
max_threshold_bins (int) – Maximum number of threshold bins.
enable_oblique_splits (bool) – Enable oblique split capability.
oblique_strategy (Literal['root_only', 'all_levels', 'adaptive']) – Strategy for oblique splits.
oblique_regularization (Literal['lasso', 'ridge', 'elastic_net']) – Regularization type for oblique splits.
enable_correlation_gating (bool) – Enable correlation-based feature gating.
min_correlation_threshold (float) – Minimum correlation for feature selection.
enable_lookahead (bool) – Enable lookahead search.
lookahead_depth (int) – Depth for lookahead search.
beam_width (int) – Width of beam search.
enable_beam_search_for_consensus (bool) – Enable beam search for consensus.
enable_ambiguity_gating (bool) – Enable ambiguity-based gating.
ambiguity_threshold (float) – Threshold for ambiguity detection.
enable_gain_margin_logic (bool) – Enable gain margin logic.
margin_threshold (float) – Threshold for margin-based decisions.
enable_bootstrap_variance_tracking (bool) – Enable bootstrap variance tracking.
variance_tracking_samples (int) – Number of samples for variance tracking.
enable_explicit_variance_penalty (bool) – Enable explicit variance penalty.
variance_penalty_weight (float) – Weight for variance penalty.
smoothing (float) – Smoothing parameter for leaf estimates.
leaf_smoothing_strategy (Literal['m_estimate', 'shrink_to_parent']) – Strategy for leaf smoothing.
classification_criterion (Literal['gini', 'entropy']) – Criterion for classification splits.
random_state (int | None) – Random state for reproducibility.
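To make `winsor_quantiles` concrete, the following sketch clips one feature column to its empirical quantile range. It is an assumption-laden illustration: the function name `winsorize` and the linear-interpolation quantile rule are choices made here, and the library may compute quantiles differently.

```python
def winsorize(values, winsor_quantiles=(0.01, 0.99)):
    """Clip one feature column to its lower and upper empirical quantiles."""
    lo_q, hi_q = winsor_quantiles
    ordered = sorted(values)

    def quantile(q):
        # Linear interpolation between neighbouring order statistics.
        pos = q * (len(ordered) - 1)
        lower = int(pos)
        upper = min(lower + 1, len(ordered) - 1)
        frac = pos - lower
        return ordered[lower] + frac * (ordered[upper] - ordered[lower])

    lo, hi = quantile(lo_q), quantile(hi_q)
    # Values outside [lo, hi] are pulled to the nearest bound, not dropped.
    return [min(max(v, lo), hi) for v in values]
```

For example, `winsorize([1, 2, 3, 4, 100], (0.25, 0.75))` pulls the outlier `100` down to the upper bound while leaving the middle values untouched.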

__init__(task='regression', max_depth=6, min_samples_leaf=2, top_levels=2, consensus_samples=12, consensus_threshold=0.5, consensus_subsample_frac=0.8, val_frac=0.2, est_frac=0.4, enable_stratified_sampling=True, enable_winsorization=True, winsor_quantiles=(0.01, 0.99), enable_threshold_binning=True, max_threshold_bins=24, enable_oblique_splits=True, oblique_strategy='root_only', oblique_regularization='lasso', enable_correlation_gating=True, min_correlation_threshold=0.3, enable_lookahead=True, lookahead_depth=2, beam_width=12, enable_beam_search_for_consensus=True, enable_ambiguity_gating=True, ambiguity_threshold=0.05, enable_gain_margin_logic=True, margin_threshold=0.03, enable_bootstrap_variance_tracking=True, variance_tracking_samples=10, enable_explicit_variance_penalty=False, variance_penalty_weight=0.1, smoothing=1.0, leaf_smoothing_strategy='m_estimate', classification_criterion='gini', random_state=None)

Methods

`__init__`([task, max_depth, ...])
`fit`(X, y)	Fit with robust prefix consensus.
`get_metadata_routing`()	Get metadata routing of this object.
`get_params`([deep])	Get parameters for sklearn compatibility.
`predict`(X)	Predict targets for samples in X.
`predict_proba`(X)	Predict class probabilities for classification tasks.
`score`(X, y)	Return the mean accuracy (classification) or R² (regression).
`set_params`(**params)	Set parameters for sklearn compatibility.

fit(X, y)

Fit with robust prefix consensus.

Parameters:

X (ndarray) – Training features.
y (ndarray) – Training targets.

Returns:

Fitted estimator.

Return type:

RobustPrefixHonestTree

Raises:

ValueError – If multi-class classification is attempted.
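The `val_frac` and `est_frac` parameters suggest that fitting splits the training data into disjoint subsets: one for choosing the tree structure, one for estimating leaf values, and one for validation. The sketch below shows such an "honest" three-way partition under that assumption. The function name `honest_partition` is hypothetical, and stratification (see `enable_stratified_sampling`) is deliberately left out.

```python
import random


def honest_partition(n_samples, val_frac=0.2, est_frac=0.4, seed=0):
    """Split sample indices into structure, estimation, and validation sets."""
    if val_frac + est_frac >= 1.0:
        raise ValueError("val_frac + est_frac must leave data for the structure set")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_val = int(val_frac * n_samples)
    n_est = int(est_frac * n_samples)
    validation = indices[:n_val]
    estimation = indices[n_val:n_val + n_est]
    structure = indices[n_val + n_est:]  # whatever remains picks the splits
    return structure, estimation, validation
```

Because the samples that set the leaf values never influence where the splits fall, the leaf estimates are not biased by the split search.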

get_params(deep=True)

Get parameters for sklearn compatibility.

Parameters:

deep (bool) – Whether to return a deep copy of the parameters.

Returns:

Parameter dictionary.

Return type:

dict[str, Any]

set_params(**params)

Set parameters for sklearn compatibility.

Parameters:

**params (Any) – Parameter values to set.

Returns:

Self with updated parameters.

Return type:

RobustPrefixHonestTree

classmethod __init_subclass__(**kwargs)

Set the set_{method}_request methods.

This uses PEP-487 [1] to set the set_{method}_request methods. It looks for the default request values, which are either set using __metadata_request__* class attributes or inferred from method signatures.

The __metadata_request__* class attributes are used when a method does not explicitly accept a piece of metadata through its arguments, or when the developer wants to specify a request value for that metadata that differs from the default None.

References

[1] PEP 487, "Simpler customisation of class creation".

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide for how the routing mechanism works.

Returns:

routing – A MetadataRequest encapsulating routing information.

Return type:

MetadataRequest

predict(X)

Predict targets for samples in X.

Parameters:

X (ndarray[tuple[Any, ...], dtype[floating]]) – Feature matrix of shape (n_samples, n_features).

Returns:

Predicted values of shape (n_samples,).

Return type:

NDArray[Any]

Raises:

ValueError – If the tree has not been fitted.

predict_proba(X)

Predict class probabilities for classification tasks.

Parameters:

X (ndarray[tuple[Any, ...], dtype[floating]]) – Feature matrix of shape (n_samples, n_features).

Returns:

Class probabilities of shape (n_samples, n_classes).

Return type:

NDArray[np.floating]

Raises:

ValueError – If called on a regression task or if the tree has not been fitted.
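For intuition about how smoothed leaf probabilities can arise, here is the standard m-estimate formula as a small sketch. It assumes that the `smoothing` parameter plays the role of m when `leaf_smoothing_strategy='m_estimate'`, which this page does not confirm. The function name `m_estimate` and the choice of prior are illustrative.

```python
def m_estimate(n_positive, n_total, prior, smoothing=1.0):
    """Smoothed positive-class probability for a single leaf.

    Blends the observed leaf frequency with a prior (for example the
    class frequency in the whole training set), weighted by `smoothing`.
    """
    return (n_positive + smoothing * prior) / (n_total + smoothing)


def leaf_proba_row(n_positive, n_total, prior, smoothing=1.0):
    """Return [P(class 0), P(class 1)] for a binary leaf."""
    p = m_estimate(n_positive, n_total, prior, smoothing)
    return [1.0 - p, p]
```

With smoothing, a pure leaf holding three positive samples yields a probability below 1, and an empty leaf falls back to the prior instead of dividing by zero.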

score(X, y)

Return the mean accuracy (classification) or R² (regression).

Parameters:

X (ndarray[tuple[Any, ...], dtype[floating]]) – Feature matrix for evaluation.
y (ndarray[tuple[Any, ...], dtype[Any]]) – True target values.

Returns:

Accuracy for classification, R² for regression.

Return type:

float
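The two metrics named above can be written out in a few lines. This sketch uses the textbook definitions and is not taken from the library's source. The function names `accuracy` and `r2` are illustrative, and the handling of constant targets is a convention chosen here.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        # Constant targets: the ratio is undefined, so pick a convention.
        return 1.0 if ss_res == 0 else 0.0
    return 1.0 - ss_res / ss_tot
```

An R² of 1.0 means perfect predictions, 0.0 matches always predicting the mean of `y_true`, and negative values indicate a fit worse than that baseline.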