Regression API
Two-stage regression estimator.
- class stagecoachml.regression.StagecoachRegressor(stage1_estimator, stage2_estimator, early_features=None, late_features=None, residual=True, use_stage1_pred_as_feature=True, inner_cv=None, random_state=None)
Bases: StagecoachBase, RegressorMixin
Two-stage regressor for staggered feature arrival.
This estimator handles scenarios where features arrive in batches at different times. It trains a stage1 model on early features and a stage2 model that can use late features plus (optionally) the stage1 prediction.
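The two-stage idea can be pictured with a small, self-contained sketch. It is not the library's implementation: a hand-written one-feature least-squares fit stands in for the scikit-learn estimators, every name in it (`fit_line`, `predict_line`, the toy data) is hypothetical, and it assumes that in the residual formulation the final prediction is the stage1 prediction plus the stage2 prediction of the residual.

```python
# Illustrative stand-in only: a one-feature least-squares line replaces the
# scikit-learn estimators, and all names here are hypothetical.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_line(model, xs):
    """Evaluate a fitted (slope, intercept) pair on xs."""
    slope, intercept = model
    return [slope * x + intercept for x in xs]


early = [-1.0, 1.0, -1.0, 1.0]  # feature available at stage 1
late = [-1.0, -1.0, 1.0, 1.0]   # feature that arrives later
y = [0.0, 4.0, 6.0, 10.0]       # equals 2*early + 3*late + 5

# Stage 1: fit on the early feature only.
stage1 = fit_line(early, y)
stage1_pred = predict_line(stage1, early)

# Stage 2 (residual formulation): fit on what stage 1 left unexplained.
residuals = [t - p for t, p in zip(y, stage1_pred)]
stage2 = fit_line(late, residuals)

# Assumed combination rule: stage-1 prediction plus the predicted residual.
stage2_pred = predict_line(stage2, late)
final_pred = [p + r for p, r in zip(stage1_pred, stage2_pred)]
```

In this toy data the early-only prediction is already usable on its own, and the late feature only has to explain the remaining error. The sketch does not feed the stage1 prediction into stage 2 as an extra input; that option is a separate setting on the real estimator.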
- Parameters:
stage1_estimator (estimator) – Sklearn regressor for early features
stage2_estimator (estimator) – Sklearn regressor for late features (and optionally stage1 prediction)
early_features (list of str, optional) – Column names for early features. If None, uses first half of columns.
late_features (list of str, optional) – Column names for late features. If None, uses second half of columns.
residual (bool, default=True) – If True, stage2 learns to predict y - stage1_pred (residual). If False, stage2 learns to predict y directly.
use_stage1_pred_as_feature (bool, default=True) – If True, stage1 prediction is included as input to stage2.
inner_cv (int, optional) – Number of folds for cross-fitting stage1 predictions during training. Helps avoid overfitting when using stage1 predictions as stage2 features.
random_state (int, optional) – Random state for reproducibility.
- Variables:
stage1_estimator (estimator) – Fitted stage1 estimator
stage2_estimator (estimator) – Fitted stage2 estimator
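The cross-fitting that the inner_cv parameter refers to can be sketched as follows. This is a generic illustration of out-of-fold prediction, not the library's code: the helper names are hypothetical, a one-feature least-squares line stands in for the stage1 estimator, and folds are assigned by interleaving indices, whereas the actual splitting scheme used by StagecoachRegressor is not stated in this reference.

```python
# Generic cross-fitting sketch with hypothetical helper names.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cross_fit_predictions(xs, ys, n_folds):
    """Predict each sample with a model fitted on the other folds only."""
    n = len(xs)
    oof = [None] * n
    for fold in range(n_folds):
        # Assumed fold assignment: sample i belongs to fold i % n_folds.
        held_out = [i for i in range(n) if i % n_folds == fold]
        train = [i for i in range(n) if i % n_folds != fold]
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        for i in held_out:
            oof[i] = slope * xs[i] + intercept
    return oof


early = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 3.5, 4.5, 7.5, 8.5, 11.0]

# Out-of-fold stage-1 predictions: no sample is predicted by a model that
# saw its own target during fitting.
oof_pred = cross_fit_predictions(early, y, n_folds=3)
```

Training stage 2 on out-of-fold values like `oof_pred`, rather than on in-sample stage1 predictions, keeps stage 2 from learning on inputs that are more accurate than the ones it will see at prediction time.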
- __init__(stage1_estimator, stage2_estimator, early_features=None, late_features=None, residual=True, use_stage1_pred_as_feature=True, inner_cv=None, random_state=None)
- fit(X, y, sample_weight=None)
Fit the two-stage regressor.
- Parameters:
X (array-like or DataFrame of shape (n_samples, n_features)) – Training data
y (array-like of shape (n_samples,)) – Target values
sample_weight (array-like of shape (n_samples,), optional) – Sample weights
- Returns:
self – Fitted estimator
- Return type:
- predict_stage1(X)
Predict using only early features (stage1).
- Parameters:
X (array-like or DataFrame of shape (n_samples, n_features)) – Input data
- Returns:
y_pred – Stage1 predictions
- Return type:
array of shape (n_samples,)
- predict(X)
Predict using both stages (full prediction).
- Parameters:
X (array-like or DataFrame of shape (n_samples, n_features)) – Input data
- Returns:
y_pred – Final predictions
- Return type:
array of shape (n_samples,)
- set_fit_request(*, sample_weight='$UNCHANGED$')
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- set_score_request(*, sample_weight='$UNCHANGED$')
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Usage Examples
Basic Usage
from stagecoachml import StagecoachRegressor
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
# Load data
diabetes = load_diabetes(as_frame=True)
X = diabetes.frame.drop(columns=["target"])
y = diabetes.frame["target"]
# Split features
features = list(X.columns)
mid = len(features) // 2
early_features = features[:mid]
late_features = features[mid:]
# Create model
model = StagecoachRegressor(
    stage1_estimator=LinearRegression(),
    stage2_estimator=RandomForestRegressor(),
    early_features=early_features,
    late_features=late_features,
    residual=True,
    use_stage1_pred_as_feature=True,
)
# Train and predict
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model.fit(X_train, y_train)
# Get stage-1 predictions (early features only)
stage1_pred = model.predict_stage1(X_test)
# Get final predictions (all features)
final_pred = model.predict(X_test)