Classification API
Two-stage classification estimator.
- class stagecoachml.classification.StagecoachClassifier(stage1_estimator, stage2_estimator, early_features=None, late_features=None, use_stage1_pred_as_feature=True, inner_cv=None, random_state=None)
Bases: StagecoachBase, ClassifierMixin
Two-stage classifier for staggered feature arrival.
This estimator handles scenarios where features arrive in batches at different times. It trains a stage1 model on early features and a stage2 model that can use late features plus (optionally) the stage1 prediction.
- Parameters:
stage1_estimator (estimator) – Sklearn classifier for early features. Must support predict_proba or decision_function for probability estimation.
stage2_estimator (estimator) – Sklearn classifier for late features (and optionally stage1 prediction). Must support predict_proba.
early_features (list of str, optional) – Column names for early features. If None, uses first half of columns.
late_features (list of str, optional) – Column names for late features. If None, uses second half of columns.
use_stage1_pred_as_feature (bool, default=True) – If True, stage1 prediction is included as input to stage2.
inner_cv (int, optional) – Number of folds for cross-fitting stage1 predictions during training. Helps avoid overfitting when using stage1 predictions as stage2 features.
random_state (int, optional) – Random state for reproducibility.
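The default column split described for early_features and late_features can be pictured with a small sketch. The helper name split_columns is hypothetical, and the handling of an odd column count (the extra column goes to the late half) is an assumption borrowed from the `len(features) // 2` split used in the usage example on this page, not a confirmed detail of the library.

```python
def split_columns(columns):
    """Hypothetical helper: split a column list into (early, late) halves."""
    # Integer division: with an odd count, the extra column lands in the late half.
    mid = len(columns) // 2
    return list(columns[:mid]), list(columns[mid:])


early, late = split_columns(["age", "bmi", "lab_a", "lab_b", "lab_c"])
print(early)  # ['age', 'bmi']
print(late)   # ['lab_a', 'lab_b', 'lab_c']
```

Passing explicit column lists avoids depending on column order, which is usually the safer choice when the arrival time of each feature is known.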
- Variables:
stage1_estimator (estimator) – Fitted stage1 estimator
stage2_estimator (estimator) – Fitted stage2 estimator
classes (ndarray of shape (n_classes,)) – Class labels
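The cross-fitting idea behind inner_cv can be illustrated with plain scikit-learn. This is a sketch of the general technique under stated assumptions (synthetic data, a four-column early/late split, five folds), not the internal implementation of StagecoachClassifier: each training row gets a stage-1 probability from a model that did not see that row, so stage 2 is not trained on overly optimistic stage-1 outputs.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Synthetic data; the first four columns stand in for early features.
X, y = make_classification(n_samples=200, n_features=8, n_informative=4, random_state=0)
X_early, X_late = X[:, :4], X[:, 4:]

stage1 = LogisticRegression(max_iter=1000)

# Out-of-fold probabilities: every row is scored by a model fitted on the other folds.
oof_proba = cross_val_predict(stage1, X_early, y, cv=5, method="predict_proba")[:, 1]

# Stage 2 trains on late features plus the cross-fitted stage-1 probability.
X_stage2 = np.column_stack([X_late, oof_proba])
stage2 = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_stage2, y)

# For prediction, stage 1 is refitted on all training rows.
stage1.fit(X_early, y)
new_proba = stage1.predict_proba(X_early[:5])[:, 1]
final = stage2.predict(np.column_stack([X_late[:5], new_proba]))
```

Without cross-fitting, stage 2 would see stage-1 predictions made on rows that stage 1 was trained on, which tends to overstate how reliable that feature is at prediction time.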
- __init__(stage1_estimator, stage2_estimator, early_features=None, late_features=None, use_stage1_pred_as_feature=True, inner_cv=None, random_state=None)
- fit(X, y, sample_weight=None)
Fit the two-stage classifier.
- Parameters:
X (array-like or DataFrame of shape (n_samples, n_features)) – Training data
y (array-like of shape (n_samples,)) – Target values
sample_weight (array-like of shape (n_samples,), optional) – Sample weights
- Returns:
self – Fitted estimator
- Return type:
StagecoachClassifier
- predict_stage1(X)
Predict classes using only early features (stage1).
- Parameters:
X (array-like or DataFrame of shape (n_samples, n_features)) – Input data
- Returns:
y_pred – Stage1 class predictions
- Return type:
array of shape (n_samples,)
- predict_stage1_proba(X)
Predict class probabilities using only early features (stage1).
- Parameters:
X (array-like or DataFrame of shape (n_samples, n_features)) – Input data
- Returns:
y_proba – Stage1 probability predictions. For binary classification, returns probabilities for the positive class.
- Return type:
array of shape (n_samples,) or (n_samples, n_classes)
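Because the documented return shape is either (n_samples,) or (n_samples, n_classes), downstream code for binary problems may want to normalise both cases to a 1-D array. The helper below is a hypothetical sketch, not part of the library; it assumes the scikit-learn convention that column 1 of a two-column probability array belongs to the positive class.

```python
import numpy as np


def positive_class_proba(proba):
    """Hypothetical helper: return a 1-D array of positive-class probabilities."""
    proba = np.asarray(proba)
    if proba.ndim == 1:
        # Already one probability per sample.
        return proba
    if proba.ndim == 2 and proba.shape[1] == 2:
        # scikit-learn style (n_samples, 2): column 1 is the positive class.
        return proba[:, 1]
    raise ValueError(f"expected binary probabilities, got shape {proba.shape}")


one_d = np.array([0.2, 0.9])
two_d = np.array([[0.8, 0.2], [0.1, 0.9]])
print(positive_class_proba(one_d))  # [0.2 0.9]
print(positive_class_proba(two_d))  # [0.2 0.9]
```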
- predict(X)
Predict classes using both stages (full prediction).
- Parameters:
X (array-like or DataFrame of shape (n_samples, n_features)) – Input data
- Returns:
y_pred – Final class predictions
- Return type:
array of shape (n_samples,)
- predict_proba(X)
Predict class probabilities using both stages (full prediction).
- Parameters:
X (array-like or DataFrame of shape (n_samples, n_features)) – Input data
- Returns:
y_proba – Final probability predictions
- Return type:
array of shape (n_samples, n_classes)
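The relationship between an (n_samples, n_classes) probability array and class labels can be sketched with NumPy. This assumes the usual scikit-learn convention that the predicted label is the class with the highest probability; the source does not state how this class derives labels, and the class names and probability values below are made up for illustration.

```python
import numpy as np

classes = np.array(["benign", "malignant"])  # illustrative labels, one per column
proba = np.array([[0.7, 0.3], [0.2, 0.8], [0.5, 0.5]])

# Each row of a probability array sums to 1.
assert np.allclose(proba.sum(axis=1), 1.0)

# Pick the most probable class per row; ties resolve to the first class.
labels = classes[np.argmax(proba, axis=1)]
print(labels)  # ['benign' 'malignant' 'benign']
```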
- set_fit_request(*, sample_weight='$UNCHANGED$')
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- set_score_request(*, sample_weight='$UNCHANGED$')
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Usage Examples
Basic Usage
from stagecoachml import StagecoachClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# Load data
data = load_breast_cancer(as_frame=True)
X = data.data
y = data.target

# Split features
features = list(X.columns)
mid = len(features) // 2
early_features = features[:mid]
late_features = features[mid:]

# Create model
model = StagecoachClassifier(
    stage1_estimator=LogisticRegression(max_iter=1000),
    stage2_estimator=RandomForestClassifier(),
    early_features=early_features,
    late_features=late_features,
    use_stage1_pred_as_feature=True,
)

# Train and predict
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y)
model.fit(X_train, y_train)

# Get stage-1 probabilities (early features only)
stage1_proba = model.predict_stage1_proba(X_test)

# Get final predictions (all features)
final_pred = model.predict(X_test)
final_proba = model.predict_proba(X_test)