Classification API

Two-stage classification estimator.

class stagecoachml.classification.StagecoachClassifier(stage1_estimator, stage2_estimator, early_features=None, late_features=None, use_stage1_pred_as_feature=True, inner_cv=None, random_state=None)

Bases: StagecoachBase, ClassifierMixin

Two-stage classifier for staggered feature arrival.

This estimator handles scenarios where features arrive in batches at different times. It trains a stage1 model on early features and a stage2 model that can use late features plus (optionally) the stage1 prediction.
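The two-stage idea described above can be sketched with plain scikit-learn. This is only an illustration of the concept, not the StagecoachClassifier implementation: the synthetic data, the half-and-half column split, and the choice to pass the positive-class probability as the extra stage2 column are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Synthetic binary problem; the first half of the columns plays the
# role of "early" features, the second half the "late" features.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=6, random_state=0
)
mid = X.shape[1] // 2
X_early, X_late = X[:, :mid], X[:, mid:]

# Stage 1: fit on early features only.
stage1 = LogisticRegression(max_iter=1000).fit(X_early, y)
p1 = stage1.predict_proba(X_early)[:, 1]  # positive-class probability

# Stage 2: late features plus the stage-1 probability as one extra column.
X_stage2 = np.column_stack([X_late, p1])
stage2 = RandomForestClassifier(n_estimators=50, random_state=0)
stage2.fit(X_stage2, y)

final_proba = stage2.predict_proba(X_stage2)
```

Note that stage 1 here predicts on the same rows it was trained on, so the extra column is optimistic; the inner_cv parameter exists to address that.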

Parameters:
  • stage1_estimator (estimator) – Sklearn classifier for early features. Must support predict_proba or decision_function for probability estimation.

  • stage2_estimator (estimator) – Sklearn classifier for late features (and optionally stage1 prediction). Must support predict_proba.

  • early_features (list of str, optional) – Column names for early features. If None, uses first half of columns.

  • late_features (list of str, optional) – Column names for late features. If None, uses second half of columns.

  • use_stage1_pred_as_feature (bool, default=True) – If True, stage1 prediction is included as input to stage2.

  • inner_cv (int, optional) – Number of folds for cross-fitting stage1 predictions during training. Helps avoid overfitting when using stage1 predictions as stage2 features.

  • random_state (int, optional) – Random state for reproducibility.
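To make the inner_cv description concrete, here is one way cross-fitted (out-of-fold) stage1 predictions can be produced with scikit-learn's cross_val_predict. Whether StagecoachClassifier uses this exact routine, stratified folds, or shuffling is not stated in this reference, so treat the details as assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict

X, y = make_classification(
    n_samples=300, n_features=10, n_informative=6, random_state=0
)
X_early = X[:, :5]  # assumed early-feature block

# Out-of-fold probabilities: every row is scored by a model that
# never saw that row during fitting (5 folds, like inner_cv=5).
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
oof_proba = cross_val_predict(
    LogisticRegression(max_iter=1000),
    X_early,
    y,
    cv=cv,
    method="predict_proba",
)[:, 1]

# In-sample probabilities for comparison: fit and score on the same rows.
in_sample = (
    LogisticRegression(max_iter=1000).fit(X_early, y).predict_proba(X_early)[:, 1]
)
```

Feeding oof_proba rather than in_sample to stage2 during training keeps stage2 from learning to trust a stage1 signal that is better on the training rows than it will be on new data.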

Variables:
  • stage1_estimator (estimator) – Fitted stage1 estimator

  • stage2_estimator (estimator) – Fitted stage2 estimator

  • classes (ndarray of shape (n_classes,)) – Class labels

__init__(stage1_estimator, stage2_estimator, early_features=None, late_features=None, use_stage1_pred_as_feature=True, inner_cv=None, random_state=None)

fit(X, y, sample_weight=None)

Fit the two-stage classifier.

Parameters:
  • X (array-like or DataFrame of shape (n_samples, n_features)) – Training data

  • y (array-like of shape (n_samples,)) – Target values

  • sample_weight (array-like of shape (n_samples,), optional) – Sample weights

Returns:

self – Fitted estimator

Return type:

object

predict_stage1(X)

Predict classes using only early features (stage1).

Parameters:

X (array-like or DataFrame of shape (n_samples, n_features)) – Input data

Returns:

y_pred – Stage1 class predictions

Return type:

array of shape (n_samples,)

predict_stage1_proba(X)

Predict class probabilities using only early features (stage1).

Parameters:

X (array-like or DataFrame of shape (n_samples, n_features)) – Input data

Returns:

y_proba – Stage1 probability predictions. For binary classification, this is a 1-D array holding the probability of the positive class; otherwise it has one column per class.

Return type:

array of shape (n_samples,) or (n_samples, n_classes)
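The two return shapes can be illustrated with a small NumPy helper. The function name squeeze_binary_proba is hypothetical and not part of stagecoachml; it only mirrors the documented convention (1-D positive-class probabilities for binary problems, a full matrix otherwise) starting from a standard predict_proba output.

```python
import numpy as np


def squeeze_binary_proba(proba):
    """Return the positive-class column for binary problems.

    Hypothetical helper: a two-column probability matrix is reduced
    to shape (n_samples,); anything else is returned unchanged.
    """
    proba = np.asarray(proba)
    if proba.ndim == 2 and proba.shape[1] == 2:
        return proba[:, 1]
    return proba


binary = squeeze_binary_proba([[0.2, 0.8], [0.7, 0.3]])
multi = squeeze_binary_proba([[0.1, 0.6, 0.3], [0.5, 0.25, 0.25]])
```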

predict(X)

Predict classes using both stages (full prediction).

Parameters:

X (array-like or DataFrame of shape (n_samples, n_features)) – Input data

Returns:

y_pred – Final class predictions

Return type:

array of shape (n_samples,)

predict_proba(X)

Predict class probabilities using both stages (full prediction).

Parameters:

X (array-like or DataFrame of shape (n_samples, n_features)) – Input data

Returns:

y_proba – Final probability predictions

Return type:

array of shape (n_samples, n_classes)
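In scikit-learn classifiers, class predictions usually correspond to the highest-probability column of predict_proba, with columns ordered like the class labels. The sketch below shows that mapping; it assumes StagecoachClassifier follows the same convention, which this reference does not state explicitly. The helper name labels_from_proba is hypothetical.

```python
import numpy as np


def labels_from_proba(proba, classes):
    """Map each row of a probability matrix to its most likely label.

    Hypothetical helper: column j of `proba` is assumed to belong
    to classes[j].
    """
    proba = np.asarray(proba)
    classes = np.asarray(classes)
    return classes[np.argmax(proba, axis=1)]


pred = labels_from_proba([[0.7, 0.3], [0.2, 0.8]], ["neg", "pos"])
```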

set_fit_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

Usage Examples

Basic Usage

from stagecoachml import StagecoachClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# Load data
data = load_breast_cancer(as_frame=True)
X = data.data
y = data.target

# Split features
features = list(X.columns)
mid = len(features) // 2
early_features = features[:mid]
late_features = features[mid:]

# Create model
model = StagecoachClassifier(
    stage1_estimator=LogisticRegression(max_iter=1000),
    stage2_estimator=RandomForestClassifier(),
    early_features=early_features,
    late_features=late_features,
    use_stage1_pred_as_feature=True,
)

# Split into train and test sets (fixed seed for reproducibility), then fit
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)
model.fit(X_train, y_train)

# Get stage-1 probabilities (early features only)
stage1_proba = model.predict_stage1_proba(X_test)

# Get final predictions (all features)
final_pred = model.predict(X_test)
final_proba = model.predict_proba(X_test)
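A natural follow-up is to check how much the late features add over stage 1 alone. The helper below is a sketch, not part of stagecoachml: compare_stage_auc is a hypothetical name, and it assumes a binary problem where the stage-1 probabilities are 1-D (as documented for predict_stage1_proba) and the final probabilities have one column per class.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def compare_stage_auc(y_true, stage1_proba, final_proba):
    """Return ROC AUC for the stage-1 and the final predictions.

    stage1_proba: shape (n_samples,), positive-class probability.
    final_proba:  shape (n_samples, 2), column 1 is the positive class.
    """
    stage1_proba = np.asarray(stage1_proba)
    final_proba = np.asarray(final_proba)
    return {
        "stage1_auc": roc_auc_score(y_true, stage1_proba),
        "final_auc": roc_auc_score(y_true, final_proba[:, 1]),
    }


# Toy values standing in for y_test, stage1_proba and final_proba.
scores = compare_stage_auc(
    [0, 0, 1, 1],
    [0.1, 0.6, 0.4, 0.9],
    [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]],
)
```

With the variables from the example above, the call would be compare_stage_auc(y_test, stage1_proba, final_proba).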