StagecoachML¶
StagecoachML is a tiny library for building two-stage models when your features arrive in two batches at different times.
Think:
Ad serving and recommendation: first score on user + context, then refine on creative/item + real-time signals.
Per-customer privacy: a shared non-sensitive trunk, plus a per-customer head that uses private fields inside their own environment.
Latency-sensitive inference: run a fast stage-1 model early in the request, and only run the heavier stage-2 model when needed.
StagecoachML encodes that pattern directly in the model interface instead of leaving it buried in infra and notebooks.
When should you use StagecoachML?¶
Use StagecoachML when:
You can’t wait for all features to arrive before you have to start making decisions.
Some features live in a different silo (e.g. customer’s infra) and must never hit the central model.
You want to tune and evaluate the whole two-stage system as a single estimator (train/test/CV), while still being able to:
get stage-1 scores from early features, and
get refined scores once late features arrive.
If you have all your features at once and a single model is fine, this library is probably overkill. But if you live with staggered features, StagecoachML keeps the logic honest.
Core idea¶
A StagecoachML model splits features into two groups:
Early features: available at stage 1 (e.g. user, context).
Late features: only available at stage 2 (e.g. ad/creative/item, customer-side data).
You choose:
a stage-1 estimator that sees only early features, and
a stage-2 estimator that sees late features plus (optionally) the stage-1 prediction, and either:
learns to predict the residual y − ŷ₁, or
learns the final target directly.
At inference time you can:
call predict_stage1(...) / predict_stage1_proba(...) when you only have early features; and
call predict(...) / predict_proba(...) later when you have both.
Under the hood, you still train and cross-validate it like any other sklearn estimator.
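To make the mechanics concrete, here is a hand-rolled sketch of what residual=True plus use_stage1_pred_as_feature=True mean, on synthetic data. This illustrates the idea only; it is not the library’s actual implementation:
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X_early = rng.normal(size=(200, 3))  # features available at stage 1
X_late = rng.normal(size=(200, 3))   # features that only arrive at stage 2
y = X_early @ [1.0, 2.0, 3.0] + X_late @ [3.0, 2.0, 1.0] + rng.normal(size=200)

# Stage 1: fit on early features only and score.
stage1 = LinearRegression().fit(X_early, y)
yhat1 = stage1.predict(X_early)

# Stage 2: fit on late features + ŷ₁, targeting the residual y − ŷ₁.
Z = np.column_stack([X_late, yhat1])
stage2 = LinearRegression().fit(Z, y - yhat1)

# Final prediction adds the learned correction back onto the stage-1 score.
final = yhat1 + stage2.predict(Z)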
Installation¶
StagecoachML is a pure Python package that depends on NumPy, pandas, and scikit-learn.
pip install stagecoachml
Or install from source:
git clone https://github.com/finite-sample/stagecoachml.git
cd stagecoachml
pip install -e .
Import the estimators:
from stagecoachml import StagecoachRegressor, StagecoachClassifier
Quick start¶
Regression example (diabetes dataset)¶
from stagecoachml import StagecoachRegressor
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import r2_score
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
# Load data as a DataFrame
diabetes = load_diabetes(as_frame=True)
X = diabetes.frame.drop(columns=["target"])
y = diabetes.frame["target"]
# Split columns into "early" and "late" features
features = list(X.columns)
mid = len(features) // 2
early_features = features[:mid] # pretend these arrive early
late_features = features[mid:] # pretend these arrive later
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
# Stage-1: fast global model on early features
stage1 = LinearRegression()
# Stage-2: more flexible model on late features + stage-1 prediction
stage2 = RandomForestRegressor(n_estimators=200, random_state=0)
model = StagecoachRegressor(
    stage1_estimator=stage1,
    stage2_estimator=stage2,
    early_features=early_features,
    late_features=late_features,
    residual=True,
    use_stage1_pred_as_feature=True,
    inner_cv=None,  # set to an int > 1 to cross-fit stage-1 predictions
)
# Hyper-parameter search over both stages
param_grid = {
    "stage1_estimator__fit_intercept": [True, False],
    "stage2_estimator__max_depth": [None, 5, 10],
}
grid = GridSearchCV(model, param_grid, cv=5)
grid.fit(X_train, y_train)
best = grid.best_estimator_
print("Stage-1 test R²: ", r2_score(y_test, best.predict_stage1(X_test)))
print("Final test R²: ", r2_score(y_test, best.predict(X_test)))
Classification example (breast cancer dataset)¶
from stagecoachml import StagecoachClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
data = load_breast_cancer(as_frame=True)
X = data.data
y = data.target
features = list(X.columns)
mid = len(features) // 2
early = features[:mid]
late = features[mid:]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1, stratify=y
)
stage1_clf = LogisticRegression(max_iter=1000)
stage2_clf = RandomForestClassifier(n_estimators=200, random_state=2)
model = StagecoachClassifier(
    stage1_estimator=stage1_clf,
    stage2_estimator=stage2_clf,
    early_features=early,
    late_features=late,
    use_stage1_pred_as_feature=True,
)
model.fit(X_train, y_train)
def metrics(y_true, y_pred):
    return accuracy_score(y_true, y_pred), f1_score(y_true, y_pred)
# Provisional scores from early features only
stage1_test_proba = model.predict_stage1_proba(X_test)
stage1_acc, stage1_f1 = metrics(y_test, (stage1_test_proba >= 0.5).astype(int))
# Final scores with all features
final_acc, final_f1 = metrics(y_test, model.predict(X_test))
print("Stage-1 test accuracy/F1:", f"{stage1_acc:.3f}/{stage1_f1:.3f}")
print("Final test accuracy/F1:", f"{final_acc:.3f}/{final_f1:.3f}")
API overview¶
StagecoachRegressor¶
StagecoachRegressor(
    stage1_estimator,
    stage2_estimator,
    early_features,
    late_features,
    residual=True,
    use_stage1_pred_as_feature=True,
    inner_cv=None,
    random_state=None,
)
Key points:
stage1_estimator: any sklearn regressor (RandomForestRegressor, LinearRegression, etc.).
stage2_estimator: another regressor for the late features (often more flexible).
early_features / late_features: column names defining feature arrival.
residual=True: stage 2 learns y − ŷ₁ and we add it back at prediction time.
use_stage1_pred_as_feature=True: the stage-1 prediction becomes an extra input to stage 2.
inner_cv: optional K-fold cross-fitting to generate out-of-fold stage-1 predictions for stage-2 training.
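Why cross-fit? If stage 1 overfits, its in-sample predictions make the residuals look smaller than they will be at inference time, which biases stage-2 training. Roughly, inner_cv amounts to the following (a sketch of the idea using the quick-start names, not the internal code):
from sklearn.model_selection import cross_val_predict

# Out-of-fold stage-1 predictions: each row is scored by a model that never
# saw it, so stage 2 trains on honest residuals.
oof_yhat1 = cross_val_predict(stage1, X_train[early_features], y_train, cv=5)
residuals = y_train - oof_yhat1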
Methods:
fit(X, y)
predict_stage1(X) – early-only predictions.
predict(X) – final predictions.
StagecoachClassifier¶
StagecoachClassifier(
    stage1_estimator,
    stage2_estimator,
    early_features,
    late_features,
    use_stage1_pred_as_feature=True,
    inner_cv=None,
    random_state=None,
)
The stage-1 classifier must implement predict_proba or decision_function.
The stage-2 classifier must implement predict_proba.
predict_stage1_proba(X) returns a provisional probability for the positive class using early features only.
predict_proba(X) / predict(X) use both stages.
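Because stage 1 only needs predict_proba or decision_function, a margin-based model such as LinearSVC can serve as the trunk. A hypothetical configuration, reusing the names from the classification quick start:
from sklearn.svm import LinearSVC

# LinearSVC exposes decision_function but not predict_proba; per the
# requirement above, that is enough for stage 1.
svm_model = StagecoachClassifier(
    stage1_estimator=LinearSVC(),
    stage2_estimator=RandomForestClassifier(n_estimators=200, random_state=2),
    early_features=early,
    late_features=late,
    use_stage1_pred_as_feature=True,
)
svm_model.fit(X_train, y_train)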
Business-level use cases¶
1. Ad serving & recommendation¶
Stage 1 (trunk): user, session, page/context features. Run for every candidate to do rough scoring / candidate pruning.
Stage 2 (head): ad/creative/item-side features (embeddings, textual features, sponsorship info), plus stage-1 scores. Run only on the smaller candidate set.
This lets you:
keep the expensive features and models off the critical path where possible,
cross-validate the whole two-stage scoring process as one estimator, and
reason explicitly about which features are actually available at each stage.
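A sketch of that two-phase serving loop (hypothetical names: candidates is a DataFrame holding both early and late columns for every candidate):
import numpy as np

# Phase 1: cheap provisional scores for every candidate.
p1 = model.predict_stage1_proba(candidates)

# Phase 2: run both stages only on the 50 best provisional candidates.
shortlist = candidates.iloc[np.argsort(p1)[-50:]]
final_scores = model.predict_proba(shortlist)[:, 1]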
2. Per-customer models with private fields¶
Shared trunk: trained on non-sensitive features across all customers.
Per-customer head (stage 2): trained only on that customer’s private fields (GDP data, custom risk scores, internal labels) inside their environment.
You can:
ship the trunk once,
let each customer fit their own stage-2 model locally,
still evaluate how “global trunk + local head” behaves on held-out data.
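A manual sketch of that split (hypothetical names and workflow: the library’s fit() trains both stages together, so fitting across two environments is shown here with plain sklearn pieces):
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

# Central side: fit the shared trunk on non-sensitive features and ship it.
trunk = LinearRegression().fit(X_shared[shared_features], y_shared)

# Customer side, inside their environment: fit a head on private fields
# against the trunk's residuals. The private data never leaves.
resid = y_customer - trunk.predict(X_customer[shared_features])
head = RandomForestRegressor().fit(X_customer[private_fields], resid)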
3. Latency and staged inference¶
If your system has a front-door budget (say ~10 ms) and a back-end budget per selected candidate, StagecoachML gives you a clean way to:
do rough scoring at T₁ using a small, cheap stage-1 model;
hydrate more features or call heavier services; and
refine scores at T₂ with stage-2.
Because the whole pipeline is an sklearn estimator, you don’t have to guess whether this staging actually helps: you can compare two-stage vs single-stage models on the same train/test splits.
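For instance, reusing the regression quick-start objects (a sketch), both models can be scored on identical folds:
from sklearn.model_selection import KFold, cross_val_score

cv = KFold(n_splits=5, shuffle=True, random_state=0)  # same folds for both models
print("Two-stage:", cross_val_score(model, X, y, cv=cv).mean())
print("One-stage:", cross_val_score(RandomForestRegressor(random_state=0), X, y, cv=cv).mean())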
Examples¶
The examples/ directory contains runnable scripts:
examples/regression_example.py – uses the diabetes dataset, splits features into early/late, trains a StagecoachRegressor, and compares it to a one-stage baseline.
examples/classification_example.py – uses the breast cancer dataset, trains a StagecoachClassifier, and compares provisional vs final predictions and a one-stage logistic baseline.
Run them with:
python -m examples.regression_example
python -m examples.classification_example
Design notes & non-goals¶
Treat Stagecoach* as one model for train/validation/test; don’t hand-tune stages in isolation and then try to glue them together.
inner_cv is an optional extra for robustness, not a replacement for normal cross-validation.
This library is not a general DAG/workflow engine. If you want full pipeline orchestration (scheduling, retries, monitoring, etc.), you probably want Airflow/Prefect/etc. StagecoachML is about one very specific modelling pattern: staged feature arrival.
Contents¶
Getting Started
Examples
API Reference