alsgls package

class alsgls.ALSGLS(*, rank: int | str | None = 'auto', lam_F: float = 0.001, lam_B: float = 0.001, max_sweeps: int = 12, rel_tol: float = 1e-06, d_floor: float = 1e-08, cg_maxit: int = 800, cg_tol: float = 3e-07, scale_correct: bool = True, scale_floor: float = 1e-08)[source]

Bases: object

Scikit-learn-style estimator for low-rank GLS via ALS.

fit(Xs: Sequence[Any], Y: Any) → ALSGLS[source]
get_params(deep: bool = True) → dict[str, Any][source]
predict(Xs: Sequence[Any]) → ndarray[source]
score(Xs: Sequence[Any], Y: Any) → float[source]
set_params(**params: Any) → ALSGLS[source]
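
Examples

A minimal usage sketch with the package's simulate_sur helper. Exactly what score returns (e.g. negative MSE versus average log-likelihood) follows the estimator's own convention and is not pinned down here:

>>> from alsgls import ALSGLS, simulate_sur
>>> Xtr, Ytr, Xte, Yte = simulate_sur(200, 50, K=6, p=4, k=2, seed=0)
>>> est = ALSGLS(rank=2).fit(Xtr, Ytr)
>>> Yhat = est.predict(Xte)   # (50, 6) array of held-out predictions
>>> s = est.score(Xte, Yte)   # scalar goodness-of-fit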
class alsgls.ALSGLSSystem(system: Mapping[Any, tuple[Any, Any]] | Sequence[tuple[Any, tuple[Any, Any]]], *, rank: int | str | None = 'auto', lam_F: float = 0.001, lam_B: float = 0.001, max_sweeps: int = 12, rel_tol: float = 1e-06, d_floor: float = 1e-08, cg_maxit: int = 800, cg_tol: float = 3e-07, scale_correct: bool = True, scale_floor: float = 1e-08)[source]

Bases: object

Statsmodels-style system container for ALS GLS fitting.

as_arrays() → tuple[list[ndarray], ndarray][source]
fit() → ALSGLSSystemResults[source]
property keqs: int
property nobs: int
class alsgls.ALSGLSSystemResults(model: ALSGLSSystem, estimator: ALSGLS)[source]

Bases: object

Lightweight results container mimicking statsmodels outputs.

params_as_series()[source]
predict(exog: Mapping[Any, Any] | Sequence[Any] | None = None) → ndarray[source]
summary_dict() → dict[str, Any][source]
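
Examples

A minimal end-to-end sketch. The per-equation (endog, exog) tuple order mirrors statsmodels system APIs and is an assumption here, as is treating exog=None in predict as "reuse the training design":

>>> from alsgls import ALSGLSSystem, simulate_sur
>>> Xtr, Ytr, _, _ = simulate_sur(200, 20, K=3, p=4, k=2, seed=0)
>>> system = {f"eq{j}": (Ytr[:, j], Xtr[j]) for j in range(3)}  # assumed (endog, exog) order
>>> res = ALSGLSSystem(system, rank=2).fit()
>>> Yhat = res.predict()        # exog=None: assumed in-sample predictions
>>> info = res.summary_dict()   # lightweight dict summary of the fit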
alsgls.XB_from_Blist(Xs, B_list)[source]

Return the N×K matrix of predictions; column j is Xs[j] @ B_list[j].

alsgls.als_gls(Xs, Y, k, lam_F: float = 0.001, lam_B: float = 0.001, sweeps: int = 8, d_floor: float = 1e-08, cg_maxit: int = 800, cg_tol: float = 3e-07, *, scale_correct: bool = True, scale_floor: float = 1e-08, rel_tol: float = 1e-06)[source]

Alternating-least-squares GLS with low-rank-plus-diagonal covariance. Uses Woodbury throughout; never materializes K×K dense Σ.

Enhancements (correctness-first):
  • Cached Woodbury pieces per sweep.

  • Block-Jacobi preconditioner using diag(Σ^{-1}).

  • PCA init of F with F F^T ≈ R^T R / N.

  • Guarded MLE scale-correction of Σ each sweep.

  • β-step revert if it worsens the NLL (keeps the trace non-increasing).

  • Backtracking/damped acceptance on (F, D) to accept only NLL-improving updates.

  • Dual traces in info: nll_beta_trace (post-β), nll_trace/nll_sigma_trace (post-Σ).

Parameters:
  • Xs (list of ndarray) – K design matrices; Xs[j] has shape (N, p_j).

  • Y (ndarray) – Response matrix of shape (N, K).

  • k (int) – Rank of the low-rank component (F has shape K×k).

  • lam_F (float) – Ridge penalty for the U/F ALS updates.

  • lam_B (float) – Ridge penalty for the β-step (per equation).

  • sweeps (int) – Maximum number of ALS sweeps.

  • d_floor (float) – Minimum variance for D, keeping Σ SPD.

  • cg_maxit (int) – Maximum CG iterations in the β-step.

  • cg_tol (float) – CG relative tolerance.

  • scale_correct (bool) – If True, attempt the guarded MLE scale fix on Σ each sweep.

  • scale_floor (float) – Minimum scalar for the scale correction.

  • rel_tol (float) – Relative NLL improvement threshold for early stopping.

Returns:

(B_list, F, D, mem_MB_est, info), where info includes:
  • p_list

  • cg (last sweep)

  • nll_trace (post-Σ, non-increasing)

  • nll_sigma_trace (alias of nll_trace)

  • nll_beta_trace (post-β baseline per sweep)

  • accept_t (list of accepted backtracking step sizes t)

  • scale_used (list of accepted scale factors; 1.0 if not applied)
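
Examples

A minimal sketch tying the pieces together, using simulate_gls for data and XB_from_Blist/mse for evaluation:

>>> from alsgls import als_gls, simulate_gls, XB_from_Blist, mse
>>> Xtr, Ytr, Xte, Yte = simulate_gls(200, 50, [3, 4, 5], k=2, seed=0)
>>> B_list, F, D, mem_mb, info = als_gls(Xtr, Ytr, k=2)
>>> test_mse = mse(Yte, XB_from_Blist(Xte, B_list))   # held-out predictive error
>>> info["nll_trace"][-1] <= info["nll_trace"][0]     # post-Σ trace is non-increasing
True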

alsgls.em_gls(Xs, Y, k, lam_F=0.001, lam_B=0.001, iters=30, d_floor=1e-08)[source]

Dense-ish EM baseline for low-rank-plus-diagonal GLS. Builds Σ^{-1} explicitly (K×K) in the β-step to mimic O(K^2) memory. Returns (B_list, F, D, mem_MB_est, info).
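
Examples

A sketch contrasting the memory estimates of the two fitters; the O(K^2) footprint of the dense-ish baseline only becomes visible at larger K, so the comparison below is indicative rather than guaranteed:

>>> from alsgls import als_gls, em_gls, simulate_gls
>>> Xs, Y, _, _ = simulate_gls(300, 10, [3] * 40, k=2, seed=0)   # K = 40 equations
>>> *_, mem_als, _ = als_gls(Xs, Y, k=2)
>>> *_, mem_em, _ = em_gls(Xs, Y, k=2)
>>> ratio = mem_em / mem_als   # dense-ish EM baseline vs Woodbury-based ALS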

alsgls.mse(Y, Yhat)[source]

Mean squared error between two matrices.

alsgls.nll_per_row(R, F, D)[source]

Negative log-likelihood per row for residual matrix R under Σ = F F^T + diag(D) with Gaussian errors.

Returns:

0.5 * [ tr(R Σ^{-1} R^T)/N + log det(Σ) + K log(2π) ] where N is the number of rows in R.

Return type:

float
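
Examples

A minimal sketch evaluating a fitted covariance on training residuals:

>>> from alsgls import als_gls, nll_per_row, XB_from_Blist, simulate_gls
>>> Xs, Y, _, _ = simulate_gls(200, 10, [3, 4], k=1, seed=0)
>>> B_list, F, D, _, _ = als_gls(Xs, Y, k=1)
>>> R = Y - XB_from_Blist(Xs, B_list)   # N×K residual matrix
>>> nll = nll_per_row(R, F, D)          # per-row Gaussian NLL under Σ = F F^T + diag(D)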

alsgls.simulate_gls(N_tr, N_te, p_list, k, seed=0)[source]

Simulate a generalized least squares (GLS) dataset.

This variant allows each response equation to have its own number of features as specified by p_list.

Parameters:
  • N_tr (int) – Number of training samples.

  • N_te (int) – Number of test samples.

  • p_list (sequence of int) – Number of features for each equation.

  • k (int) – Latent factor dimension controlling correlated noise.

  • seed (int, optional) – Seed for the NumPy random number generator. Defaults to 0.

Returns:

  • X_tr (list of ndarray) – Feature matrices for the training set. X_tr[j] has shape (N_tr, p_list[j]).

  • Y_tr (ndarray) – Training responses of shape (N_tr, K) where K = len(p_list).

  • X_te (list of ndarray) – Feature matrices for the test set. X_te[j] has shape (N_te, p_list[j]).

  • Y_te (ndarray) – Test responses of shape (N_te, K).

Notes

Randomness is controlled via numpy.random.default_rng(seed); pass a different seed for different simulations.

Examples

>>> p_list = [3, 5, 2]
>>> Xtr, Ytr, Xte, Yte = simulate_gls(100, 20, p_list, k=2, seed=0)
alsgls.simulate_sur(N_tr, N_te, K, p, k, seed=0)[source]

Simulate a Seemingly Unrelated Regression (SUR) dataset.

Parameters:
  • N_tr (int) – Number of training samples.

  • N_te (int) – Number of test samples.

  • K (int) – Number of response equations.

  • p (int) – Number of features per equation.

  • k (int) – Latent factor dimension controlling correlated noise.

  • seed (int, optional) – Seed for the NumPy random number generator. Defaults to 0.

Returns:

  • X_tr (list of ndarray) – Feature matrices for the training set. Each element has shape (N_tr, p).

  • Y_tr (ndarray) – Training responses of shape (N_tr, K).

  • X_te (list of ndarray) – Feature matrices for the test set. Each element has shape (N_te, p).

  • Y_te (ndarray) – Test responses of shape (N_te, K).

Notes

Randomness is controlled via numpy.random.default_rng(seed); pass a different seed for different simulations.

Examples

>>> Xtr, Ytr, Xte, Yte = simulate_sur(100, 20, K=3, p=5, k=2, seed=42)

Submodules

alsgls.als module

alsgls.als.als_gls(Xs, Y, k, lam_F: float = 0.001, lam_B: float = 0.001, sweeps: int = 8, d_floor: float = 1e-08, cg_maxit: int = 800, cg_tol: float = 3e-07, *, scale_correct: bool = True, scale_floor: float = 1e-08, rel_tol: float = 1e-06)[source]

Alternating-least-squares GLS with low-rank-plus-diagonal covariance. Uses Woodbury throughout; never materializes K×K dense Σ.

Enhancements (correctness-first):
  • Cached Woodbury pieces per sweep.

  • Block-Jacobi preconditioner using diag(Σ^{-1}).

  • PCA init of F with F F^T ≈ R^T R / N.

  • Guarded MLE scale-correction of Σ each sweep.

  • β-step revert if it worsens the NLL (keeps the trace non-increasing).

  • Backtracking/damped acceptance on (F, D) to accept only NLL-improving updates.

  • Dual traces in info: nll_beta_trace (post-β), nll_trace/nll_sigma_trace (post-Σ).

Parameters:
  • Xs (list of ndarray) – K design matrices; Xs[j] has shape (N, p_j).

  • Y (ndarray) – Response matrix of shape (N, K).

  • k (int) – Rank of the low-rank component (F has shape K×k).

  • lam_F (float) – Ridge penalty for the U/F ALS updates.

  • lam_B (float) – Ridge penalty for the β-step (per equation).

  • sweeps (int) – Maximum number of ALS sweeps.

  • d_floor (float) – Minimum variance for D, keeping Σ SPD.

  • cg_maxit (int) – Maximum CG iterations in the β-step.

  • cg_tol (float) – CG relative tolerance.

  • scale_correct (bool) – If True, attempt the guarded MLE scale fix on Σ each sweep.

  • scale_floor (float) – Minimum scalar for the scale correction.

  • rel_tol (float) – Relative NLL improvement threshold for early stopping.

Returns:

(B_list, F, D, mem_MB_est, info), where info includes:
  • p_list

  • cg (last sweep)

  • nll_trace (post-Σ, non-increasing)

  • nll_sigma_trace (alias of nll_trace)

  • nll_beta_trace (post-β baseline per sweep)

  • accept_t (list of accepted backtracking step sizes t)

  • scale_used (list of accepted scale factors; 1.0 if not applied)

alsgls.em module

alsgls.em.em_gls(Xs, Y, k, lam_F=0.001, lam_B=0.001, iters=30, d_floor=1e-08)[source]

Dense-ish EM baseline for low-rank-plus-diagonal GLS. Builds Σ^{-1} explicitly (K×K) in the β-step to mimic O(K^2) memory. Returns (B_list, F, D, mem_MB_est, info).

alsgls.ops module

alsgls.ops.XB_from_Blist(Xs, B_list)[source]

Return the N×K matrix of predictions; column j is Xs[j] @ B_list[j].

alsgls.ops.apply_siginv_to_matrix(M: ndarray, F: ndarray, D: ndarray, *, Dinv: ndarray | None = None, C_inv: ndarray | None = None, C_chol: ndarray | None = None) → ndarray[source]

Right-multiply an (N×K) matrix M by Σ^{-1} using Woodbury, where Σ = F F^T + diag(D).

You may pass either C_inv (explicit inverse of I + F^T D^{-1} F) or C_chol (its Cholesky factor). If neither is provided, a small explicit inverse will be computed internally for convenience.
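
Examples

A sanity-check sketch against the dense inverse; forming Σ explicitly is viable only at toy sizes:

>>> import numpy as np
>>> from alsgls.ops import apply_siginv_to_matrix
>>> rng = np.random.default_rng(0)
>>> F = rng.standard_normal((6, 2))
>>> D = rng.uniform(0.5, 1.5, size=6)
>>> M = rng.standard_normal((10, 6))
>>> out = apply_siginv_to_matrix(M, F, D)   # Woodbury pieces computed internally
>>> np.allclose(out, M @ np.linalg.inv(F @ F.T + np.diag(D)))
True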

alsgls.ops.cg_solve(operator_mv, b, x0=None, maxit=500, tol=1e-07, M_pre=None)[source]

Conjugate gradient for SPD operator A (matrix-free).

Parameters:
  • operator_mv (callable) – Function that returns A @ x for a given x.

  • b (ndarray) – Right-hand side.

  • x0 (ndarray, optional) – Initial guess.

  • maxit (int) – Maximum CG iterations.

  • tol (float) – Relative residual tolerance.

  • M_pre (callable, optional) – Preconditioner application: returns M^{-1} @ r.

Returns:

  • x (ndarray) – Approximate solution.

  • info (dict) – Iterations and final residual norm.
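
Examples

A minimal matrix-free usage sketch on a small synthetic SPD system:

>>> import numpy as np
>>> from alsgls.ops import cg_solve
>>> rng = np.random.default_rng(0)
>>> A = rng.standard_normal((20, 20))
>>> A = A @ A.T + 20.0 * np.eye(20)   # shift the Gram matrix to make A SPD
>>> b = rng.standard_normal(20)
>>> x, info = cg_solve(lambda v: A @ v, b, tol=1e-10)
>>> np.allclose(A @ x, b)
True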

alsgls.ops.siginv_diag(F: ndarray, Dinv: ndarray, C_chol: ndarray) → ndarray[source]

Compute the diagonal of Σ^{-1} = D^{-1} - D^{-1} F C^{-1} F^T D^{-1} efficiently, where C = I + F^T D^{-1} F, given Dinv and the Cholesky factor of C.

Returns:

diag_Sinv – The diagonal entries of Σ^{-1}.

Return type:

(K,) array
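
Examples

A check sketch against the dense diagonal (toy sizes only):

>>> import numpy as np
>>> from alsgls.ops import woodbury_chol, siginv_diag
>>> rng = np.random.default_rng(0)
>>> F = rng.standard_normal((5, 2))
>>> D = rng.uniform(0.5, 2.0, size=5)
>>> Dinv, C_chol = woodbury_chol(F, D)
>>> d = siginv_diag(F, Dinv, C_chol)
>>> np.allclose(d, np.diag(np.linalg.inv(F @ F.T + np.diag(D))))
True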

alsgls.ops.stack_B_list(B_list)[source]

Stack list of (p_j×1) blocks into a flat vector.

alsgls.ops.unstack_B_vec(bvec, p_list)[source]

Inverse of stack_B_list: split a flat vector back into a list of (p_j×1) blocks.
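
Examples

A round-trip sketch; only the round trip is relied on here, not the exact flat layout:

>>> import numpy as np
>>> from alsgls.ops import stack_B_list, unstack_B_vec
>>> B_list = [np.ones((2, 1)), np.arange(3.0).reshape(3, 1)]
>>> bvec = stack_B_list(B_list)
>>> [b.shape for b in unstack_B_vec(bvec, [2, 3])]
[(2, 1), (3, 1)]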

alsgls.ops.woodbury_chol(F: ndarray, D: ndarray)[source]

Return (Dinv, C_chol) where C_chol is the Cholesky factor of C = I + F^T D^{-1} F.

Intended for numerically stable downstream solves that avoid forming C^{-1} explicitly.

alsgls.ops.woodbury_pieces(F: ndarray, D: ndarray)[source]

Return (Dinv, C_inv) used in Woodbury for Σ = F F^T + diag(D).

Σ^{-1} = D^{-1} - D^{-1} F C^{-1} F^T D^{-1}, where C = (I + F^T D^{-1} F). For backward compatibility, this returns the explicit small inverse C_inv.
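
Examples

A sketch assembling Σ^{-1} from the returned pieces. It assumes Dinv comes back as a length-K vector holding 1/D, which is plausible but not stated above:

>>> import numpy as np
>>> from alsgls.ops import woodbury_pieces
>>> rng = np.random.default_rng(0)
>>> F = rng.standard_normal((4, 2))
>>> D = rng.uniform(0.5, 2.0, size=4)
>>> Dinv, C_inv = woodbury_pieces(F, D)   # Dinv assumed to be a (K,) vector
>>> DiF = Dinv[:, None] * F               # D^{-1} F
>>> Sinv = np.diag(Dinv) - DiF @ C_inv @ DiF.T
>>> np.allclose(Sinv, np.linalg.inv(F @ F.T + np.diag(D)))
True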

alsgls.api module

High-level estimator APIs for ALS-based GLS fitting.

class alsgls.api.ALSGLS(*, rank: int | str | None = 'auto', lam_F: float = 0.001, lam_B: float = 0.001, max_sweeps: int = 12, rel_tol: float = 1e-06, d_floor: float = 1e-08, cg_maxit: int = 800, cg_tol: float = 3e-07, scale_correct: bool = True, scale_floor: float = 1e-08)[source]

Bases: object

Scikit-learn-style estimator for low-rank GLS via ALS.

fit(Xs: Sequence[Any], Y: Any) → ALSGLS[source]
get_params(deep: bool = True) → dict[str, Any][source]
predict(Xs: Sequence[Any]) → ndarray[source]
score(Xs: Sequence[Any], Y: Any) → float[source]
set_params(**params: Any) → ALSGLS[source]
class alsgls.api.ALSGLSSystem(system: Mapping[Any, tuple[Any, Any]] | Sequence[tuple[Any, tuple[Any, Any]]], *, rank: int | str | None = 'auto', lam_F: float = 0.001, lam_B: float = 0.001, max_sweeps: int = 12, rel_tol: float = 1e-06, d_floor: float = 1e-08, cg_maxit: int = 800, cg_tol: float = 3e-07, scale_correct: bool = True, scale_floor: float = 1e-08)[source]

Bases: object

Statsmodels-style system container for ALS GLS fitting.

as_arrays() → tuple[list[ndarray], ndarray][source]
fit() → ALSGLSSystemResults[source]
property keqs: int
property nobs: int
class alsgls.api.ALSGLSSystemResults(model: ALSGLSSystem, estimator: ALSGLS)[source]

Bases: object

Lightweight results container mimicking statsmodels outputs.

params_as_series()[source]
predict(exog: Mapping[Any, Any] | Sequence[Any] | None = None) → ndarray[source]
summary_dict() → dict[str, Any][source]

alsgls.sim module

alsgls.sim.simulate_gls(N_tr, N_te, p_list, k, seed=0)[source]

Simulate a generalized least squares (GLS) dataset.

This variant allows each response equation to have its own number of features as specified by p_list.

Parameters:
  • N_tr (int) – Number of training samples.

  • N_te (int) – Number of test samples.

  • p_list (sequence of int) – Number of features for each equation.

  • k (int) – Latent factor dimension controlling correlated noise.

  • seed (int, optional) – Seed for the NumPy random number generator. Defaults to 0.

Returns:

  • X_tr (list of ndarray) – Feature matrices for the training set. X_tr[j] has shape (N_tr, p_list[j]).

  • Y_tr (ndarray) – Training responses of shape (N_tr, K) where K = len(p_list).

  • X_te (list of ndarray) – Feature matrices for the test set. X_te[j] has shape (N_te, p_list[j]).

  • Y_te (ndarray) – Test responses of shape (N_te, K).

Notes

Randomness is controlled via numpy.random.default_rng(seed); pass a different seed for different simulations.

Examples

>>> p_list = [3, 5, 2]
>>> Xtr, Ytr, Xte, Yte = simulate_gls(100, 20, p_list, k=2, seed=0)
alsgls.sim.simulate_sur(N_tr, N_te, K, p, k, seed=0)[source]

Simulate a Seemingly Unrelated Regression (SUR) dataset.

Parameters:
  • N_tr (int) – Number of training samples.

  • N_te (int) – Number of test samples.

  • K (int) – Number of response equations.

  • p (int) – Number of features per equation.

  • k (int) – Latent factor dimension controlling correlated noise.

  • seed (int, optional) – Seed for the NumPy random number generator. Defaults to 0.

Returns:

  • X_tr (list of ndarray) – Feature matrices for the training set. Each element has shape (N_tr, p).

  • Y_tr (ndarray) – Training responses of shape (N_tr, K).

  • X_te (list of ndarray) – Feature matrices for the test set. Each element has shape (N_te, p).

  • Y_te (ndarray) – Test responses of shape (N_te, K).

Notes

Randomness is controlled via numpy.random.default_rng(seed); pass a different seed for different simulations.

Examples

>>> Xtr, Ytr, Xte, Yte = simulate_sur(100, 20, K=3, p=5, k=2, seed=42)

alsgls.metrics module

alsgls.metrics.mse(Y, Yhat)[source]

Mean squared error between two matrices.

alsgls.metrics.nll_per_row(R, F, D)[source]

Negative log-likelihood per row for residual matrix R under Σ = F F^T + diag(D) with Gaussian errors.

Returns:

0.5 * [ tr(R Σ^{-1} R^T)/N + log det(Σ) + K log(2π) ] where N is the number of rows in R.

Return type:

float