API Reference

This page provides detailed documentation for all pyppur classes and functions.

Main Classes

ProjectionPursuit

class pyppur.ProjectionPursuit(n_components: int = 2, objective: Objective = Objective.DISTANCE_DISTORTION, alpha: float = 1.0, max_iter: int = 500, tol: float = 1e-06, random_state: int | None = None, optimizer: str = 'L-BFGS-B', n_init: int = 3, verbose: bool = False, center: bool = True, scale: bool = True, weight_by_distance: bool = False, tied_weights: bool = True, l2_reg: float = 0.0, use_nonlinearity_in_distance: bool = True)

Bases: object

Implementation of Projection Pursuit for dimensionality reduction.

This class finds optimal projections by minimizing either the reconstruction loss or the distance distortion. It supports multiple initialization strategies and different optimizers.

__init__(n_components: int = 2, objective: Objective = Objective.DISTANCE_DISTORTION, alpha: float = 1.0, max_iter: int = 500, tol: float = 1e-06, random_state: int | None = None, optimizer: str = 'L-BFGS-B', n_init: int = 3, verbose: bool = False, center: bool = True, scale: bool = True, weight_by_distance: bool = False, tied_weights: bool = True, l2_reg: float = 0.0, use_nonlinearity_in_distance: bool = True) → None

Initialize a ProjectionPursuit model.

Parameters:

n_components – Number of projection dimensions to use.
objective – Optimization objective enum value.
alpha – Steepness parameter for the ridge function g(z) = tanh(alpha * z).
max_iter – Maximum number of iterations for optimization.
tol – Tolerance for optimization convergence.
random_state – Random seed for reproducibility.
optimizer – Optimization method (‘L-BFGS-B’ recommended).
n_init – Number of random initializations to try.
verbose – Whether to print progress information.
center – Whether to center the data.
scale – Whether to scale the data.
weight_by_distance – Whether to weight the distance distortion by the inverse of the original distances.
tied_weights – Whether to use tied weights (encoder=decoder) for reconstruction.
l2_reg – L2 regularization strength for decoder weights (when tied_weights=False).
use_nonlinearity_in_distance – Whether to apply ridge function before computing distances.

property best_loss_: float

Get the best loss value achieved.

Returns: Best loss value.

compute_silhouette(X: ndarray, labels: ndarray) → float

Compute the silhouette score for the dimensionality reduction.

The silhouette score measures how well clusters are separated. A score close to 1.0 indicates well-separated clusters, a score near 0.0 indicates overlapping clusters, and a score close to -1.0 indicates that samples are closer to a neighboring cluster than to their own.

Parameters:

X – Input data, shape (n_samples, n_features).
labels – Cluster labels for each sample.

Returns:

Silhouette score between -1.0 and 1.0.

compute_trustworthiness(X: ndarray, n_neighbors: int = 5) → float

Compute the trustworthiness score for the dimensionality reduction.

Trustworthiness measures how well the local structure is preserved. A score of 1.0 indicates perfect trustworthiness, while a score of 0.0 indicates that the local structure is not preserved at all.

Parameters:

X – Input data, shape (n_samples, n_features).
n_neighbors – Number of neighbors to consider for trustworthiness.

Returns:

Trustworthiness score between 0.0 and 1.0.

property decoder_weights_: ndarray | None

Get the decoder weights (for untied weights only).

Returns: Decoder weights, shape (n_components, n_features), or None if using tied weights.

distance_distortion(X: ndarray) → float

Compute the distance distortion for X.

Parameters: X – Input data, shape (n_samples, n_features).
Returns: Mean squared distance distortion.

evaluate(X: ndarray, labels: ndarray | None = None, n_neighbors: int = 5) → dict[str, float]

Evaluate the dimensionality reduction with multiple metrics.

Parameters:

X – Input data, shape (n_samples, n_features).
labels – Optional cluster labels for silhouette score.
n_neighbors – Number of neighbors for trustworthiness.

Returns:

Dictionary with evaluation metrics.

fit(X: ndarray) → ProjectionPursuit

Fit the ProjectionPursuit model to the data.

Parameters: X – Input data, shape (n_samples, n_features).
Returns: The fitted model.

property fit_time_: float

Get the time taken to fit the model.

Returns: Time in seconds.

fit_transform(X: ndarray) → ndarray

Fit the model with X and apply dimensionality reduction on X.

Parameters: X – Input data, shape (n_samples, n_features).
Returns: Transformed data, shape (n_samples, n_components).

property loss_curve_: list[float]

Get the loss curve during optimization.

Returns: Loss values during optimization.

property optimizer_info_: dict[str, Any]

Get additional information from the optimizer.

Returns: Optimizer information.

reconstruct(X: ndarray) → ndarray

Project X and reconstruct it from the projected data.

Parameters: X – Input data, shape (n_samples, n_features).
Returns: Reconstructed data, shape (n_samples, n_features).

reconstruction_error(X: ndarray) → float

Compute the reconstruction error for X.

Parameters: X – Input data, shape (n_samples, n_features).
Returns: Mean squared reconstruction error.

transform(X: ndarray) → ndarray

Apply dimensionality reduction to X.

Parameters: X – Input data, shape (n_samples, n_features).
Returns: Transformed data, shape (n_samples, n_components).

property x_loadings_: ndarray

Get the projection directions (encoder).

Returns: Projection directions, shape (n_components, n_features).

Objective Types

class pyppur.Objective(*values)

Bases: str, Enum

Objective types for projection pursuit.

DISTANCE_DISTORTION = 'distance_distortion'

RECONSTRUCTION = 'reconstruction'
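Because Objective subclasses both str and Enum, each member compares equal to its string value and can be looked up from that value. The sketch below uses a local stand-in class with the documented member names and values instead of importing pyppur; whether ProjectionPursuit itself accepts plain strings for objective is not stated on this page.

```python
from enum import Enum


# Local stand-in for pyppur.Objective (same member names and values,
# same str + Enum bases); not imported from pyppur.
class Objective(str, Enum):
    DISTANCE_DISTORTION = "distance_distortion"
    RECONSTRUCTION = "reconstruction"


# Value lookup returns the existing member ...
obj = Objective("reconstruction")
print(obj is Objective.RECONSTRUCTION)  # True
# ... and the str mixin makes members compare equal to their values.
print(obj == "reconstruction")          # True
print(obj.value)                        # reconstruction
```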

Objective Functions

Base Objective

class pyppur.objectives.BaseObjective(alpha: float = 1.0, **kwargs: Any)

Bases: ABC

Abstract base class for projection pursuit objective functions.

abstractmethod __call__(a_flat: ndarray, X: ndarray, k: int, **kwargs: Any) → float

Compute the objective function value.

Parameters:

a_flat – Flattened projection directions.
X – Input data.
k – Number of projections.
**kwargs – Additional arguments.

Returns:

Objective function value.

__init__(alpha: float = 1.0, **kwargs: Any) → None

Initialize the objective function.

Parameters:

alpha – Steepness parameter for ridge functions.
**kwargs – Additional keyword arguments.

static g(z: ndarray, alpha: float = 1.0) → ndarray

Apply the ridge function (non-linearity) to projected data.

Parameters:

z – Input data, shape (n_samples, n_components).
alpha – Steepness parameter for the ridge function.

Returns:

Transformed data with the same shape as z.

static grad_g(z: ndarray, alpha: float = 1.0) → ndarray

Compute the gradient of the ridge function.

Parameters:

z – Input data, shape (n_samples, n_components).
alpha – Steepness parameter for the ridge function.

Returns:

Gradient values with the same shape as z.
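A minimal NumPy sketch of the ridge function and its derivative, assuming g is exactly tanh(alpha * z) (as stated for the alpha parameter of ProjectionPursuit) and that grad_g returns the element-wise derivative with respect to z. These are stand-alone functions, not pyppur's source; the finite-difference check confirms that the derivative formula is consistent with g.

```python
import numpy as np


def g(z: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Ridge function g(z) = tanh(alpha * z), applied element-wise."""
    return np.tanh(alpha * z)


def grad_g(z: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Element-wise derivative dg/dz = alpha * (1 - tanh(alpha * z) ** 2)."""
    t = np.tanh(alpha * z)
    return alpha * (1.0 - t ** 2)


# Compare the analytic derivative with central finite differences.
z = np.linspace(-3.0, 3.0, 13).reshape(-1, 1)
eps = 1e-6
numeric = (g(z + eps, alpha=2.0) - g(z - eps, alpha=2.0)) / (2 * eps)
print(np.allclose(grad_g(z, alpha=2.0), numeric, atol=1e-6))  # True
```

Larger alpha makes g saturate sooner, so the derivative is concentrated near z = 0 and is close to zero for large |z|.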

Distance Objective

class pyppur.objectives.DistanceObjective(alpha: float = 1.0, weight_by_distance: bool = False, use_nonlinearity: bool = True, **kwargs: Any)

Bases: BaseObjective

Distance distortion objective function for projection pursuit.

This objective minimizes the difference between pairwise distances in the original space and the projected space. Can optionally apply ridge function nonlinearity before distance computation.

__call__(a_flat: ndarray, X: ndarray, k: int, dist_X: ndarray | None = None, weight_matrix: ndarray | None = None, **kwargs: Any) → float

Compute the distance distortion objective.

Parameters:

a_flat – Flattened projection directions.
X – Input data.
k – Number of projections.
dist_X – Pairwise distances in original space (optional).
weight_matrix – Optional weight matrix for distances.
**kwargs – Additional arguments.

Returns:

Distance distortion value (to be minimized).

__init__(alpha: float = 1.0, weight_by_distance: bool = False, use_nonlinearity: bool = True, **kwargs: Any) → None

Initialize the distance distortion objective.

Parameters:

alpha – Steepness parameter for the ridge function.
weight_by_distance – Whether to weight the distortion by the inverse of the original distances.
use_nonlinearity – Whether to apply ridge function before computing distances.
**kwargs – Additional keyword arguments.
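The sketch below illustrates one way a distance-distortion objective with these options can be computed. It is hypothetical, not pyppur's implementation: it assumes a_flat holds k directions stored row by row, compares Euclidean distances, averages squared errors over all off-diagonal pairs, and implements weight_by_distance as division by the original distance. The weight_matrix argument is omitted.

```python
import numpy as np


def pairwise_distances(Y: np.ndarray) -> np.ndarray:
    """Euclidean distance matrix between the rows of Y."""
    diff = Y[:, None, :] - Y[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def distance_objective(a_flat, X, k, alpha=1.0, use_nonlinearity=True,
                       weight_by_distance=False, dist_X=None):
    """Hypothetical distance-distortion loss; not pyppur's implementation."""
    A = a_flat.reshape(k, X.shape[1])        # assumed layout: one row per direction
    Z = X @ A.T                              # projected data, (n_samples, k)
    if use_nonlinearity:
        Z = np.tanh(alpha * Z)               # ridge function g(z) = tanh(alpha * z)
    if dist_X is None:
        dist_X = pairwise_distances(X)       # may be precomputed once and passed in
    sq_err = (dist_X - pairwise_distances(Z)) ** 2
    if weight_by_distance:
        # Down-weight pairs that are far apart in the original space.
        sq_err = sq_err / np.where(dist_X > 0, dist_X, 1.0)
    off_diagonal = ~np.eye(X.shape[0], dtype=bool)   # ignore self-distances
    return float(sq_err[off_diagonal].mean())


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
loss = distance_objective(rng.normal(size=2 * 5), X, k=2)
```

Precomputing dist_X matters in practice because the original-space distances do not change between optimizer iterations, while the projected distances must be recomputed every time.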

Reconstruction Objective

class pyppur.objectives.ReconstructionObjective(alpha: float = 1.0, tied_weights: bool = True, l2_reg: float = 0.0, **kwargs: Any)

Bases: BaseObjective

Reconstruction loss objective function for projection pursuit.

This objective minimizes the reconstruction error when projecting and reconstructing data. Supports both tied-weights (encoder=decoder) and free decoder configurations.

__call__(a_flat: ndarray, X: ndarray, k: int, **kwargs: Any) → float

Compute the reconstruction objective.

Parameters:

a_flat – Flattened parameters (encoder A, and decoder B if untied).
X – Input data.
k – Number of projections.
**kwargs – Additional arguments.

Returns:

Reconstruction loss value (to be minimized).

__init__(alpha: float = 1.0, tied_weights: bool = True, l2_reg: float = 0.0, **kwargs: Any) → None

Initialize the reconstruction objective.

Parameters:

alpha – Steepness parameter for the ridge function.
tied_weights – If True, use tied weights (B=A). If False, learn separate decoder B.
l2_reg – L2 regularization strength for decoder weights (when tied_weights=False).
**kwargs – Additional keyword arguments.

reconstruct(X: ndarray, a_matrix: ndarray, b_matrix: ndarray | None = None) → ndarray

Reconstruct data from projections.

Parameters:

X – Input data.
a_matrix – Encoder projection matrix.
b_matrix – Decoder matrix (if None, uses tied weights with a_matrix).

Returns:

Reconstructed data.
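A hypothetical NumPy sketch of a tied or untied reconstruction loss. It assumes the encoder output passes through g(z) = tanh(alpha * z) before decoding, that a_flat stores the encoder followed by the decoder when tied_weights=False, that the loss is a mean squared error, and that l2_reg penalizes the sum of squared decoder entries. None of these details are confirmed by this page.

```python
import numpy as np


def reconstruct(X, a_matrix, b_matrix=None, alpha=1.0):
    """Encode with A, apply g, decode with B (or with A itself when tied)."""
    Z = np.tanh(alpha * (X @ a_matrix.T))           # (n_samples, k)
    decoder = a_matrix if b_matrix is None else b_matrix
    return Z @ decoder                              # (n_samples, n_features)


def reconstruction_objective(a_flat, X, k, alpha=1.0, tied_weights=True, l2_reg=0.0):
    """Hypothetical reconstruction loss; not pyppur's implementation."""
    n_enc = k * X.shape[1]
    A = a_flat[:n_enc].reshape(k, X.shape[1])
    # Assumed layout for untied weights: encoder entries first, then decoder.
    B = None if tied_weights else a_flat[n_enc:].reshape(k, X.shape[1])
    loss = np.mean((X - reconstruct(X, A, B, alpha)) ** 2)
    if B is not None:
        loss += l2_reg * np.sum(B ** 2)             # L2 penalty on the decoder only
    return float(loss)


rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
tied_loss = reconstruction_objective(rng.normal(size=2 * 5), X, k=2)
untied_loss = reconstruction_objective(rng.normal(size=2 * 2 * 5), X, k=2,
                                       tied_weights=False, l2_reg=0.01)
```

With tied weights the parameter vector has k * n_features entries; an untied decoder doubles that, which is why a penalty such as l2_reg is useful to keep B from growing without bound.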

Optimizers

SciPy Optimizer

class pyppur.optimizers.ScipyOptimizer(objective_func: Callable[[...], float], n_components: int, method: str = 'L-BFGS-B', max_iter: int = 1000, tol: float = 1e-06, random_state: int | None = None, verbose: bool = False, **kwargs: Any)

Bases: BaseOptimizer

Optimizer using SciPy’s optimization methods.

This optimizer leverages SciPy’s optimization functionality, particularly the L-BFGS-B method, which is well suited to projection pursuit problems.

__init__(objective_func: Callable[[...], float], n_components: int, method: str = 'L-BFGS-B', max_iter: int = 1000, tol: float = 1e-06, random_state: int | None = None, verbose: bool = False, **kwargs: Any) → None

Initialize the SciPy optimizer.

Parameters:

objective_func – Objective function to minimize.
n_components – Number of projection components.
method – SciPy optimization method (default: “L-BFGS-B”).
max_iter – Maximum number of iterations.
tol – Tolerance for convergence.
random_state – Random seed for reproducibility.
verbose – Whether to print progress information.
**kwargs – Additional keyword arguments for the optimizer.

optimize(X: ndarray, initial_guess: ndarray | None = None, **kwargs: Any) → tuple[ndarray, float, dict[str, Any]]

Optimize the projection directions using SciPy’s optimization methods.

Parameters:

X – Input data, shape (n_samples, n_features).
initial_guess – Optional initial guess for projection directions.
**kwargs – Additional arguments for the objective function.

Returns:

A tuple containing:

Optimized projection directions, shape (n_components, n_features)
Final objective value
Additional optimizer information

Return type:

tuple[ndarray, float, dict[str, Any]]
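For orientation, the following sketch runs a single L-BFGS-B minimization with scipy.optimize.minimize over a flattened direction matrix and returns the same three-part tuple. The helper names (tied_reconstruction_loss, optimize_directions), the unit-norm random start, and the contents of the info dictionary are illustrative assumptions; ScipyOptimizer's actual initialization and bookkeeping may differ.

```python
import numpy as np
from scipy.optimize import minimize


def tied_reconstruction_loss(a_flat, X, k, alpha=1.0):
    """Toy objective: tied-weight reconstruction error with g = tanh."""
    A = a_flat.reshape(k, X.shape[1])
    X_hat = np.tanh(alpha * (X @ A.T)) @ A
    return float(np.mean((X - X_hat) ** 2))


def optimize_directions(objective, X, k, max_iter=1000, tol=1e-6, random_state=None):
    """Hypothetical helper: one L-BFGS-B run from a random unit-norm start."""
    rng = np.random.default_rng(random_state)
    a0 = rng.normal(size=(k, X.shape[1]))
    a0 /= np.linalg.norm(a0, axis=1, keepdims=True)
    res = minimize(objective, a0.ravel(), args=(X, k), method="L-BFGS-B",
                   tol=tol, options={"maxiter": max_iter})
    info = {"success": bool(res.success), "nit": int(res.nit), "message": res.message}
    # Same shape of result as documented: (directions, final value, info dict).
    return res.x.reshape(k, X.shape[1]), float(res.fun), info


X = np.random.default_rng(1).normal(size=(100, 4))
A, final_loss, info = optimize_directions(tied_reconstruction_loss, X, k=2, random_state=0)
```

Because no jac is supplied here, SciPy approximates the gradient by finite differences; an analytic gradient (for example, built from grad_g) would usually make each iteration cheaper.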

Grid Optimizer

class pyppur.optimizers.GridOptimizer(objective_func: Callable[[...], float], n_components: int, n_directions: int = 250, n_iterations: int = 10, max_iter: int = 1000, tol: float = 1e-06, random_state: int | None = None, verbose: bool = False, **kwargs: Any)

Bases: BaseOptimizer

Optimizer using a grid-based search approach.

This optimizer is particularly useful for projection indices that are not differentiable or have many local minima. It systematically explores the space of projection directions using a grid-based approach.

__init__(objective_func: Callable[[...], float], n_components: int, n_directions: int = 250, n_iterations: int = 10, max_iter: int = 1000, tol: float = 1e-06, random_state: int | None = None, verbose: bool = False, **kwargs: Any) → None

Initialize the grid optimizer.

Parameters:

objective_func – Objective function to minimize.
n_components – Number of projection components.
n_directions – Number of random directions to generate per iteration.
n_iterations – Number of refinement iterations.
max_iter – Maximum number of iterations.
tol – Tolerance for convergence.
random_state – Random seed for reproducibility.
verbose – Whether to print progress information.
**kwargs – Additional keyword arguments for the optimizer.

optimize(X: ndarray, initial_guess: ndarray | None = None, **kwargs: Any) → tuple[ndarray, float, dict[str, Any]]

Optimize the projection directions using a grid-based approach.

Parameters:

X – Input data, shape (n_samples, n_features).
initial_guess – Optional initial guess for projection directions.
**kwargs – Additional arguments for the objective function.

Returns:

A tuple containing:

Optimized projection directions, shape (n_components, n_features)
Final objective value
Additional optimizer information

Return type:

tuple[ndarray, float, dict[str, Any]]
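This page does not spell out the search procedure, so the sketch below is a hypothetical random-direction search that only mirrors the documented parameter names: in each of n_iterations rounds it samples n_directions candidate direction sets around the current best, keeps any improvement, and halves the sampling spread. No gradients are used, which is the property that makes this style of search usable for non-differentiable indices. The toy objective (negative projected variance) is chosen only so the result is easy to check.

```python
import numpy as np


def random_direction_search(objective, X, k, n_directions=250, n_iterations=10,
                            random_state=None):
    """Hypothetical derivative-free search over unit-norm projection directions."""
    rng = np.random.default_rng(random_state)
    n_features = X.shape[1]

    def unit_rows(M):
        return M / np.linalg.norm(M, axis=-1, keepdims=True)

    best_A = unit_rows(rng.normal(size=(k, n_features)))
    best_loss = objective(best_A.ravel(), X, k)
    n_evaluations, spread = 1, 1.0
    for _ in range(n_iterations):
        # Sample n_directions candidate direction sets around the current best.
        noise = rng.normal(scale=spread, size=(n_directions, k, n_features))
        for A in unit_rows(best_A + noise):
            loss = objective(A.ravel(), X, k)
            n_evaluations += 1
            if loss < best_loss:
                best_A, best_loss = A, loss
        spread *= 0.5                          # refine around the best so far
    return best_A, best_loss, {"n_evaluations": n_evaluations}


def neg_projected_variance(a_flat, X, k):
    """Toy index: maximize projected variance by minimizing its negative."""
    A = a_flat.reshape(k, X.shape[1])
    return -float(np.var(X @ A.T, axis=0).sum())


# Feature 0 has by far the largest variance, so the best direction is near +/- e0.
X = np.random.default_rng(0).normal(size=(200, 3)) * np.array([5.0, 1.0, 1.0])
A, loss, info = random_direction_search(neg_projected_variance, X, k=1, random_state=0)
```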

Utility Functions

Metrics

Evaluation metrics for dimensionality reduction.

pyppur.utils.metrics.compute_distance_distortion(X_original: ndarray, X_embedded: ndarray) → float

Compute the distance distortion between original and embedded spaces.

Distance distortion measures how well pairwise distances are preserved.

Parameters:

X_original – Original high-dimensional data.
X_embedded – Low-dimensional embedding.

Returns:

Mean squared distance distortion.
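A minimal sketch of the metric, assuming Euclidean distances, no rescaling of either space, and an average taken over each unordered pair once. It builds the distance matrices with the identity |x - y|^2 = |x|^2 + |y|^2 - 2 x·y. pyppur may normalize or weight distances differently.

```python
import numpy as np


def compute_distance_distortion(X_original, X_embedded):
    """Sketch: mean squared difference of pairwise Euclidean distances."""
    def distance_matrix(Y):
        sq_norms = np.sum(Y ** 2, axis=1)
        d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (Y @ Y.T)
        return np.sqrt(np.maximum(d2, 0.0))   # clip tiny negatives from rounding

    iu = np.triu_indices(X_original.shape[0], k=1)   # count each pair once
    diff = distance_matrix(X_original)[iu] - distance_matrix(X_embedded)[iu]
    return float(np.mean(diff ** 2))


X = np.array([[0.0, 0.0], [3.0, 4.0]])   # original distance 5
Y = np.array([[0.0], [1.0]])             # embedded distance 1
print(compute_distance_distortion(X, Y))  # 16.0
```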

pyppur.utils.metrics.compute_silhouette(X_embedded: ndarray, labels: ndarray) → float

Compute the silhouette score for the embedding.

The silhouette score measures how well clusters are separated.

Parameters:

X_embedded – Low-dimensional embedding.
labels – Cluster or class labels.

Returns:

Silhouette score in range [-1, 1].
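The silhouette of sample i is s(i) = (b - a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is its mean distance to the members of the nearest other cluster. The NumPy sketch below follows that definition, including scikit-learn's convention that samples in singleton clusters score 0; it assumes Euclidean distances and at least two distinct labels, and is not pyppur's code. scikit-learn's sklearn.metrics.silhouette_score computes the same mean.

```python
import numpy as np


def silhouette(X_embedded, labels):
    """Mean silhouette coefficient s(i) = (b - a) / max(a, b)."""
    labels = np.asarray(labels)
    D = np.sqrt(((X_embedded[:, None, :] - X_embedded[None, :, :]) ** 2).sum(axis=-1))
    scores = np.zeros(len(labels))
    for i, own in enumerate(labels):
        same = labels == own
        if same.sum() == 1:
            continue                               # singleton clusters score 0
        a = D[i, same].sum() / (same.sum() - 1)    # mean distance to own cluster (D[i, i] = 0)
        b = min(D[i, labels == other].mean()       # mean distance to nearest other cluster
                for other in np.unique(labels) if other != own)
        scores[i] = (b - a) / max(a, b)
    return float(scores.mean())


X_emb = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
good = silhouette(X_emb, [0, 0, 1, 1])   # close to 1: well-separated clusters
bad = silhouette(X_emb, [0, 1, 0, 1])    # negative: points sit nearer the other cluster
```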

pyppur.utils.metrics.compute_trustworthiness(X_original: ndarray, X_embedded: ndarray, n_neighbors: int = 5) → float

Compute the trustworthiness score for dimensionality reduction.

Trustworthiness measures how well local neighborhoods are preserved.

Parameters:

X_original – Original high-dimensional data.
X_embedded – Low-dimensional embedding.
n_neighbors – Number of neighbors to consider.

Returns:

Trustworthiness score in range [0, 1].
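Trustworthiness penalizes "intruders": points that are among a sample's k nearest neighbors in the embedding but not in the original space, weighted by how far down the original neighbor ranking they sit. With n samples, T(k) = 1 - 2 / (n k (2n - 3k - 1)) · Σ_i Σ_{j ∈ U_i} (r(i, j) - k), where U_i is the set of intruders for sample i and r(i, j) is the rank of j among i's original neighbors. The sketch below implements that formula directly with NumPy (Euclidean distances, k < n / 2); scikit-learn exposes the same measure as sklearn.manifold.trustworthiness. Treat it as an illustration, not pyppur's implementation.

```python
import numpy as np


def trustworthiness(X_original, X_embedded, n_neighbors=5):
    """T(k) computed from neighbor ranks with Euclidean distances."""
    n, k = X_original.shape[0], n_neighbors

    def sq_dists(Y):
        D = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
        np.fill_diagonal(D, np.inf)            # a point is not its own neighbor
        return D

    order_orig = np.argsort(sq_dists(X_original), axis=1, kind="stable")
    nn_emb = np.argsort(sq_dists(X_embedded), axis=1, kind="stable")[:, :k]
    # ranks[i, j]: position of j in i's original neighbor list (1 = nearest).
    ranks = np.empty_like(order_orig)
    ranks[np.arange(n)[:, None], order_orig] = np.arange(1, n + 1)
    penalty = 0
    for i in range(n):
        # Neighbors in the embedding that are not original neighbors.
        intruders = np.setdiff1d(nn_emb[i], order_orig[i, :k])
        penalty += int(np.sum(ranks[i, intruders] - k))
    return 1.0 - 2.0 / (n * k * (2 * n - 3 * k - 1)) * penalty


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
t_same = trustworthiness(X, X, n_neighbors=5)          # identical neighborhoods
t_proj = trustworthiness(X, X[:, :2], n_neighbors=5)   # typically below 1.0
```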

pyppur.utils.metrics.evaluate_embedding(X_original: ndarray, X_embedded: ndarray, labels: ndarray | None = None, n_neighbors: int = 5) → dict[str, float]

Evaluate the quality of an embedding using multiple metrics.

Parameters:

X_original – Original high-dimensional data.
X_embedded – Low-dimensional embedding.
labels – Optional cluster or class labels.
n_neighbors – Number of neighbors for trustworthiness.

Returns:

Dictionary with evaluation metrics.

Preprocessing

Preprocessing utilities for projection pursuit.

pyppur.utils.preprocessing.standardize_data(X: ndarray, center: bool = True, scale: bool = True, scaler: StandardScaler | None = None) → tuple[ndarray, StandardScaler]

Standardize data for projection pursuit.

Parameters:

X – Input data, shape (n_samples, n_features).
center – Whether to center the data.
scale – Whether to scale the data to unit variance.
scaler – Optional pre-fitted scaler for transform-only operation.

Returns:

Standardized data and the scaler.
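The return type suggests that standardization is backed by scikit-learn's StandardScaler. Under that assumption, a sketch might look like the following: center and scale map onto StandardScaler's with_mean and with_std options, and passing an already fitted scaler switches to transform-only mode so that new data is standardized with the training statistics.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler


def standardize_data(X, center=True, scale=True, scaler=None):
    """Sketch assuming the documented behavior maps onto StandardScaler."""
    if scaler is None:
        # Fit a new scaler; with_mean / with_std toggle centering and scaling.
        scaler = StandardScaler(with_mean=center, with_std=scale)
        X_std = scaler.fit_transform(X)
    else:
        # Transform-only: reuse the statistics learned from the training data.
        X_std = scaler.transform(X)
    return X_std, scaler


rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=3.0, size=(100, 4))
X_test = rng.normal(loc=5.0, scale=3.0, size=(20, 4))
X_train_std, scaler = standardize_data(X_train)
X_test_std, _ = standardize_data(X_test, scaler=scaler)   # uses the training mean/std
```

Reusing the fitted scaler matters whenever a fitted projection is applied to new data: the projection directions were learned in the training data's standardized coordinates.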

Visualization

Visualization utilities for projection pursuit results.

pyppur.utils.visualization.plot_comparison(embeddings: dict[str, ndarray], labels: ndarray | None = None, metrics: dict[str, dict[str, float]] | None = None, title: str | None = None, figsize: tuple[float, float] = (15, 5), cmap: str = 'tab10', alpha: float = 0.7, s: float = 30.0) → Figure

Plot a comparison of multiple embeddings.

Parameters:

embeddings – Dictionary of embeddings {name: embedded_data}.
labels – Optional labels for coloring points.
metrics – Optional dictionary of metrics for each embedding.
title – Optional overall figure title.
figsize – Figure size (width, height) in inches.
cmap – Colormap name.
alpha – Transparency of points.
s – Point size.

Returns:

matplotlib Figure object.

pyppur.utils.visualization.plot_embedding(X_embedded: ndarray, labels: ndarray | None = None, title: str = 'Projection Pursuit Embedding', metrics: dict[str, float] | None = None, figsize: tuple[float, float] = (10, 8), cmap: str = 'tab10', alpha: float = 0.7, s: float = 30.0, ax: Axes | Axes3D | None = None) → tuple[Figure, Axes | Axes3D]

Plot the results of a projection pursuit embedding.

Parameters:

X_embedded – Embedded data, shape (n_samples, 2) or (n_samples, 3).
labels – Optional labels for coloring points.
title – Plot title.
metrics – Optional dictionary of metrics to include in the title.
figsize – Figure size (width, height) in inches.
cmap – Colormap name.
alpha – Transparency of points.
s – Point size.
ax – Optional axes to plot on.

Returns:

Figure and Axes objects.
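A rough 2-D analogue of plot_embedding, written with matplotlib. The function name plot_embedding_2d and the "name=value" format used to append metrics to the title are hypothetical; the documented function also handles 3-D embeddings through Axes3D, which this sketch omits.

```python
import matplotlib
matplotlib.use("Agg")                      # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np


def plot_embedding_2d(X_embedded, labels=None, title="Projection Pursuit Embedding",
                      metrics=None, figsize=(10, 8), cmap="tab10", alpha=0.7, s=30.0,
                      ax=None):
    """Hypothetical 2-D analogue of plot_embedding; returns (Figure, Axes)."""
    if ax is None:
        fig, ax = plt.subplots(figsize=figsize)
    else:
        fig = ax.figure                    # draw into the caller's axes
    # Only pass a colormap when there are labels to map to colors.
    color_kwargs = {} if labels is None else {"c": labels, "cmap": cmap}
    ax.scatter(X_embedded[:, 0], X_embedded[:, 1], alpha=alpha, s=s, **color_kwargs)
    if metrics:
        # Assumed title format: one "name=value" pair per metric.
        title = title + "\n" + ", ".join(f"{name}={value:.3f}" for name, value in metrics.items())
    ax.set_title(title)
    return fig, ax


rng = np.random.default_rng(0)
Z = rng.normal(size=(60, 2))
fig, ax = plot_embedding_2d(Z, labels=rng.integers(0, 3, size=60),
                            metrics={"trustworthiness": 0.95})
```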

pyppur.utils.visualization.plot_reconstruction(X: ndarray, X_recon: ndarray, n_samples: int = 3) → Figure

Plot reconstructed samples alongside original samples.

Parameters:

X – Original data.
X_recon – Reconstructed data.
n_samples – Number of samples to plot.

Returns:

matplotlib Figure.

Helper Functions

Normalization

pyppur.optimizers.scipy_optimizer.normalize_projection_directions(a_flat: ndarray, n_components: int, n_features: int) → ndarray

Normalize the encoder projection directions to unit norm.

Parameters:

a_flat – Flattened parameter vector.
n_components – Number of projection components.
n_features – Number of features.

Returns:

Normalized parameter vector.
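A NumPy sketch of the normalization, assuming the first n_components * n_features entries of a_flat hold the encoder directions row by row and any remaining entries (for example, an untied decoder) are returned unchanged. All-zero rows are left as-is to avoid division by zero; how pyppur treats that case is not documented here.

```python
import numpy as np


def normalize_projection_directions(a_flat, n_components, n_features):
    """Sketch: rescale each encoder row to unit Euclidean norm."""
    out = np.array(a_flat, dtype=float, copy=True)      # never modify the input
    n_enc = n_components * n_features
    A = out[:n_enc].reshape(n_components, n_features)   # view into `out`
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    A /= np.where(norms > 0, norms, 1.0)                # in-place, so `out` changes too
    return out


# Two encoder rows of length 3, followed by six (assumed) decoder entries.
a = np.array([3.0, 4.0, 0.0, 0.0, 5.0, 0.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0])
out = normalize_projection_directions(a, n_components=2, n_features=3)
```

Keeping the encoder rows at unit norm removes the scale ambiguity between a direction and its multiples, so the optimizer compares directions rather than magnitudes.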