pyppur.utils package

Utility functions for pyppur.

pyppur.utils.compute_silhouette(X_embedded: ndarray, labels: ndarray) → float

Compute the silhouette score for the embedding.

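The silhouette score measures how well the labeled clusters are separated in the embedding. Values near 1 indicate compact, well-separated clusters, values near 0 indicate overlapping clusters, and negative values indicate that many points lie closer to another cluster than to their own.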

Parameters:

X_embedded – Low-dimensional embedding.
labels – Cluster or class labels.

Returns:

Silhouette score in range [-1, 1].
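As a worked illustration of the standard definition: for sample i, let a(i) be its mean distance to the other members of its own cluster and b(i) its smallest mean distance to the members of any other cluster. Then s(i) = (b(i) - a(i)) / max(a(i), b(i)), and the score is the mean of s(i) over all samples. The NumPy sketch below implements that definition with Euclidean distances. It is not pyppur's source: the name silhouette_sketch is hypothetical, and whether pyppur uses Euclidean distance (for example by delegating to scikit-learn's silhouette_score) is an assumption.

```python
import numpy as np


def silhouette_sketch(X_embedded: np.ndarray, labels: np.ndarray) -> float:
    """Mean silhouette coefficient with Euclidean distances (hypothetical sketch)."""
    X = np.asarray(X_embedded, dtype=float)
    labels = np.asarray(labels)
    # Full pairwise Euclidean distance matrix, shape (n_samples, n_samples).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    unique = np.unique(labels)
    if len(unique) < 2:
        raise ValueError("silhouette needs at least two distinct labels")
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum() - 1  # other members of i's cluster (exclude i itself)
        if n_same == 0:
            continue  # singleton cluster: s(i) = 0 by convention
        # a(i): mean distance to the other members of i's cluster (dist[i, i] is 0).
        a = dist[i, same].sum() / n_same
        # b(i): smallest mean distance to the members of any other cluster.
        b = min(dist[i, labels == c].mean() for c in unique if c != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return float(scores.mean())
```

On two tight, well-separated pairs of points the score is about 0.90; labeling the same points so that each cluster straddles both pairs drives it to about -0.45.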

pyppur.utils.compute_trustworthiness(X_original: ndarray, X_embedded: ndarray, n_neighbors: int = 5) → float

Compute the trustworthiness score for dimensionality reduction.

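Trustworthiness measures how well local neighborhoods are preserved: it penalizes points that appear among a sample's nearest neighbors in the embedding but are not among its nearest neighbors in the original space. A score of 1 means no such intrusions occur.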

Parameters:

X_original – Original high-dimensional data.
X_embedded – Low-dimensional embedding.
n_neighbors – Number of nearest neighbors (k) that define each local neighborhood.

Returns:

Trustworthiness score in range [0, 1].
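The usual definition is T(k) = 1 - 2 / (n k (2n - 3k - 1)) * sum_i sum_{j in U_i(k)} (r(i, j) - k), where U_i(k) holds the points among the k nearest neighbors of sample i in the embedding that are not among its k nearest neighbors in the original space, and r(i, j) is the rank of j among i's neighbors in the original space (rank 1 is the nearest). A minimal NumPy sketch, assuming Euclidean distances and enough memory for two n x n distance matrices; trustworthiness_sketch is a hypothetical name, and pyppur may instead delegate to scikit-learn's sklearn.manifold.trustworthiness (an assumption). The normalization keeps the score in [0, 1] only for k < n/2.

```python
import numpy as np


def trustworthiness_sketch(X_original: np.ndarray, X_embedded: np.ndarray,
                           n_neighbors: int = 5) -> float:
    """Trustworthiness T(k) with Euclidean distances (hypothetical sketch)."""
    X = np.asarray(X_original, dtype=float)
    Y = np.asarray(X_embedded, dtype=float)
    n, k = X.shape[0], n_neighbors
    if not 0 < k < n / 2:
        raise ValueError("n_neighbors must satisfy 0 < n_neighbors < n_samples / 2")
    d_orig = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    d_emb = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
    # Push each point to the end of its own neighbor ordering.
    np.fill_diagonal(d_orig, np.inf)
    np.fill_diagonal(d_emb, np.inf)
    # ranks[i, j] = rank of j among i's neighbors in the original space (1 = nearest).
    order = np.argsort(d_orig, axis=1)
    ranks = np.empty_like(order)
    rows = np.arange(n)[:, None]
    ranks[rows, order] = np.arange(1, n + 1)[None, :]
    # Indices of the k nearest neighbors of each point in the embedding.
    emb_nn = np.argsort(d_emb, axis=1)[:, :k]
    # Only embedded neighbors whose original rank exceeds k contribute (r - k).
    penalty = np.maximum(ranks[rows, emb_nn] - k, 0).sum()
    return float(1.0 - 2.0 / (n * k * (2 * n - 3 * k - 1)) * penalty)
```

An embedding identical to the input scores exactly 1; an unrelated random embedding scores lower.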

pyppur.utils.standardize_data(X: ndarray, center: bool = True, scale: bool = True, scaler: StandardScaler | None = None) → tuple[ndarray, StandardScaler]

Standardize data for projection pursuit.

Parameters:

X – Input data, shape (n_samples, n_features).
center – Whether to center the data.
scale – Whether to scale the data to unit variance.
scaler – Optional pre-fitted scaler. If given, X is transformed with it instead of fitting a new scaler.

Returns:

Tuple of the standardized data and the fitted (or supplied) scaler.
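The fit and transform-only paths can be sketched as follows. This assumes that center and scale correspond to StandardScaler's with_mean and with_std options. SimpleScaler is a hypothetical NumPy stand-in for sklearn.preprocessing.StandardScaler, used so the example needs only NumPy; it borrows StandardScaler's fitted attribute names mean_ and scale_.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class SimpleScaler:
    """Hypothetical stand-in for sklearn's StandardScaler (stores fitted statistics)."""
    mean_: np.ndarray
    scale_: np.ndarray

    def transform(self, X: np.ndarray) -> np.ndarray:
        return (X - self.mean_) / self.scale_


def standardize_sketch(X: np.ndarray, center: bool = True, scale: bool = True,
                       scaler: Optional[SimpleScaler] = None) -> tuple[np.ndarray, SimpleScaler]:
    X = np.asarray(X, dtype=float)
    if scaler is None:
        # Fit path: subtract the mean unless centering is off,
        # divide by the standard deviation unless scaling is off.
        mean = X.mean(axis=0) if center else np.zeros(X.shape[1])
        std = X.std(axis=0) if scale else np.ones(X.shape[1])
        # Leave constant features unscaled instead of dividing by zero.
        std = np.where(std == 0, 1.0, std)
        scaler = SimpleScaler(mean_=mean, scale_=std)
    # Transform-only path when a fitted scaler is passed in.
    return scaler.transform(X), scaler
```

Fitting on training data and then passing the returned scaler back in applies the training mean and standard deviation to new data, so projections learned on the training set stay comparable.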

Submodules

pyppur.utils.metrics module

Evaluation metrics for dimensionality reduction.

pyppur.utils.metrics.compute_distance_distortion(X_original: ndarray, X_embedded: ndarray) → float

Compute the distance distortion between original and embedded spaces.

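Distance distortion measures how well pairwise distances between samples are preserved by the embedding; lower values are better, and 0 means the distances are preserved exactly.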

Parameters:

X_original – Original high-dimensional data.
X_embedded – Low-dimensional embedding.

Returns:

Mean squared distance distortion.
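A minimal NumPy sketch, under the assumption that the metric compares raw Euclidean distances over all pairs i < j, i.e. mean((||x_i - x_j|| - ||y_i - y_j||)^2). pyppur may normalize or rescale the distances first; its exact formula is not documented here, and distance_distortion_sketch is a hypothetical name.

```python
import numpy as np


def distance_distortion_sketch(X_original: np.ndarray, X_embedded: np.ndarray) -> float:
    """Mean squared difference of pairwise Euclidean distances (hypothetical sketch)."""
    X = np.asarray(X_original, dtype=float)
    Y = np.asarray(X_embedded, dtype=float)
    d_orig = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    d_emb = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
    # Count each unordered pair once: strict upper triangle, no self-pairs.
    iu = np.triu_indices(len(X), k=1)
    return float(np.mean((d_orig[iu] - d_emb[iu]) ** 2))
```

Because an orthogonal projection never increases distances, this unnormalized form is zero for such a projection only when the discarded coordinates are constant across samples.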

pyppur.utils.metrics.compute_silhouette(X_embedded: ndarray, labels: ndarray) → float

Compute the silhouette score for the embedding.

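The silhouette score measures how well the labeled clusters are separated in the embedding. Values near 1 indicate compact, well-separated clusters, values near 0 indicate overlapping clusters, and negative values indicate that many points lie closer to another cluster than to their own.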

Parameters:

X_embedded – Low-dimensional embedding.
labels – Cluster or class labels.

Returns:

Silhouette score in range [-1, 1].

pyppur.utils.metrics.compute_trustworthiness(X_original: ndarray, X_embedded: ndarray, n_neighbors: int = 5) → float

Compute the trustworthiness score for dimensionality reduction.

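Trustworthiness measures how well local neighborhoods are preserved: it penalizes points that appear among a sample's nearest neighbors in the embedding but are not among its nearest neighbors in the original space. A score of 1 means no such intrusions occur.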

Parameters:

X_original – Original high-dimensional data.
X_embedded – Low-dimensional embedding.
n_neighbors – Number of nearest neighbors (k) that define each local neighborhood.

Returns:

Trustworthiness score in range [0, 1].

pyppur.utils.metrics.evaluate_embedding(X_original: ndarray, X_embedded: ndarray, labels: ndarray | None = None, n_neighbors: int = 5) → dict[str, float]

Evaluate the quality of an embedding using multiple metrics.

Parameters:

X_original – Original high-dimensional data.
X_embedded – Low-dimensional embedding.
labels – Optional cluster or class labels.
n_neighbors – Number of neighbors for trustworthiness.

Returns:

Dictionary mapping metric names to scores.
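A compact sketch of how such an evaluation can be assembled, assuming the metrics match scikit-learn's sklearn.manifold.trustworthiness and sklearn.metrics.silhouette_score, with the distortion taken as the mean squared difference of pairwise Euclidean distances (all assumptions). The dictionary keys and the name evaluate_embedding_sketch are hypothetical. The silhouette entry is computed only when labels are supplied, mirroring the optional labels parameter.

```python
from typing import Optional

import numpy as np
from sklearn.manifold import trustworthiness
from sklearn.metrics import silhouette_score


def evaluate_embedding_sketch(X_original: np.ndarray, X_embedded: np.ndarray,
                              labels: Optional[np.ndarray] = None,
                              n_neighbors: int = 5) -> dict[str, float]:
    X = np.asarray(X_original, dtype=float)
    Y = np.asarray(X_embedded, dtype=float)
    # Pairwise Euclidean distances, each unordered pair counted once.
    iu = np.triu_indices(len(X), k=1)
    d_orig = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)[iu]
    d_emb = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)[iu]
    results = {
        "distance_distortion": float(np.mean((d_orig - d_emb) ** 2)),
        "trustworthiness": float(trustworthiness(X, Y, n_neighbors=n_neighbors)),
    }
    # Silhouette needs labels with at least two distinct values, so it is optional.
    if labels is not None:
        results["silhouette"] = float(silhouette_score(Y, labels))
    return results
```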

pyppur.utils.preprocessing module

Preprocessing utilities for projection pursuit.

pyppur.utils.preprocessing.standardize_data(X: ndarray, center: bool = True, scale: bool = True, scaler: StandardScaler | None = None) → tuple[ndarray, StandardScaler]

Standardize data for projection pursuit.

Parameters:

X – Input data, shape (n_samples, n_features).
center – Whether to center the data.
scale – Whether to scale the data to unit variance.
scaler – Optional pre-fitted scaler. If given, X is transformed with it instead of fitting a new scaler.

Returns:

Tuple of the standardized data and the fitted (or supplied) scaler.

pyppur.utils.visualization module

Visualization utilities for projection pursuit results.

pyppur.utils.visualization.plot_comparison(embeddings: dict[str, ndarray], labels: ndarray | None = None, metrics: dict[str, dict[str, float]] | None = None, title: str | None = None, figsize: tuple[float, float] = (15, 5), cmap: str = 'tab10', alpha: float = 0.7, s: float = 30.0) → Figure

Plot a comparison of multiple embeddings.

Parameters:

embeddings – Dictionary of embeddings {name: embedded_data}.
labels – Optional labels for coloring points.
metrics – Optional dictionary mapping each embedding name to its dictionary of metrics, {name: {metric_name: value}}.
title – Optional overall figure title.
figsize – Figure size (width, height) in inches.
cmap – Colormap name.
alpha – Transparency of points.
s – Point size.

Returns:

matplotlib Figure object.
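A minimal matplotlib sketch of such a side-by-side comparison, assuming one scatter panel per embedding with that embedding's metrics appended to its panel title. The layout, the title format, and the name plot_comparison_sketch are assumptions rather than pyppur's documented behavior. Only the first two columns of each embedding are plotted.

```python
from typing import Optional

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np


def plot_comparison_sketch(embeddings: dict[str, np.ndarray],
                           labels: Optional[np.ndarray] = None,
                           metrics: Optional[dict[str, dict[str, float]]] = None,
                           title: Optional[str] = None,
                           figsize: tuple[float, float] = (15, 5),
                           cmap: str = "tab10", alpha: float = 0.7, s: float = 30.0):
    names = list(embeddings)
    # squeeze=False keeps a 2D axes array even when there is a single embedding.
    fig, axes = plt.subplots(1, len(names), figsize=figsize, squeeze=False)
    # Only pass a colormap when there are labels to map through it.
    color_kw = {"c": labels, "cmap": cmap} if labels is not None else {}
    for ax, name in zip(axes[0], names):
        Y = np.asarray(embeddings[name])
        ax.scatter(Y[:, 0], Y[:, 1], alpha=alpha, s=s, **color_kw)
        panel_title = name
        if metrics and name in metrics:
            # Append each metric as "metric_name=value" on a second title line.
            panel_title += "\n" + ", ".join(f"{k}={v:.3f}" for k, v in metrics[name].items())
        ax.set_title(panel_title)
    if title:
        fig.suptitle(title)
    fig.tight_layout()
    return fig
```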

pyppur.utils.visualization.plot_embedding(X_embedded: ndarray, labels: ndarray | None = None, title: str = 'Projection Pursuit Embedding', metrics: dict[str, float] | None = None, figsize: tuple[float, float] = (10, 8), cmap: str = 'tab10', alpha: float = 0.7, s: float = 30.0, ax: Axes | Axes3D | None = None) → tuple[Figure, Axes | Axes3D]

Plot the results of a projection pursuit embedding.

Parameters:

X_embedded – Embedded data, shape (n_samples, 2) or (n_samples, 3).
labels – Optional labels for coloring points.
title – Plot title.
metrics – Optional dictionary of metrics, {metric_name: value}, to include in the title.
figsize – Figure size (width, height) in inches.
cmap – Colormap name.
alpha – Transparency of points.
s – Point size.
ax – Optional axes to plot on.

Returns:

Tuple of the matplotlib Figure and the Axes (or Axes3D) object.
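The 2D/3D handling can be sketched as below: when no axes are supplied, a new figure is created with a 3D projection if X_embedded has three columns. That behavior, the way metrics are formatted into the title, and the name plot_embedding_sketch are assumptions; pyppur's own implementation may differ.

```python
from typing import Optional

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401  (registers "3d" on older matplotlib)


def plot_embedding_sketch(X_embedded: np.ndarray, labels: Optional[np.ndarray] = None,
                          title: str = "Projection Pursuit Embedding",
                          metrics: Optional[dict[str, float]] = None,
                          figsize: tuple[float, float] = (10, 8), cmap: str = "tab10",
                          alpha: float = 0.7, s: float = 30.0, ax=None):
    Y = np.asarray(X_embedded)
    n_dims = Y.shape[1]
    if n_dims not in (2, 3):
        raise ValueError("X_embedded must have 2 or 3 columns")
    if ax is None:
        fig = plt.figure(figsize=figsize)
        # Three columns get a 3D axes; two columns get ordinary 2D axes.
        ax = fig.add_subplot(projection="3d" if n_dims == 3 else None)
    else:
        fig = ax.figure
    color_kw = {"c": labels, "cmap": cmap} if labels is not None else {}
    # Y.T unpacks to (x, y) or (x, y, z), matching the axes type chosen above.
    ax.scatter(*Y.T, alpha=alpha, s=s, **color_kw)
    if metrics:
        title += " (" + ", ".join(f"{k}={v:.3f}" for k, v in metrics.items()) + ")"
    ax.set_title(title)
    return fig, ax
```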

pyppur.utils.visualization.plot_reconstruction(X: ndarray, X_recon: ndarray, n_samples: int = 3) → Figure

Plot reconstructed samples alongside original samples.

Parameters:

X – Original data.
X_recon – Reconstructed data.
n_samples – Number of samples to plot.

Returns:

matplotlib Figure object.
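One way such a plot can be laid out, offered as an assumption rather than pyppur's documented format: one panel per sample, with the original and reconstructed feature values drawn against feature index and the per-sample mean squared error in the panel title. plot_reconstruction_sketch is a hypothetical name.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np


def plot_reconstruction_sketch(X: np.ndarray, X_recon: np.ndarray, n_samples: int = 3):
    X = np.asarray(X)
    X_recon = np.asarray(X_recon)
    # Never request more panels than there are samples.
    n_samples = min(n_samples, len(X))
    fig, axes = plt.subplots(n_samples, 1, figsize=(8, 2.5 * n_samples), squeeze=False)
    features = np.arange(X.shape[1])
    for i, ax in enumerate(axes[:, 0]):
        ax.plot(features, X[i], "o-", label="original")
        ax.plot(features, X_recon[i], "x--", label="reconstruction")
        # Mean squared reconstruction error for this sample.
        mse = np.mean((X[i] - X_recon[i]) ** 2)
        ax.set_title(f"Sample {i} (MSE={mse:.3g})")
        ax.set_xlabel("feature index")
        ax.legend()
    fig.tight_layout()
    return fig
```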