pyppur.utils package
Utility functions for pyppur.
- pyppur.utils.compute_silhouette(X_embedded: ndarray, labels: ndarray) -> float
Compute the silhouette score for the embedding.
The silhouette score measures how well clusters are separated.
- Parameters:
X_embedded – Low-dimensional embedding.
labels – Cluster or class labels.
- Returns:
Silhouette score in range [-1, 1].
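As a rough illustration of the quantity described above, the sketch below computes a silhouette score with scikit-learn's `silhouette_score`. Whether `compute_silhouette` delegates to that function is an assumption; the synthetic blobs and variable names are illustrative only.

```python
import numpy as np
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Two tight, well-separated 2-D blobs standing in for an embedding.
X_embedded = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(20, 2)),
    rng.normal(loc=5.0, scale=0.1, size=(20, 2)),
])
labels = np.array([0] * 20 + [1] * 20)

# Labels that match the blobs should score close to the upper bound.
score = silhouette_score(X_embedded, labels)

# Randomly permuted labels no longer match the geometry, so the score drops.
shuffled_score = silhouette_score(X_embedded, rng.permutation(labels))
```

A score near 1 indicates compact, well-separated groups; values near 0 or below suggest the labels do not correspond to structure in the embedding.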
- pyppur.utils.compute_trustworthiness(X_original: ndarray, X_embedded: ndarray, n_neighbors: int = 5) -> float
Compute the trustworthiness score for dimensionality reduction.
Trustworthiness measures how well local neighborhoods are preserved.
- Parameters:
X_original – Original high-dimensional data.
X_embedded – Low-dimensional embedding.
n_neighbors – Number of neighbors to consider.
- Returns:
Trustworthiness score in range [0, 1].
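To make the neighborhood-preservation idea concrete, the following sketch uses scikit-learn's `trustworthiness` on synthetic data. It is an assumption that `compute_trustworthiness` behaves like this scikit-learn function; the data and the three candidate embeddings are made up for illustration.

```python
import numpy as np
from sklearn.manifold import trustworthiness

rng = np.random.default_rng(0)

# 10-D data whose first two coordinates carry most of the variance.
scales = np.array([5.0, 5.0] + [1.0] * 8)
X_original = rng.normal(size=(100, 10)) * scales

# Embedding A: keep the two dominant coordinates.
X_proj = X_original[:, :2]
# Embedding B: noise unrelated to the data.
X_noise = rng.normal(size=(100, 2))

t_proj = trustworthiness(X_original, X_proj, n_neighbors=5)
t_noise = trustworthiness(X_original, X_noise, n_neighbors=5)
# Comparing the data with itself preserves every neighborhood.
t_identity = trustworthiness(X_original, X_original, n_neighbors=5)
```

In this setup the projection onto the dominant coordinates should score higher than the unrelated noise, and the identity case reaches the upper bound of the range.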
- pyppur.utils.standardize_data(X: ndarray, center: bool = True, scale: bool = True, scaler: StandardScaler | None = None) -> tuple[ndarray, StandardScaler]
Standardize data for projection pursuit.
- Parameters:
X – Input data, shape (n_samples, n_features).
center – Whether to center the data.
scale – Whether to scale the data to unit variance.
scaler – Optional pre-fitted scaler for transform-only operation.
- Returns:
Tuple of (standardized data, scaler).
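The fit-then-reuse pattern implied by the `scaler` parameter can be sketched as follows. `standardize_sketch` is a hypothetical stand-in written for this example, not pyppur's implementation; in particular, ignoring `center` and `scale` when a scaler is supplied is an assumption.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler


def standardize_sketch(X, center=True, scale=True, scaler=None):
    """Hypothetical stand-in mirroring the documented signature."""
    if scaler is None:
        # No scaler supplied: fit a new one on X.
        scaler = StandardScaler(with_mean=center, with_std=scale).fit(X)
    # A supplied scaler is applied as-is (transform only, no refit).
    return scaler.transform(X), scaler


rng = np.random.default_rng(0)
X_train = rng.normal(loc=10.0, scale=3.0, size=(200, 4))
X_new = rng.normal(loc=10.0, scale=3.0, size=(50, 4))

# Fit on the training data, then reuse the fitted scaler on new data.
X_train_std, fitted = standardize_sketch(X_train)
X_new_std, reused = standardize_sketch(X_new, scaler=fitted)
```

Reusing the fitted scaler keeps new data on the same scale as the data the projection was fitted on, which is the usual reason for a transform-only path.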
Submodules
pyppur.utils.metrics module
Evaluation metrics for dimensionality reduction.
- pyppur.utils.metrics.compute_distance_distortion(X_original: ndarray, X_embedded: ndarray) -> float
Compute the distance distortion between original and embedded spaces.
Distance distortion measures how well pairwise distances are preserved.
- Parameters:
X_original – Original high-dimensional data.
X_embedded – Low-dimensional embedding.
- Returns:
Mean squared distance distortion.
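One plausible reading of "mean squared distance distortion" is the mean squared difference between corresponding pairwise Euclidean distances in the two spaces. The NumPy sketch below implements that reading; the exact formula pyppur uses (for example, any normalization of the distances) is not stated here, so treat `distance_distortion_sketch` as a hypothetical illustration.

```python
import numpy as np


def pairwise_distances(X):
    """Dense matrix of Euclidean distances between all rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def distance_distortion_sketch(X_original, X_embedded):
    """Assumed definition: mean squared difference of pairwise distances."""
    d_orig = pairwise_distances(X_original)
    d_emb = pairwise_distances(X_embedded)
    # Each unordered pair is counted once (upper triangle, no diagonal).
    iu = np.triu_indices(len(X_original), k=1)
    return float(np.mean((d_orig[iu] - d_emb[iu]) ** 2))


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))

# A rotation preserves all pairwise distances, so distortion is ~0.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
rotated = distance_distortion_sketch(X, X @ Q)

# Dropping two of the three coordinates shrinks distances.
truncated = distance_distortion_sketch(X, X[:, :1])
```

Under this definition, lower values mean pairwise distances are better preserved.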
- pyppur.utils.metrics.compute_silhouette(X_embedded: ndarray, labels: ndarray) -> float
Compute the silhouette score for the embedding.
The silhouette score measures how well clusters are separated.
- Parameters:
X_embedded – Low-dimensional embedding.
labels – Cluster or class labels.
- Returns:
Silhouette score in range [-1, 1].
- pyppur.utils.metrics.compute_trustworthiness(X_original: ndarray, X_embedded: ndarray, n_neighbors: int = 5) -> float
Compute the trustworthiness score for dimensionality reduction.
Trustworthiness measures how well local neighborhoods are preserved.
- Parameters:
X_original – Original high-dimensional data.
X_embedded – Low-dimensional embedding.
n_neighbors – Number of neighbors to consider.
- Returns:
Trustworthiness score in range [0, 1].
- pyppur.utils.metrics.evaluate_embedding(X_original: ndarray, X_embedded: ndarray, labels: ndarray | None = None, n_neighbors: int = 5) -> dict[str, float]
Evaluate the quality of an embedding using multiple metrics.
- Parameters:
X_original – Original high-dimensional data.
X_embedded – Low-dimensional embedding.
labels – Optional cluster or class labels.
n_neighbors – Number of neighbors for trustworthiness.
- Returns:
Dictionary with evaluation metrics.
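A function of this shape can be sketched as a thin wrapper that collects several scores into a dictionary and adds label-dependent metrics only when labels are given. The helper `evaluate_sketch` and its key names (`"trustworthiness"`, `"silhouette"`) are hypothetical; the keys pyppur actually returns are not listed here.

```python
import numpy as np
from sklearn.manifold import trustworthiness
from sklearn.metrics import silhouette_score


def evaluate_sketch(X_original, X_embedded, labels=None, n_neighbors=5):
    """Hypothetical aggregate of embedding-quality metrics."""
    results = {
        "trustworthiness": float(
            trustworthiness(X_original, X_embedded, n_neighbors=n_neighbors)
        ),
    }
    # Silhouette needs labels, so it is only computed when they are given.
    if labels is not None:
        results["silhouette"] = float(silhouette_score(X_embedded, labels))
    return results


rng = np.random.default_rng(0)
X_original = rng.normal(size=(60, 8))
X_embedded = X_original[:, :2]
labels = np.array([0, 1, 2] * 20)

with_labels = evaluate_sketch(X_original, X_embedded, labels=labels)
without_labels = evaluate_sketch(X_original, X_embedded)
```

Returning a flat `dict[str, float]` makes it easy to compare several embeddings side by side or to pass the scores on to a plotting helper.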
pyppur.utils.preprocessing module
Preprocessing utilities for projection pursuit.
- pyppur.utils.preprocessing.standardize_data(X: ndarray, center: bool = True, scale: bool = True, scaler: StandardScaler | None = None) -> tuple[ndarray, StandardScaler]
Standardize data for projection pursuit.
- Parameters:
X – Input data, shape (n_samples, n_features).
center – Whether to center the data.
scale – Whether to scale the data to unit variance.
scaler – Optional pre-fitted scaler for transform-only operation.
- Returns:
Tuple of (standardized data, scaler).
pyppur.utils.visualization module
Visualization utilities for projection pursuit results.
- pyppur.utils.visualization.plot_comparison(embeddings: dict[str, ndarray], labels: ndarray | None = None, metrics: dict[str, dict[str, float]] | None = None, title: str | None = None, figsize: tuple[float, float] = (15, 5), cmap: str = 'tab10', alpha: float = 0.7, s: float = 30.0) -> Figure
Plot a comparison of multiple embeddings.
- Parameters:
embeddings – Dictionary of embeddings {name: embedded_data}.
labels – Optional labels for coloring points.
metrics – Optional dictionary of metrics for each embedding.
title – Optional overall figure title.
figsize – Figure size (width, height) in inches.
cmap – Colormap name.
alpha – Transparency of points.
s – Point size.
- Returns:
matplotlib Figure object.
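The layout such a comparison plot might produce can be sketched with plain matplotlib: one scatter panel per entry in the `embeddings` dictionary, titled with its name. `comparison_sketch` is a hypothetical helper that handles 2-D embeddings only; the single-row layout and the omission of `metrics` are assumptions, not a description of pyppur's figure.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np


def comparison_sketch(embeddings, labels=None, title=None,
                      figsize=(15, 5), cmap="tab10", alpha=0.7, s=30.0):
    """Hypothetical side-by-side scatter plots, one panel per embedding."""
    fig, axes = plt.subplots(1, len(embeddings), figsize=figsize, squeeze=False)
    # Only pass color arguments when labels are available.
    color_kwargs = {} if labels is None else {"c": labels, "cmap": cmap}
    for ax, (name, emb) in zip(axes[0], embeddings.items()):
        ax.scatter(emb[:, 0], emb[:, 1], alpha=alpha, s=s, **color_kwargs)
        ax.set_title(name)
    if title is not None:
        fig.suptitle(title)
    return fig


rng = np.random.default_rng(0)
labels = np.array([0, 1] * 25)
embeddings = {
    "method A": rng.normal(size=(50, 2)),
    "method B": rng.normal(size=(50, 2)),
}
fig = comparison_sketch(embeddings, labels=labels, title="Comparison")
panel_titles = [ax.get_title() for ax in fig.axes]
```

The returned `Figure` can then be saved with `fig.savefig(...)` or adjusted further before display.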
- pyppur.utils.visualization.plot_embedding(X_embedded: ndarray, labels: ndarray | None = None, title: str = 'Projection Pursuit Embedding', metrics: dict[str, float] | None = None, figsize: tuple[float, float] = (10, 8), cmap: str = 'tab10', alpha: float = 0.7, s: float = 30.0, ax: Axes | Axes3D | None = None) -> tuple[Figure, Axes | Axes3D]
Plot the results of a projection pursuit embedding.
- Parameters:
X_embedded – Embedded data, shape (n_samples, 2) or (n_samples, 3).
labels – Optional labels for coloring points.
title – Plot title.
metrics – Optional dictionary of metrics to include in title.
figsize – Figure size (width, height) in inches.
cmap – Colormap name.
alpha – Transparency of points.
s – Point size.
ax – Optional axes to plot on.
- Returns:
Figure and Axes objects.
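The role of the optional `ax` argument and the `metrics` dictionary can be illustrated with a small matplotlib sketch. `embedding_plot_sketch` is hypothetical and covers the 2-D case only; how pyppur formats metrics in the title is an assumption.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np


def embedding_plot_sketch(X_embedded, labels=None,
                          title="Projection Pursuit Embedding", metrics=None,
                          figsize=(10, 8), cmap="tab10", alpha=0.7, s=30.0,
                          ax=None):
    """Hypothetical 2-D scatter plot that can draw on caller-supplied axes."""
    if ax is None:
        fig, ax = plt.subplots(figsize=figsize)
    else:
        # Reuse the figure that owns the supplied axes.
        fig = ax.figure
    color_kwargs = {} if labels is None else {"c": labels, "cmap": cmap}
    ax.scatter(X_embedded[:, 0], X_embedded[:, 1],
               alpha=alpha, s=s, **color_kwargs)
    if metrics:
        # Assumed format: metric values appended on a second title line.
        title += "\n" + ", ".join(f"{k}: {v:.3f}" for k, v in metrics.items())
    ax.set_title(title)
    return fig, ax


rng = np.random.default_rng(0)
X_embedded = rng.normal(size=(40, 2))
labels = np.array([0, 1] * 20)

# Let the helper create its own figure.
fig, ax = embedding_plot_sketch(
    X_embedded, labels=labels, metrics={"trustworthiness": 0.9}
)

# Draw onto axes that already exist, e.g. inside a larger layout.
outer_fig, outer_ax = plt.subplots()
fig2, ax2 = embedding_plot_sketch(X_embedded, labels=labels, ax=outer_ax)
```

Accepting an existing axes lets the same plot be embedded in a larger subplot grid without creating a new figure.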