pyppur.utils package

Utility functions for pyppur.

pyppur.utils.compute_silhouette(X_embedded: ndarray, labels: ndarray) → float

Compute the silhouette score for the embedding.

The silhouette score measures how well clusters are separated.

Parameters:
  • X_embedded – Low-dimensional embedding.

  • labels – Cluster or class labels.

Returns:

Silhouette score in range [-1, 1]; higher values indicate better-separated clusters.
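To make the quantity concrete, here is a minimal NumPy sketch of the standard silhouette definition. It is not pyppur's implementation: the name `silhouette_sketch` is hypothetical, Euclidean distance is an assumption, and at least two distinct labels are required.

```python
import numpy as np


def silhouette_sketch(X_embedded, labels):
    """Mean silhouette coefficient (hypothetical helper, Euclidean distance)."""
    X = np.asarray(X_embedded, dtype=float)
    labels = np.asarray(labels)
    # Full pairwise Euclidean distance matrix.
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    classes = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # singleton cluster: score 0 by convention
        # a: mean distance to the other members of the point's own cluster.
        a = D[i, same].sum() / (n_same - 1)
        # b: mean distance to the nearest other cluster.
        b = min(D[i, labels == c].mean() for c in classes if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return float(scores.mean())


X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
good = silhouette_sketch(X, [0, 0, 1, 1])  # labels match the two groups
bad = silhouette_sketch(X, [0, 1, 0, 1])   # labels cut across the groups
print(f"good={good:.3f} bad={bad:.3f}")
```

Labels that follow the geometry of the embedding score close to 1, while labels that cut across it score below 0.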

pyppur.utils.compute_trustworthiness(X_original: ndarray, X_embedded: ndarray, n_neighbors: int = 5) → float

Compute the trustworthiness score for dimensionality reduction.

Trustworthiness measures how well local neighborhoods are preserved.

Parameters:
  • X_original – Original high-dimensional data.

  • X_embedded – Low-dimensional embedding.

  • n_neighbors – Number of neighbors to consider.

Returns:

Trustworthiness score in range [0, 1]; 1 means every neighborhood in the embedding is also a neighborhood in the original space.
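The following sketch spells out the usual trustworthiness formula in NumPy. It is an illustration under assumptions, not pyppur's code: `trustworthiness_sketch` is a hypothetical name, distances are Euclidean, and `n_neighbors` is assumed to be smaller than half the number of samples so that the normalization is valid.

```python
import numpy as np


def _pairwise(X):
    # Pairwise Euclidean distances with the diagonal pushed to infinity,
    # so a point is never counted as its own neighbour.
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(D, np.inf)
    return D


def trustworthiness_sketch(X_original, X_embedded, n_neighbors=5):
    """Penalise embedded neighbours that are far away in the original space."""
    D_orig, D_emb = _pairwise(X_original), _pairwise(X_embedded)
    n, k = len(D_orig), n_neighbors
    rows = np.arange(n)[:, None]
    # rank_orig[i, j] is the rank of j among i's original-space neighbours
    # (1 = nearest).
    order_orig = np.argsort(D_orig, axis=1)
    rank_orig = np.empty((n, n), dtype=int)
    rank_orig[rows, order_orig] = np.arange(1, n + 1)
    # Original-space ranks of the k nearest neighbours in the embedding.
    nn_emb = np.argsort(D_emb, axis=1)[:, :k]
    ranks = rank_orig[rows, nn_emb]
    # Only neighbours ranked worse than k in the original space are penalised.
    penalty = np.clip(ranks - k, 0, None).sum()
    return float(1.0 - 2.0 / (n * k * (2 * n - 3 * k - 1)) * penalty)


rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))
perfect = trustworthiness_sketch(X, X, n_neighbors=5)
projected = trustworthiness_sketch(X, X[:, :2], n_neighbors=5)
print(perfect, projected)
```

Embedding the data onto itself incurs no penalty, while dropping coordinates typically pulls in false neighbours and lowers the score.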

pyppur.utils.standardize_data(X: ndarray, center: bool = True, scale: bool = True, scaler: StandardScaler | None = None) → tuple[ndarray, StandardScaler]

Standardize data for projection pursuit.

Parameters:
  • X – Input data, shape (n_samples, n_features).

  • center – Whether to center the data.

  • scale – Whether to scale the data to unit variance.

  • scaler – Optional pre-fitted scaler for transform-only operation.

Returns:

Standardized data and the scaler.
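The `scaler` parameter suggests a fit-once, reuse-later workflow. The sketch below mimics that behaviour with scikit-learn's `StandardScaler`; the helper `standardize_sketch` is hypothetical, and the mapping of `center` and `scale` onto `with_mean` and `with_std` is an assumption about how pyppur is implemented.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler


def standardize_sketch(X, center=True, scale=True, scaler=None):
    """Fit a new scaler, or only transform when a fitted one is supplied."""
    if scaler is None:
        # Assumed mapping: center -> with_mean, scale -> with_std.
        scaler = StandardScaler(with_mean=center, with_std=scale)
        return scaler.fit_transform(X), scaler
    # Transform-only path: reuse the statistics learned earlier.
    return scaler.transform(X), scaler


rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=3.0, size=(100, 4))
X_test = rng.normal(loc=5.0, scale=3.0, size=(20, 4))

X_train_std, fitted = standardize_sketch(X_train)
X_test_std, reused = standardize_sketch(X_test, scaler=fitted)
```

Reusing the fitted scaler keeps new data on the same scale as the data the projection was fitted on, instead of re-estimating means and variances from the new batch.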

Submodules

pyppur.utils.metrics module

Evaluation metrics for dimensionality reduction.

pyppur.utils.metrics.compute_distance_distortion(X_original: ndarray, X_embedded: ndarray) → float

Compute the distance distortion between original and embedded spaces.

Distance distortion measures how well pairwise distances are preserved.

Parameters:
  • X_original – Original high-dimensional data.

  • X_embedded – Low-dimensional embedding.

Returns:

Mean squared distance distortion; lower values indicate better-preserved pairwise distances.
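One plausible reading of "mean squared distance distortion" is the mean squared difference between corresponding pairwise distances. The sketch below implements that reading only. Whether pyppur normalizes the distances first is not stated here, so treat the unnormalized form, and the name `distance_distortion_sketch`, as assumptions.

```python
import numpy as np


def _condensed_distances(X):
    # Euclidean distances for each unordered pair (upper triangle only).
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    return D[np.triu_indices(len(X), k=1)]


def distance_distortion_sketch(X_original, X_embedded):
    """Mean squared difference of pairwise distances (unnormalized, assumed)."""
    d_orig = _condensed_distances(X_original)
    d_emb = _condensed_distances(X_embedded)
    return float(np.mean((d_orig - d_emb) ** 2))


X = np.array([[0.0, 0.0], [3.0, 4.0]])
same = distance_distortion_sketch(X, X)         # distances unchanged
doubled = distance_distortion_sketch(X, 2 * X)  # distance 5 becomes 10
```

Under this reading the metric is sensitive to a global rescaling of the embedding, which is one reason an implementation might normalize distances before comparing them.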

pyppur.utils.metrics.compute_silhouette(X_embedded: ndarray, labels: ndarray) → float

Compute the silhouette score for the embedding.

The silhouette score measures how well clusters are separated.

Parameters:
  • X_embedded – Low-dimensional embedding.

  • labels – Cluster or class labels.

Returns:

Silhouette score in range [-1, 1]; higher values indicate better-separated clusters.

pyppur.utils.metrics.compute_trustworthiness(X_original: ndarray, X_embedded: ndarray, n_neighbors: int = 5) → float

Compute the trustworthiness score for dimensionality reduction.

Trustworthiness measures how well local neighborhoods are preserved.

Parameters:
  • X_original – Original high-dimensional data.

  • X_embedded – Low-dimensional embedding.

  • n_neighbors – Number of neighbors to consider.

Returns:

Trustworthiness score in range [0, 1]; 1 means every neighborhood in the embedding is also a neighborhood in the original space.

pyppur.utils.metrics.evaluate_embedding(X_original: ndarray, X_embedded: ndarray, labels: ndarray | None = None, n_neighbors: int = 5) → dict[str, float]

Evaluate the quality of an embedding using multiple metrics.

Parameters:
  • X_original – Original high-dimensional data.

  • X_embedded – Low-dimensional embedding.

  • labels – Optional cluster or class labels.

  • n_neighbors – Number of neighbors for trustworthiness.

Returns:

Dictionary with evaluation metrics.
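As a rough picture of how such a dictionary might be assembled, the sketch below combines scikit-learn's `trustworthiness` and `silhouette_score`. The dictionary keys, the helper name `evaluate_sketch`, and the choice to skip the silhouette score when no labels are given are all assumptions, not pyppur's documented behaviour.

```python
import numpy as np
from sklearn.manifold import trustworthiness
from sklearn.metrics import silhouette_score


def evaluate_sketch(X_original, X_embedded, labels=None, n_neighbors=5):
    """Collect embedding-quality metrics in a dict (key names are hypothetical)."""
    results = {
        "trustworthiness": float(
            trustworthiness(X_original, X_embedded, n_neighbors=n_neighbors)
        )
    }
    # The silhouette score needs labels with at least two distinct values.
    if labels is not None and len(np.unique(labels)) > 1:
        results["silhouette"] = float(silhouette_score(X_embedded, labels))
    return results


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 5)), rng.normal(6, 1, (30, 5))])
labels = np.repeat([0, 1], 30)
X_2d = X[:, :2]  # stand-in embedding: keep the first two coordinates

with_labels = evaluate_sketch(X, X_2d, labels=labels)
without_labels = evaluate_sketch(X, X_2d)
```

Returning a flat `dict[str, float]` makes it straightforward to tabulate several embeddings side by side.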

pyppur.utils.preprocessing module

Preprocessing utilities for projection pursuit.

pyppur.utils.preprocessing.standardize_data(X: ndarray, center: bool = True, scale: bool = True, scaler: StandardScaler | None = None) → tuple[ndarray, StandardScaler]

Standardize data for projection pursuit.

Parameters:
  • X – Input data, shape (n_samples, n_features).

  • center – Whether to center the data.

  • scale – Whether to scale the data to unit variance.

  • scaler – Optional pre-fitted scaler for transform-only operation.

Returns:

Standardized data and the scaler.

pyppur.utils.visualization module

Visualization utilities for projection pursuit results.

pyppur.utils.visualization.plot_comparison(embeddings: dict[str, ndarray], labels: ndarray | None = None, metrics: dict[str, dict[str, float]] | None = None, title: str | None = None, figsize: tuple[float, float] = (15, 5), cmap: str = 'tab10', alpha: float = 0.7, s: float = 30.0) → Figure

Plot a comparison of multiple embeddings.

Parameters:
  • embeddings – Dictionary of embeddings {name: embedded_data}.

  • labels – Optional labels for coloring points.

  • metrics – Optional dictionary of metrics for each embedding.

  • title – Optional overall figure title.

  • figsize – Figure size (width, height) in inches.

  • cmap – Colormap name.

  • alpha – Transparency of points.

  • s – Point size.

Returns:

matplotlib Figure object.
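The nested `metrics` argument is keyed first by embedding name and then by metric name. The matplotlib sketch below shows one way a side-by-side comparison with that input shape could be drawn. The layout (one 2D panel per embedding, metrics appended to the panel title) and the name `plot_comparison_sketch` are assumptions, not pyppur's actual rendering.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np


def plot_comparison_sketch(embeddings, labels=None, metrics=None, title=None,
                           figsize=(15, 5), cmap="tab10", alpha=0.7, s=30.0):
    """Draw one 2D scatter panel per embedding (hypothetical layout)."""
    fig, axes = plt.subplots(1, len(embeddings), figsize=figsize, squeeze=False)
    for ax, (name, emb) in zip(axes[0], embeddings.items()):
        ax.scatter(emb[:, 0], emb[:, 1], c=labels, cmap=cmap, alpha=alpha, s=s)
        panel_title = name
        if metrics and name in metrics:
            # Append "metric=value" pairs for this embedding to its title.
            summary = ", ".join(f"{k}={v:.2f}" for k, v in metrics[name].items())
            panel_title = f"{name}\n{summary}"
        ax.set_title(panel_title)
    if title:
        fig.suptitle(title)
    return fig


rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 25)
embeddings = {"A": rng.normal(size=(50, 2)), "B": rng.normal(size=(50, 2))}
metrics = {"A": {"trustworthiness": 0.9}, "B": {"trustworthiness": 0.8}}  # made-up values
fig = plot_comparison_sketch(embeddings, labels=labels, metrics=metrics, title="Demo")
```

The metric values in the usage lines are placeholders chosen only to exercise the title formatting.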

pyppur.utils.visualization.plot_embedding(X_embedded: ndarray, labels: ndarray | None = None, title: str = 'Projection Pursuit Embedding', metrics: dict[str, float] | None = None, figsize: tuple[float, float] = (10, 8), cmap: str = 'tab10', alpha: float = 0.7, s: float = 30.0, ax: Axes | Axes3D | None = None) → tuple[Figure, Axes | Axes3D]

Plot the results of a projection pursuit embedding.

Parameters:
  • X_embedded – Embedded data, shape (n_samples, 2) or (n_samples, 3).

  • labels – Optional labels for coloring points.

  • title – Plot title.

  • metrics – Optional dictionary of metrics to include in title.

  • figsize – Figure size (width, height) in inches.

  • cmap – Colormap name.

  • alpha – Transparency of points.

  • s – Point size.

  • ax – Optional axes to plot on.

Returns:

Figure and Axes objects.

pyppur.utils.visualization.plot_reconstruction(X: ndarray, X_recon: ndarray, n_samples: int = 3) → Figure

Plot reconstructed samples alongside original samples.

Parameters:
  • X – Original data.

  • X_recon – Reconstructed data.

  • n_samples – Number of samples to plot.

Returns:

matplotlib Figure.