Getting Started¶
Installation¶
pyppur can be installed from PyPI using pip:
pip install pyppur
For development, you can install from source with development dependencies:
git clone https://github.com/gojiplus/pyppur.git
cd pyppur
pip install -e .[dev]
Requirements¶
Python 3.10+
NumPy (>=1.20.0)
SciPy (>=1.7.0)
scikit-learn (>=1.0.0)
matplotlib (>=3.3.0)
Quick Example¶
Here’s a simple example using the digits dataset:
import numpy as np
from pyppur import ProjectionPursuit, Objective
from sklearn.datasets import load_digits
from sklearn.preprocessing import StandardScaler
# Load and standardize data
digits = load_digits()
X, y = digits.data, digits.target
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Create and fit the model
pp = ProjectionPursuit(
n_components=2,
objective=Objective.DISTANCE_DISTORTION,
alpha=1.5,
n_init=3,
verbose=True
)
# Transform the data
X_transformed = pp.fit_transform(X_scaled)
print(f"Original shape: {X.shape}")
print(f"Transformed shape: {X_transformed.shape}")
# Evaluate the embedding
metrics = pp.evaluate(X_scaled, y)
for metric, value in metrics.items():
print(f"{metric}: {value:.4f}")
Key Concepts¶
Objectives¶
pyppur supports two main objectives:
Distance Distortion (
Objective.DISTANCE_DISTORTION): Minimizes the difference between pairwise distances in the original and projected spaces.With nonlinearity (default): Compares distances after applying tanh transformation
Without nonlinearity: Compares linear projection distances
Reconstruction (
Objective.RECONSTRUCTION): Minimizes reconstruction error using ridge functions.Tied weights (default): Encoder and decoder share the same weights
Untied weights: Separate encoder and decoder with optional regularization
Ridge Functions¶
pyppur uses tanh as the ridge function: \(g(z) = \tanh(\alpha \cdot z)\)
The steepness parameter \(\alpha\) controls the nonlinearity:
Small \(\alpha\): Nearly linear behavior
Large \(\alpha\): Strong nonlinearity with saturation
Configuration Options¶
Key parameters you can adjust:
n_components: Number of output dimensionsalpha: Ridge function steepnesstied_weights: Whether to use tied encoder/decoder weights (reconstruction only)use_nonlinearity_in_distance: Whether to apply nonlinearity in distance objectivel2_reg: L2 regularization for decoder weights (untied weights only)n_init: Number of random initializationsmax_iter: Maximum optimization iterations
Comparison with Other Methods¶
pyppur vs PCA¶
PCA: Finds linear projections that maximize variance
pyppur: Finds nonlinear projections that optimize specific objectives (distance/reconstruction)
pyppur vs t-SNE/UMAP¶
t-SNE/UMAP: Focus on local neighborhood preservation
pyppur: Optimizes global objectives with mathematical interpretability
Next Steps¶
Read the Mathematical Theory to understand the algorithms
Explore the Examples for more detailed use cases
Check the API Reference for complete parameter documentation