API Reference

Core

Core pipeline module.

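class tether.core.BlockConfig(on, crosswalk=None)

Bases: object

Configuration for blocking stage.

Parameters:
  • on (str | list[str]) – Column name or names to block on.

  • crosswalk (Crosswalk | dict[str, str] | None) – Optional mapping between blocking key values.

on: str | list[str]
crosswalk: Crosswalk | dict[str, str] | None = None
class tether.core.DecideConfig(method='hungarian')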

Bases: object

Configuration for decision stage.

Parameters:

method (Literal['hungarian', 'greedy', 'row_sequential']) – Decision algorithm to use.

method: Literal['hungarian', 'greedy', 'row_sequential'] = 'hungarian'
class tether.core.FilterConfig(min_score=0.0, margin=None)

Bases: object

Configuration for filtering stage.

Parameters:
  • min_score (float) – Minimum score threshold.

  • margin (float | None) – Minimum margin for ambiguity removal.

min_score: float = 0.0
margin: float | None = None
class tether.core.LinkageResult(matches, diagnostics, left, right, candidate_pairs, filtered_pairs)

Bases: object

Container for linkage results.

Parameters:
  • matches (DataFrame) – DataFrame with matched pairs.

  • diagnostics (LinkageDiagnostics) – Linkage diagnostics.

  • left (DataFrame) – Original left DataFrame.

  • right (DataFrame) – Original right DataFrame.

  • candidate_pairs (DataFrame) – Candidate pairs after blocking.

  • filtered_pairs (DataFrame) – Pairs after filtering.

matches: DataFrame
diagnostics: LinkageDiagnostics
left: DataFrame
right: DataFrame
candidate_pairs: DataFrame
filtered_pairs: DataFrame
inspect(margin_threshold=0.1)

Generate an inspection report for this result.

Parameters:

margin_threshold (float) – Threshold for identifying ambiguous pairs.

Return type:

InspectionReport

Returns:

InspectionReport with detailed analysis.

merge_left(suffixes=('', '_matched'))

Merge matches back to left DataFrame.

Parameters:

suffixes (tuple[str, str]) – Suffixes for overlapping columns.

Return type:

DataFrame

Returns:

Left DataFrame with matched right columns.

merge_right(suffixes=('_matched', ''))

Merge matches back to right DataFrame.

Parameters:

suffixes (tuple[str, str]) – Suffixes for overlapping columns.

Return type:

DataFrame

Returns:

Right DataFrame with matched left columns.
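The suffix rules of merge_left can be illustrated without pandas. The sketch below is a hypothetical helper, merge_left_sketch, that works on lists of dicts rather than DataFrames. It assumes matches are (left_index, right_index) pairs, as in the decision stage; that suffixes are applied only to column names present on both sides, as pandas does; and that unmatched left records receive None for every right column. merge_right mirrors it with the roles swapped.

```python
from typing import Any

Row = dict[str, Any]


def merge_left_sketch(
    left: list[Row],
    right: list[Row],
    matches: list[tuple[int, int]],
    suffixes: tuple[str, str] = ("", "_matched"),
) -> list[Row]:
    """Attach the matched right row's columns to every left row (hypothetical helper)."""
    left_suffix, right_suffix = suffixes
    left_cols = {col for row in left for col in row}
    right_cols = sorted({col for row in right for col in row})
    right_col_set = set(right_cols)
    partner_of = dict(matches)  # left_index -> right_index

    merged = []
    for i, row in enumerate(left):
        out: Row = {}
        for col, value in row.items():
            # A suffix is applied only when the name also exists on the right side.
            out[col + left_suffix if col in right_col_set else col] = value
        partner = right[partner_of[i]] if i in partner_of else {}
        for col in right_cols:
            name = col + right_suffix if col in left_cols else col
            out[name] = partner.get(col)  # None when the left row has no match
        merged.append(out)
    return merged
```

With the default suffixes, left columns keep their names and only the overlapping right columns are renamed, so the left table's schema is preserved.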

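class tether.core.Pipeline(preprocess_config, block_config, score_config, filter_config, decide_config)

Bases: object

Executable linkage pipeline.

Parameters:
  • preprocess_config – Configuration for preprocessing stage (see PreprocessConfig).

  • block_config – Configuration for blocking stage (see BlockConfig).

  • score_config – Configuration for scoring stage (see ScoreConfig).

  • filter_config – Configuration for filtering stage (see FilterConfig).

  • decide_config – Configuration for decision stage (see DecideConfig).

Execute the linkage pipeline.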

Parameters:
  • left (DataFrame) – Left DataFrame to link.

  • right (DataFrame) – Right DataFrame to link.

Return type:

LinkageResult

Returns:

LinkageResult with matches and diagnostics.

class tether.core.PipelineBuilder

Bases: object

Fluent builder for constructing linkage pipelines.

preprocess(normalize_unicode=True, lowercase=True, strip_whitespace=True, collapse_whitespace=True, columns=None)

Configure preprocessing stage.

Parameters:
  • normalize_unicode (bool) – Normalize unicode characters.

  • lowercase (bool) – Convert to lowercase.

  • strip_whitespace (bool) – Strip whitespace.

  • collapse_whitespace (bool) – Collapse multiple whitespace.

  • columns (list[str] | None) – Columns to preprocess.

Return type:

Self

Returns:

Self for method chaining.

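block(on, crosswalk=None)

Configure blocking stage.

Parameters:
  • on (str | list[str]) – Column name or names to block on.

  • crosswalk (Crosswalk | dict[str, str] | None) – Optional mapping between blocking key values.

Return type: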

Self

Returns:

Self for method chaining.

score(comparisons)

Configure scoring stage.

Parameters:

comparisons (list[Comparison]) – List of comparison operations.

Return type:

Self

Returns:

Self for method chaining.

filter(min_score=0.0, margin=None)

Configure filtering stage.

Parameters:
  • min_score (float) – Minimum score threshold.

  • margin (float | None) – Minimum margin for ambiguity removal.

Return type:

Self

Returns:

Self for method chaining.

decide(method='hungarian')

Configure decision stage.

Parameters:

method (Literal['hungarian', 'greedy', 'row_sequential']) – Decision algorithm to use.

Return type:

Self

Returns:

Self for method chaining.

build()

Build the configured pipeline.

Return type:

Pipeline

Returns:

Configured Pipeline instance.

Raises:

ValueError – If score configuration is missing.
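A minimal stand-alone sketch of the fluent builder pattern described for PipelineBuilder: each stage method stores its settings and returns self, and build() raises ValueError when no score configuration was supplied. MiniBuilder and MiniPipeline are hypothetical classes, with plain strings standing in for Comparison objects; they are not part of tether.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MiniPipeline:
    """Hypothetical container for the collected stage settings."""
    block_on: list[str]
    comparisons: list[str]
    min_score: float
    method: str


@dataclass
class MiniBuilder:
    """Hypothetical fluent builder: every stage method returns self."""
    _block_on: list[str] = field(default_factory=list)
    _comparisons: Optional[list[str]] = None
    _min_score: float = 0.0
    _method: str = "hungarian"

    def block(self, on):
        # Accept a single column name or a list of names, like BlockConfig.on.
        self._block_on = [on] if isinstance(on, str) else list(on)
        return self

    def score(self, comparisons):
        self._comparisons = list(comparisons)
        return self

    def filter(self, min_score=0.0):
        self._min_score = min_score
        return self

    def decide(self, method="hungarian"):
        self._method = method
        return self

    def build(self) -> MiniPipeline:
        # Mirrors the documented rule: build() fails if no score configuration was given.
        if self._comparisons is None:
            raise ValueError("score configuration is missing; call .score() first")
        return MiniPipeline(self._block_on, self._comparisons, self._min_score, self._method)
```

Chaining works because every method returns the same builder instance, so in this sketch the optional stages can be configured in any order before build().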

class tether.core.PreprocessConfig(normalize_unicode=True, lowercase=True, strip_whitespace=True, collapse_whitespace=True, missing_policy='skip', columns=None)

Bases: object

Configuration for preprocessing stage.

Parameters:
  • normalize_unicode (bool) – Whether to normalize unicode characters.

  • lowercase (bool) – Whether to convert to lowercase.

  • strip_whitespace (bool) – Whether to strip whitespace.

  • collapse_whitespace (bool) – Whether to collapse multiple whitespace.

  • missing_policy (Literal['skip', 'zero', 'penalize']) – How to handle missing values.

  • columns (list[str] | None) – Specific columns to preprocess.

normalize_unicode: bool = True
lowercase: bool = True
strip_whitespace: bool = True
collapse_whitespace: bool = True
missing_policy: Literal['skip', 'zero', 'penalize'] = 'skip'
columns: list[str] | None = None
class tether.core.ScoreConfig(comparisons=<factory>)

Bases: object

Configuration for scoring stage.

Parameters:

comparisons (list[Comparison]) – List of comparison operations.

comparisons: list[Comparison]

Pipeline

tether.core.pipeline.PipelineBuilder and tether.core.pipeline.Pipeline carry the same documentation as tether.core.PipelineBuilder and tether.core.Pipeline in the Core section.

Result

tether.core.result.LinkageResult carries the same documentation as tether.core.LinkageResult in the Core section.

Score

Scoring module for pairwise comparisons.

class tether.score.Comparison(*args, **kwargs)

Bases: Protocol

Protocol for field comparison operations.

column: str
weight: float
compare(left, right)

Compare two series and return similarity scores.

Parameters:
  • left (Series) – Left series of values.

  • right (Series) – Right series of values.

Return type:

Series

Returns:

Series of similarity scores between 0 and 1.

class tether.score.DateComparison(column, tolerance_days=0, weight=1.0)

Bases: object

Date comparison with day tolerance.

Parameters:
  • column (str) – Column name to compare.

  • tolerance_days (int) – Maximum allowed difference in days.

  • weight (float) – Weight for this comparison in aggregate score.

column: str
tolerance_days: int
weight: float
compare(left, right)

Compare date values with day tolerance.

Parameters:
  • left (Series) – Left series of date values.

  • right (Series) – Right series of date values.

Return type:

Series

Returns:

Series of similarity scores between 0 and 1.

class tether.score.ExactComparison(column, weight=1.0)

Bases: object

Exact match comparison.

Parameters:
  • column (str) – Column name to compare.

  • weight (float) – Weight for this comparison in aggregate score.

column: str
weight: float
compare(left, right)

Compare values for exact equality.

Parameters:
  • left (Series) – Left series of values.

  • right (Series) – Right series of values.

Return type:

Series

Returns:

Series of 1.0 for matches, 0.0 for non-matches.

class tether.score.NumericComparison(column, tolerance=0.0, weight=1.0, scale='linear')

Bases: object

Numeric comparison with tolerance.

Parameters:
  • column (str) – Column name to compare.

  • tolerance (float) – Maximum allowed difference for a match.

  • weight (float) – Weight for this comparison in aggregate score.

  • scale (Literal['linear', 'gaussian'])

column: str
tolerance: float
weight: float
scale: Literal['linear', 'gaussian']
compare(left, right)

Compare numeric values with tolerance.

Parameters:
  • left (Series) – Left series of numeric values.

  • right (Series) – Right series of numeric values.

Return type:

Series

Returns:

Series of similarity scores between 0 and 1.
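The reference does not state how NumericComparison maps a difference to a score for each scale. One plausible reading, sketched below, treats tolerance as a hard cutoff, decays linearly for 'linear', and uses a bell curve for 'gaussian'. The exact formulas, including sigma = tolerance / 3, are assumptions, and numeric_similarity is a hypothetical function operating on single values rather than Series.

```python
import math


def numeric_similarity(a: float, b: float, tolerance: float = 0.0, scale: str = "linear") -> float:
    """Hypothetical similarity in [0, 1]; the curves are assumptions, not tether's formulas."""
    if scale not in ("linear", "gaussian"):
        raise ValueError(f"unknown scale: {scale!r}")
    diff = abs(a - b)
    if tolerance <= 0:
        # With zero tolerance, only exact equality counts as a match.
        return 1.0 if diff == 0 else 0.0
    if diff > tolerance:
        # Treat tolerance as the maximum allowed difference.
        return 0.0
    if scale == "linear":
        # 1.0 at diff == 0, falling in a straight line to 0.0 at diff == tolerance.
        return 1.0 - diff / tolerance
    # Gaussian: bell curve with sigma = tolerance / 3, so it is close to 0 at the boundary.
    sigma = tolerance / 3
    return math.exp(-0.5 * (diff / sigma) ** 2)
```

The linear curve penalizes every unit of difference equally, while the Gaussian curve stays near 1.0 for small differences and drops off faster near the tolerance.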

class tether.score.PairwiseScorer(comparisons)

Bases: object

Compute pairwise similarity scores for candidate pairs.

Parameters:

comparisons (list[Comparison])

score(pairs)

Compute similarity scores for candidate pairs.

Parameters:

pairs (DataFrame) – DataFrame with candidate pairs containing columns from both left and right DataFrames with _left and _right suffixes.

Return type:

DataFrame

Returns:

DataFrame with original pairs plus score columns and aggregate score.
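PairwiseScorer combines one score per comparison into an aggregate. The sketch below assumes a weighted mean, which keeps the aggregate between 0 and 1 whenever every comparison does. score_pairs is hypothetical, and so are the per-comparison column names ({column}_score) and the (column, weight, compare_fn) tuples standing in for Comparison objects.

```python
from typing import Any, Callable

# Hypothetical stand-in for a Comparison: (column, weight, compare_fn).
ComparisonSpec = tuple[str, float, Callable[[Any, Any], float]]


def score_pairs(pairs: list[tuple[dict, dict]], comparisons: list[ComparisonSpec]) -> list[dict]:
    """Score (left_row, right_row) pairs: per-comparison scores plus a weighted 'score'."""
    total_weight = sum(weight for _, weight, _ in comparisons)
    scored = []
    for left_row, right_row in pairs:
        out = {}
        weighted_sum = 0.0
        for column, weight, compare in comparisons:
            value = compare(left_row[column], right_row[column])
            out[f"{column}_score"] = value  # hypothetical column naming
            weighted_sum += weight * value
        # Assumed aggregation: weighted mean, which stays within [0, 1].
        out["score"] = weighted_sum / total_weight if total_weight else 0.0
        scored.append(out)
    return scored
```

With weights of 3.0 on name and 1.0 on zip, a pair that agrees on name but not zip scores 3.0 / 4.0 = 0.75.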

class tether.score.StringComparison(column, algorithm='jaro_winkler', weight=1.0)

Bases: object

String similarity comparison using fuzzy matching algorithms.

Parameters:
  • column (str) – Column name to compare.

  • algorithm (Literal['jaro_winkler', 'levenshtein', 'damerau_levenshtein']) – Similarity algorithm to use.

  • weight (float) – Weight for this comparison in aggregate score.

column: str
algorithm: Literal['jaro_winkler', 'levenshtein', 'damerau_levenshtein']
weight: float
compare(left, right)

Compare string values using the configured algorithm.

Parameters:
  • left (Series) – Left series of string values.

  • right (Series) – Right series of string values.

Return type:

Series

Returns:

Series of similarity scores between 0 and 1.
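For the 'levenshtein' option, a common way to turn an edit distance into a score between 0 and 1 is to divide by the length of the longer string. The sketch implements Levenshtein distance with the standard two-row dynamic program, plus that normalization. Whether tether normalizes this way is an assumption, and production code would normally rely on an optimized library such as RapidFuzz.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """1.0 for identical strings, 0.0 when every position differs (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest
```

For example, "kitten" and "sitting" are 3 edits apart, giving a similarity of 1 - 3/7.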

Comparisons

tether.score.comparisons.StringComparison, tether.score.comparisons.ExactComparison, tether.score.comparisons.NumericComparison, and tether.score.comparisons.DateComparison carry the same documentation as the tether.score classes of the same names in the Score section.

Block

Blocking module for reducing comparison space.

class tether.block.Blocker(*args, **kwargs)

Bases: Protocol

Protocol for blocking strategies.

block(left, right)

Generate candidate pairs from two DataFrames.

Parameters:
  • left (DataFrame) – Left DataFrame.

  • right (DataFrame) – Right DataFrame.

Return type:

DataFrame

Returns:

DataFrame with candidate pairs containing columns from both DataFrames with _left and _right suffixes.

class tether.block.Crosswalk(mapping)

Bases: object

Mapping between blocking key values.

Parameters:

mapping (dict[str, str])

apply(series)

Apply crosswalk mapping to a series.

Parameters:

series (Series) – Series of values to normalize.

Return type:

Series

Returns:

Series with mapped values.

validate()

Validate the crosswalk mapping.

Return type:

list[str]

Returns:

List of validation error messages.

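class tether.block.FieldBlocker(on, crosswalk=None)

Bases: object

Block on one or more fields with optional crosswalk.

Parameters:
  • on (str | list[str]) – Field name or names to block on.

  • crosswalk (Crosswalk | dict[str, str] | None) – Optional mapping between blocking key values.

block(left, right)

Generate candidate pairs by blocking on specified fields.

Parameters:
  • left (DataFrame) – Left DataFrame.

  • right (DataFrame) – Right DataFrame.

Return type: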

DataFrame

Returns:

DataFrame with candidate pairs.
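Field blocking restricts candidate pairs to records whose blocking keys agree, instead of the n × m pairs a full cross product would create. The hypothetical field_block below indexes the right records by key and returns (left_index, right_index) pairs. Mapping key values on both sides through the crosswalk, and passing unmapped values through unchanged, are assumptions about how a Crosswalk is applied.

```python
from collections import defaultdict
from typing import Optional


def field_block(left: list[dict], right: list[dict], on, crosswalk: Optional[dict] = None):
    """Return (left_index, right_index) pairs whose blocking keys agree (hypothetical)."""
    columns = [on] if isinstance(on, str) else list(on)
    mapping = crosswalk or {}

    def key(row):
        # Assumption: map each key value through the crosswalk, else keep it as-is.
        return tuple(mapping.get(row[c], row[c]) for c in columns)

    # Index right records by key so each left record only meets its own block.
    right_by_key = defaultdict(list)
    for j, row in enumerate(right):
        right_by_key[key(row)].append(j)

    pairs = []
    for i, row in enumerate(left):
        for j in right_by_key.get(key(row), []):
            pairs.append((i, j))
    return pairs
```

Here the crosswalk lets "New York" on one side block together with "NY" on the other, which exact key equality alone would miss.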

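class tether.block.FullBlocker

Bases: object

Generate all possible pairs (no blocking).

Use with caution: creates n × m pairs, where n and m are the numbers of left and right records.

block(left, right)

Generate all possible pairs.

Parameters:
  • left (DataFrame) – Left DataFrame.

  • right (DataFrame) – Right DataFrame.

Return type: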

DataFrame

Returns:

DataFrame with all pairs.

Decide

Decision rule implementations.

class tether.decide.DecisionRule(*args, **kwargs)

Bases: Protocol

Protocol for matching decision algorithms.

decide(scored_pairs)

Select final matches from scored candidate pairs.

Parameters:

scored_pairs (DataFrame) – DataFrame with candidate pairs and scores. Must contain ‘left_index’, ‘right_index’, and ‘score’ columns.

Return type:

DataFrame

Returns:

DataFrame with selected matches.

class tether.decide.GreedyDecision

Bases: object

Greedy matching selecting best global pair first.

Iteratively selects the highest-scoring unmatched pair until no valid pairs remain.

decide(scored_pairs)

Select matches using greedy best-first approach.

Parameters:

scored_pairs (DataFrame) – DataFrame with ‘left_index’, ‘right_index’, and ‘score’.

Return type:

DataFrame

Returns:

DataFrame with greedy matches.
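The greedy rule can be sketched on plain (left_index, right_index, score) tuples instead of a DataFrame. greedy_decide is a hypothetical helper; it resolves ties by input order because Python's sort is stable, which may differ from tether.

```python
def greedy_decide(scored_pairs):
    """scored_pairs: list of (left_index, right_index, score) tuples (hypothetical helper)."""
    used_left, used_right, matches = set(), set(), []
    # Highest score first; each record may be matched at most once.
    for left_idx, right_idx, score in sorted(scored_pairs, key=lambda p: p[2], reverse=True):
        if left_idx in used_left or right_idx in used_right:
            continue
        matches.append((left_idx, right_idx, score))
        used_left.add(left_idx)
        used_right.add(right_idx)
    return matches
```

On scores [(0, 0, 0.9), (0, 1, 0.8), (1, 0, 0.85), (1, 1, 0.1)], greedy takes (0, 0) first and is then left with (1, 1), for a total of 1.0, whereas pairing (0, 1) with (1, 0) would total 1.65.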

class tether.decide.HungarianDecision

Bases: object

Optimal assignment using the Hungarian algorithm.

Maximizes total matching score while ensuring each record is matched at most once.

decide(scored_pairs)

Select optimal matches using Hungarian algorithm.

Parameters:

scored_pairs (DataFrame) – DataFrame with ‘left_index’, ‘right_index’, and ‘score’.

Return type:

DataFrame

Returns:

DataFrame with optimal matches.
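To make "maximizes total matching score" concrete, the sketch below finds the best one-to-one assignment by exhaustive search over candidate pairs. It is deliberately not the Hungarian algorithm, which reaches the same optimum in polynomial time; brute force is only practical for toy inputs. For real workloads, scipy.optimize.linear_sum_assignment with maximize=True solves the assignment problem on a dense score matrix. optimal_decide is hypothetical, and treating non-candidate pairs as unmatchable is an assumption.

```python
def optimal_decide(scored_pairs):
    """Maximum-total-score one-to-one matching by exhaustive search (toy sizes only)."""
    by_left = {}
    for left_idx, right_idx, score in scored_pairs:
        by_left.setdefault(left_idx, []).append((right_idx, score))
    lefts = sorted(by_left)

    best_total, best_matches = 0.0, []

    def search(k, used_right, total, chosen):
        nonlocal best_total, best_matches
        if k == len(lefts):
            if total > best_total:
                best_total, best_matches = total, list(chosen)
            return
        left_idx = lefts[k]
        search(k + 1, used_right, total, chosen)  # option: leave this left record unmatched
        for right_idx, score in by_left[left_idx]:
            if right_idx not in used_right:
                chosen.append((left_idx, right_idx, score))
                search(k + 1, used_right | {right_idx}, total + score, chosen)
                chosen.pop()

    search(0, frozenset(), 0.0, [])
    return best_matches
```

On the same scores where greedy reaches a total of 1.0, the optimal assignment pairs left 0 with right 1 and left 1 with right 0, for a total of 1.65.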

class tether.decide.RowSequentialDecision

Bases: object

Row-sequential matching processing left records in order.

For each left record (in index order), selects the best available right record. Simple baseline algorithm.

decide(scored_pairs)

Select matches processing left records sequentially.

Parameters:

scored_pairs (DataFrame) – DataFrame with ‘left_index’, ‘right_index’, and ‘score’.

Return type:

DataFrame

Returns:

DataFrame with row-sequential matches.
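Row-sequential matching can be sketched the same way. row_sequential_decide is a hypothetical helper working on (left_index, right_index, score) tuples. It shows that the result depends on record order, because an earlier left record can claim a right record that a later one scores higher.

```python
def row_sequential_decide(scored_pairs):
    """For each left record in index order, take its best still-unused right record."""
    by_left = {}
    for left_idx, right_idx, score in scored_pairs:
        by_left.setdefault(left_idx, []).append((score, right_idx))
    used_right, matches = set(), []
    for left_idx in sorted(by_left):
        available = [(s, r) for s, r in by_left[left_idx] if r not in used_right]
        if not available:
            continue  # every candidate was claimed by an earlier left record
        score, right_idx = max(available, key=lambda t: t[0])  # first best on ties
        matches.append((left_idx, right_idx, score))
        used_right.add(right_idx)
    return matches
```

For [(0, 0, 0.6), (1, 0, 0.9), (1, 1, 0.5)], left record 0 claims right record 0 first, so left record 1 falls back to right record 1 even though it scores right record 0 higher.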

Filter

Filtering module for removing low-quality pairs.

class tether.filter.Filter(*args, **kwargs)

Bases: Protocol

Protocol for pair filtering strategies.

filter(pairs)

Filter candidate pairs.

Parameters:

pairs (DataFrame) – DataFrame with candidate pairs and scores.

Return type:

DataFrame

Returns:

Filtered DataFrame.

class tether.filter.MarginFilter(margin)

Bases: object

Remove ambiguous matches based on score margin.

For each left record, removes the best-scoring pair if its score exceeds the second-best score by less than margin.

Parameters:

margin (float)

filter(pairs)

Remove ambiguous matches.

Parameters:

pairs (DataFrame) – DataFrame with ‘left_index’ and ‘score’ columns.

Return type:

DataFrame

Returns:

Filtered DataFrame with unambiguous matches.
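The quantity behind MarginFilter is the gap between each left record's best and second-best scores. The sketch below computes that gap and flags left records whose gap is below margin; it does not reproduce which rows tether then drops. score_gaps and ambiguous_left_records are hypothetical helpers, and treating a left record with a single candidate as never ambiguous is an assumption.

```python
import math


def score_gaps(pairs):
    """Map each left_index to best_score - second_best_score.

    pairs: list of (left_index, right_index, score) tuples. A left record with a
    single candidate gets math.inf (assumption: no competitor, so never ambiguous).
    """
    scores_by_left = {}
    for left_idx, _right_idx, score in pairs:
        scores_by_left.setdefault(left_idx, []).append(score)
    gaps = {}
    for left_idx, scores in scores_by_left.items():
        top = sorted(scores, reverse=True)
        gaps[left_idx] = top[0] - top[1] if len(top) > 1 else math.inf
    return gaps


def ambiguous_left_records(pairs, margin):
    """Left records whose best match is not clearly ahead of the runner-up."""
    return {left_idx for left_idx, gap in score_gaps(pairs).items() if gap < margin}
```

With margin 0.1, a left record whose two best candidates score 0.90 and 0.88 is flagged, while one scoring 0.95 and 0.60 is not.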

class tether.filter.ThresholdFilter(min_score)

Bases: object

Filter pairs below a minimum score threshold.

Parameters:

min_score (float)

filter(pairs)

Remove pairs below the threshold.

Parameters:

pairs (DataFrame) – DataFrame with ‘score’ column.

Return type:

DataFrame

Returns:

Filtered DataFrame with pairs meeting threshold.

Preprocess

Preprocessing module for data normalization.

class tether.preprocess.MissingHandler(policy='skip', fill_value='', columns=None)

Bases: object

Handle missing values in DataFrames.

Parameters:
  • policy (Literal['skip', 'zero', 'penalize'])

  • fill_value (str)

  • columns (list[str] | None)

preprocess(df)

Handle missing values in the DataFrame.

Parameters:

df (DataFrame) – Input DataFrame.

Return type:

DataFrame

Returns:

DataFrame with missing values handled.

class tether.preprocess.Preprocessor(*args, **kwargs)

Bases: Protocol

Protocol for preprocessing operations.

preprocess(df)

Preprocess a DataFrame.

Parameters:

df (DataFrame) – Input DataFrame.

Return type:

DataFrame

Returns:

Preprocessed DataFrame.

class tether.preprocess.TextNormalizer(normalize_unicode=True, lowercase=True, strip_whitespace=True, collapse_whitespace=True, columns=None)

Bases: object

Normalize text columns in a DataFrame.

Parameters:
  • normalize_unicode (bool)

  • lowercase (bool)

  • strip_whitespace (bool)

  • collapse_whitespace (bool)

  • columns (list[str] | None)

preprocess(df)

Normalize text columns in the DataFrame.

Parameters:

df (DataFrame) – Input DataFrame.

Return type:

DataFrame

Returns:

DataFrame with normalized text columns.
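The four TextNormalizer switches map onto standard string operations. The sketch applies them to one value. normalize_text is hypothetical, and both the Unicode normalization form (NFKC) and the order of the steps are assumptions rather than documented tether behavior.

```python
import re
import unicodedata


def normalize_text(value: str, normalize_unicode=True, lowercase=True,
                   strip_whitespace=True, collapse_whitespace=True) -> str:
    """Hypothetical per-value equivalent of the TextNormalizer options."""
    if normalize_unicode:
        # NFKC folds compatibility characters: fullwidth letters and ligatures
        # such as U+FB01 become their plain ASCII forms.
        value = unicodedata.normalize("NFKC", value)
    if lowercase:
        value = value.lower()
    if strip_whitespace:
        value = value.strip()
    if collapse_whitespace:
        # Replace any run of whitespace (spaces, tabs, newlines) with one space.
        value = re.sub(r"\s+", " ", value)
    return value
```

Normalizing Unicode first means the later steps see ordinary characters, so a fullwidth "ＪＯＨＮ" lowercases to "john" just like its ASCII spelling.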

Deduplicate

Deduplication module for within-table duplicate removal.

class tether.deduplicate.ClusterDeduplicator(comparisons, threshold=0.9, margin=0.1, timestamp_column=None, block_on=None)

Bases: object

Remove within-table duplicates using connected components.

Parameters:
  • comparisons

  • threshold (float)

  • margin (float)

  • timestamp_column

  • block_on

deduplicate(df)

Remove duplicate records using connected components.

Parameters:

df (DataFrame) – Input DataFrame.

Return type:

tuple[DataFrame, DeduplicationReport]

Returns:

Tuple of (deduplicated DataFrame, deduplication report).
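Connected components make duplicate grouping transitive: if record A matches B and B matches C, all three form one group even if A and C score poorly against each other. The sketch groups indices with a union-find structure built from (i, j, score) pairs and keeps the lowest index per group. connected_components and dedupe_keep_first are hypothetical; the inclusive threshold and the keep-lowest-index rule are assumptions, and the margin, timestamp_column, and block_on options are not modeled.

```python
def connected_components(n, scored_pairs, threshold):
    """Group record indices 0..n-1 linked by pairs scoring at least threshold."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for i, j, score in scored_pairs:
        if score >= threshold:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                # Attach the larger root under the smaller one.
                parent[max(root_i, root_j)] = min(root_i, root_j)

    groups = {}
    for x in range(n):
        groups.setdefault(find(x), []).append(x)
    return list(groups.values())


def dedupe_keep_first(n, scored_pairs, threshold=0.9):
    """Return the kept indices: one representative (the lowest index) per group."""
    return sorted(min(g) for g in connected_components(n, scored_pairs, threshold))
```

In the test data, records 0 and 2 are never compared directly but still end up in one group through record 1.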

class tether.deduplicate.DeduplicationReport(original_count, kept_count, dropped_as_duplicate, dropped_as_indistinguishable, groups_found, largest_group_size)

Bases: object

Report on deduplication results.

Parameters:
  • original_count (int)

  • kept_count (int)

  • dropped_as_duplicate (int)

  • dropped_as_indistinguishable (int)

  • groups_found (int)

  • largest_group_size (int)

original_count: int
kept_count: int
dropped_as_duplicate: int
dropped_as_indistinguishable: int
groups_found: int
largest_group_size: int
class tether.deduplicate.Deduplicator(*args, **kwargs)

Bases: Protocol

Protocol for within-table deduplication.

deduplicate(df)

Remove duplicate records from a DataFrame.

Parameters:

df (DataFrame) – Input DataFrame.

Return type:

tuple[DataFrame, DeduplicationReport]

Returns:

Tuple of (deduplicated DataFrame, deduplication report).

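class tether.deduplicate.ExactDeduplicator(columns=None, keep='first')

Bases: object

Remove exact duplicates based on specified columns.

Parameters:
  • columns – Columns used to identify duplicates.

  • keep (Literal['first', 'last']) – Which record of each duplicate set to keep.

keep: Literal['first', 'last']
deduplicate(df)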

Remove exact duplicates.

Parameters:

df (DataFrame) – Input DataFrame.

Return type:

tuple[DataFrame, DeduplicationReport]

Returns:

Tuple of (deduplicated DataFrame, deduplication report).

Inspect

Inspection module for linkage diagnostics and reports.

class tether.inspect.InspectionReport(diagnostics, ambiguous_pairs=<factory>, unmatched_left=<factory>, unmatched_right=<factory>)

Bases: object

Detailed inspection report for linkage results.

Parameters:
  • diagnostics (LinkageDiagnostics) – Linkage diagnostics.

  • ambiguous_pairs (DataFrame) – Pairs with close scores that may be ambiguous.

  • unmatched_left (DataFrame) – Left records that were not matched.

  • unmatched_right (DataFrame) – Right records that were not matched.

diagnostics: LinkageDiagnostics
ambiguous_pairs: DataFrame
unmatched_left: DataFrame
unmatched_right: DataFrame
summary()

Generate a text summary of the report.

Return type:

str

Returns:

Human-readable summary string.

class tether.inspect.LinkageDiagnostics(n_left, n_right, n_candidate_pairs, n_filtered_pairs, n_matches, match_rate_left, match_rate_right, score_stats)

Bases: object

Diagnostic statistics for linkage results.

Parameters:
  • n_left (int) – Number of records in left DataFrame.

  • n_right (int) – Number of records in right DataFrame.

  • n_candidate_pairs (int) – Number of candidate pairs after blocking.

  • n_filtered_pairs (int) – Number of pairs after filtering.

  • n_matches (int) – Number of final matches.

  • match_rate_left (float) – Proportion of left records matched.

  • match_rate_right (float) – Proportion of right records matched.

  • score_stats (dict[str, float]) – Score distribution statistics.

n_left: int
n_right: int
n_candidate_pairs: int
n_filtered_pairs: int
n_matches: int
match_rate_left: float
match_rate_right: float
score_stats: dict[str, float]
tether.inspect.compute_diagnostics(left, right, candidate_pairs, filtered_pairs, matches)

Compute diagnostic statistics for linkage results.

Parameters:
  • left (DataFrame) – Left DataFrame.

  • right (DataFrame) – Right DataFrame.

  • candidate_pairs (DataFrame) – Candidate pairs after blocking.

  • filtered_pairs (DataFrame) – Pairs after filtering.

  • matches (DataFrame) – Final matches.

Return type:

LinkageDiagnostics

Returns:

LinkageDiagnostics with computed statistics.
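Assuming the match rates count distinct matched indices divided by each side's record count, they and a basic score summary reduce to a few lines. match_rates and score_summary are hypothetical, and the score_stats key names used here (min, max, mean, median) are illustrative, not the keys tether uses.

```python
import statistics


def match_rates(n_left, n_right, matches):
    """matches: list of (left_index, right_index, score) tuples."""
    matched_left = {left_idx for left_idx, _, _ in matches}
    matched_right = {right_idx for _, right_idx, _ in matches}
    rate_left = len(matched_left) / n_left if n_left else 0.0
    rate_right = len(matched_right) / n_right if n_right else 0.0
    return rate_left, rate_right


def score_summary(scores):
    """Hypothetical score_stats: key names are illustrative only."""
    if not scores:
        return {}
    return {
        "min": min(scores),
        "max": max(scores),
        "mean": statistics.fmean(scores),
        "median": statistics.median(scores),
    }
```

Because the denominators differ, the two rates can diverge: two matches between 4 left and 5 right records give rates of 0.5 and 0.4.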

tether.inspect.generate_report(left, right, matches, diagnostics, filtered_pairs, margin_threshold=0.1)

Generate an inspection report for linkage results.

Parameters:
  • left (DataFrame) – Left DataFrame.

  • right (DataFrame) – Right DataFrame.

  • matches (DataFrame) – Final matches.

  • diagnostics (LinkageDiagnostics) – Linkage diagnostics.

  • filtered_pairs (DataFrame) – Pairs after filtering.

  • margin_threshold (float) – Threshold for identifying ambiguous pairs.

Return type:

InspectionReport

Returns:

InspectionReport with detailed analysis.

Multipass

Multi-pass linkage module.

class tether.multipass.MultiPassOrchestrator

Bases: object

Orchestrate multi-pass record linkage.

Runs multiple passes with progressively relaxed thresholds, removing matched records between passes for higher precision.

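run(left, right, passes, comparisons, block_on=None, crosswalk=None, preprocess=True)

Execute multi-pass linkage.

Parameters:
  • left (DataFrame) – Left DataFrame to link.

  • right (DataFrame) – Right DataFrame to link.

  • passes (list[PassConfig]) – Pass configurations, run in order.

  • comparisons (list[Comparison]) – Comparison operations used for scoring.

  • block_on – Optional field or fields to block on.

  • crosswalk – Optional mapping between blocking key values.

  • preprocess (bool) – Whether to preprocess the inputs.

Return type: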

LinkageResult

Returns:

Combined LinkageResult from all passes.
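The multi-pass loop can be sketched on precomputed (left_index, right_index, score) pairs: each pass keeps only pairs at or above its threshold whose records are still unmatched, then selects matches greedily. run_passes is hypothetical; its default thresholds copy the defaults of strict_then_relaxed, but per-pass decision methods, margins, blocking, and preprocessing are not modeled.

```python
def run_passes(scored_pairs, thresholds=(0.95, 0.85, 0.7)):
    """Return (left_index, right_index, score, pass_number) for every match."""
    matched_left, matched_right, results = set(), set(), []
    for pass_number, min_score in enumerate(thresholds, start=1):
        # Only records left unmatched by earlier passes are eligible.
        eligible = [
            p for p in scored_pairs
            if p[2] >= min_score and p[0] not in matched_left and p[1] not in matched_right
        ]
        # Greedy selection within the pass: best remaining pair first.
        for left_idx, right_idx, score in sorted(eligible, key=lambda p: p[2], reverse=True):
            if left_idx in matched_left or right_idx in matched_right:
                continue
            matched_left.add(left_idx)
            matched_right.add(right_idx)
            results.append((left_idx, right_idx, score, pass_number))
    return results
```

Each match records the pass that produced it, which makes it easy to see how many links came from the relaxed thresholds.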

class tether.multipass.PassConfig(min_score, method='hungarian', margin=None)

Bases: object

Configuration for a single pass in multi-pass matching.

Parameters:
  • min_score (float) – Minimum score threshold for this pass.

  • method (Literal['hungarian', 'greedy', 'row_sequential']) – Decision method for this pass.

  • margin (float | None) – Optional margin filter for this pass.

min_score: float
method: Literal['hungarian', 'greedy', 'row_sequential'] = 'hungarian'
margin: float | None = None
tether.multipass.precision_first(threshold=0.9)

Create a precision-first single-pass strategy.

Parameters:

threshold (float) – High threshold for precision.

Return type:

list[PassConfig]

Returns:

A list containing a single PassConfig for high-precision matching.

tether.multipass.strict_then_relaxed(strict_threshold=0.95, medium_threshold=0.85, relaxed_threshold=0.7)

Create a strict-then-relaxed multi-pass strategy.

Parameters:
  • strict_threshold (float) – Threshold for first strict pass.

  • medium_threshold (float) – Threshold for medium pass.

  • relaxed_threshold (float) – Threshold for final relaxed pass.

Return type:

list[PassConfig]

Returns:

List of PassConfig for multi-pass matching.