Hrushiekesh Reddy Kanjula

Fuzzy Matching AOI Defect Data to Your Component Library

AOI machines generate defect reports with part numbers that almost match your library. Here is the dynamic parsing and fuzzy matching system that bridges that gap.

Published: March 23, 2026
Author: Hrushiekesh Kanjula Reddy
Read time: ~6 min
Category: engineering

Fuzzy matching AOI inspection data to a canonical component library

Automated Optical Inspection (AOI) is the final gate before a PCB ships. The machine photographs every component placement, compares it against the expected geometry, and flags anything that looks wrong — a rotated capacitor, a missing resistor, a misaligned IC. It generates a detailed defect report for every board.

The problem is that AOI defect reports don't know about your internal component library. The machine logs the part number exactly as it appears in the placement program — which might be "C0402-100N-10V", while your canonical library entry for the same component is "CAP|100N|10%|-|0402|10V|X7R". These need to be linked for the defect data to be useful. Linking them requires fuzzy matching — and doing it reliably at scale requires more than calling fuzz.ratio() and hoping for the best.

The Dynamic Column Parser

Before fuzzy matching anything, the AOI report has to be parsed. AOI outputs are notoriously inconsistent across machine vendors and firmware versions. The column order varies. Header names change between software updates. Some machines export CSV; others export a proprietary tab-delimited format with embedded metadata.
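Here is a rough, stdlib-only sketch of how that variation might be absorbed before any column inference runs. It picks tab or comma by counting which one appears on more lines, then keeps only the rows that share the most common field count, which drops leading metadata lines. The function name `read_aoi_rows` and the assumption that metadata lines split into a different number of fields than the table are illustrative, not the parser described in this post.

```python
import csv
from collections import Counter


def read_aoi_rows(text: str) -> tuple[list[str], list[list[str]]]:
    """Split a raw AOI export into (header, data_rows), accepting CSV or TSV."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Delimiter guess: whichever of tab/comma appears on more lines (tab wins ties)
    delimiter = max("\t,", key=lambda d: sum(d in ln for ln in lines))
    rows = list(csv.reader(lines, delimiter=delimiter))
    # Assumption: the table is the block of rows sharing the most common
    # multi-field width; metadata lines ("Machine: AOI-7") usually differ and drop out
    widths = Counter(len(r) for r in rows if len(r) > 1)
    if not widths:
        raise ValueError("no delimited rows found")
    width = widths.most_common(1)[0][0]
    table = [r for r in rows if len(r) == width]
    return table[0], table[1:]


header, rows = read_aoi_rows(
    "Machine: AOI-7\nProgram: TOP_v3\n"
    "Ref\tPart\tDefect\n"
    "C101\tC0402-100N-10V\tTombstone\n"
)
print(header)  # ['Ref', 'Part', 'Defect']
```

The width heuristic is deliberately crude. A metadata line that happens to have exactly as many comma-separated fields as the table would slip through, so a production parser would want a stronger header check.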

The dynamic column parser evaluates each column using two signals: the header string first, then the data content patterns as a fallback. A column where more than 80% of the values match [A-Z]{1,2}\d{1,4} (like C101, R47, U3) is almost certainly the Reference Designator column, even if its header is unrecognized. A column of package codes like "0402" and "SOT-23" is the Package column. This content-based inference means the parser keeps working when the AOI vendor updates their software and renames all the columns.

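import re

import pandas as pd

# Reference designators: one or two letters plus up to four digits (C101, R47, U3)
REF_PATTERN = re.compile(r'^[A-Z]{1,2}\d{1,4}$')


def infer_column_role(series: pd.Series, header: str) -> str:
    # Header name match first: cheap, and correct whenever the vendor kept sane names
    header_map = {"ref": "reference", "designator": "reference",
                  "part": "part_number", "defect": "defect_type"}
    for keyword, role in header_map.items():
        if keyword in header.lower():
            return role

    # Content pattern inference: more than 80% of non-null values look like designators
    values = series.dropna().astype(str)
    if len(values) and values.apply(lambda v: bool(REF_PATTERN.match(v))).mean() > 0.8:
        return "reference"

    return "unknown"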

Dynamic column role inference from content patterns
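The function above recognizes only reference designators from content. Here is a stdlib-only sketch that extends the same idea to the Package column and applies it to every column of a report held as a header-to-values dict. `REF_PATTERN` mirrors the regex above. `PACKAGE_PATTERN`, `infer_roles`, and the package codes listed are assumptions for illustration.

```python
import re

# Same designator regex as above; the package list is a hypothetical
# starter set and would need extending for a real library
REF_PATTERN = re.compile(r"^[A-Z]{1,2}\d{1,4}$")
PACKAGE_PATTERN = re.compile(r"^(0201|0402|0603|0805|1206|SOT-\d+|SOIC-?\d+|QFN-?\d+)$")


def infer_roles(columns: dict[str, list[str]], threshold: float = 0.8) -> dict[str, str]:
    """Assign a role to each column purely from its values, ignoring headers."""
    roles = {}
    for header, values in columns.items():
        # Drop blanks and uppercase so "sot-23" and "SOT-23" are treated alike
        vals = [v.strip().upper() for v in values if v and v.strip()]
        if not vals:
            roles[header] = "unknown"
            continue
        ref_share = sum(bool(REF_PATTERN.match(v)) for v in vals) / len(vals)
        pkg_share = sum(bool(PACKAGE_PATTERN.match(v)) for v in vals) / len(vals)
        if ref_share > threshold:
            roles[header] = "reference"
        elif pkg_share > threshold:
            roles[header] = "package"
        else:
            roles[header] = "unknown"
    return roles


roles = infer_roles({
    "Col1": ["C101", "R47", "U3"],
    "Col2": ["0402", "SOT-23", "0603"],
    "Col3": ["Tombstone", "Missing", "Skew"],
})
print(roles)  # {'Col1': 'reference', 'Col2': 'package', 'Col3': 'unknown'}
```

The two patterns are disjoint: a designator must start with a letter followed directly by digits, so "0402" and "SOT-23" can never count toward the reference share.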

Building the Fuzzy Match Layer

With the AOI report parsed into a structured dataframe, each row's part number needs to be matched against the canonical component library. The matching strategy uses three tiers, applied in order from most to least confident:

Tier 1 — Exact match. After normalizing both strings (uppercase, strip whitespace, standardize separators), try a direct dictionary lookup. Exact matches are fast and unambiguous. In practice, about 60% of AOI part numbers match exactly after normalization.

Tier 2 — Token sort ratio. For non-exact matches, compute the token sort ratio using RapidFuzz. Token sort ratio reorders the words in both strings alphabetically before comparing — so "10K RES 0402" and "RES 0402 10K" score 100 even though character-by-character they're different. This handles the majority of manufacturer prefix variations and field order differences.
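The following minimal sketch of a token sort comparison shows why word order stops mattering. It uses Python's `difflib.SequenceMatcher` as a stand-in for RapidFuzz, so scores on non-identical strings may differ slightly from `fuzz.token_sort_ratio`. `token_sort_score` is a hypothetical helper name.

```python
from difflib import SequenceMatcher


def token_sort_score(a: str, b: str) -> float:
    """Token sort similarity on a 0-100 scale (difflib stand-in for RapidFuzz)."""
    # Split on whitespace, sort tokens alphabetically, re-join, then compare
    sorted_a = " ".join(sorted(a.split()))
    sorted_b = " ".join(sorted(b.split()))
    return 100.0 * SequenceMatcher(None, sorted_a, sorted_b).ratio()


print(token_sort_score("10K RES 0402", "RES 0402 10K"))  # 100.0
```

Both inputs become "0402 10K RES" before the character-level comparison, so they score a perfect 100, while a raw comparison of the unsorted strings scores lower.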

Tier 3 — Partial token match. For cases where the AOI part number is a subset or superset of the library entry (common with manufacturer part numbers that embed dimensional codes), partial token matching identifies the best candidate within the correct component family. A minimum score threshold of 85 guards against false positive matches; anything below it falls through to manual review.
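Here is a simplified sketch of the partial idea, again with difflib standing in for RapidFuzz. Tokens are sorted, then a window the length of the shorter string slides across the longer one and the best window score wins. `partial_token_sort_score` is a hypothetical helper, and the example strings are illustrative rather than real AOI output.

```python
from difflib import SequenceMatcher


def _sorted_tokens(s: str) -> str:
    return " ".join(sorted(s.split()))


def partial_token_sort_score(a: str, b: str) -> float:
    """Best score of the shorter token-sorted string against any equal-length
    window of the longer one, on a 0-100 scale (simplified, difflib-based)."""
    short, longer = sorted((_sorted_tokens(a), _sorted_tokens(b)), key=len)
    if not short:
        return 0.0
    width = len(short)
    best = max(
        SequenceMatcher(None, short, longer[i:i + width]).ratio()
        for i in range(len(longer) - width + 1)
    )
    return 100.0 * best


library_key = "10K 0402"
aoi_pn = "RES 10K 0402"  # AOI string carries an extra family token
full = 100.0 * SequenceMatcher(None, _sorted_tokens(library_key), _sorted_tokens(aoi_pn)).ratio()
partial = partial_token_sort_score(library_key, aoi_pn)
```

The full token sort comparison scores 80 here, below an 85 cutoff. The partial comparison finds "0402 10K" intact inside "0402 10K RES" and scores 100. That leniency is also why a threshold and a component-family filter matter at this tier.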

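from typing import Optional

from rapidfuzz import fuzz, process


def match_part_number(aoi_pn: str, library: dict[str, str]) -> tuple[Optional[str], float]:
    # `library` maps normalized part numbers to canonical library entries;
    # normalize_part_number() is the pre-pass described below
    normalized = normalize_part_number(aoi_pn)

    # Tier 1: exact lookup on the normalized key
    if normalized in library:
        return library[normalized], 100.0

    # Tier 2: token sort ratio, which ignores word order
    best = process.extractOne(
        normalized, library.keys(),
        scorer=fuzz.token_sort_ratio,
        score_cutoff=85,
    )
    if best:
        return library[best[0]], best[1]

    # Tier 3: partial token match for subset/superset part numbers.
    # Candidates would ideally be narrowed to the part's component family first.
    best = process.extractOne(
        normalized, library.keys(),
        scorer=fuzz.partial_token_sort_ratio,
        score_cutoff=85,
    )
    if best:
        return library[best[0]], best[1]

    # No confident match: the record goes to the manual review queue
    return None, 0.0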

Three-tier fuzzy matching hierarchy for AOI part numbers

The Normalization Pre-Pass

Fuzzy matching works better when both sides are pre-normalized. Before any comparison, both the AOI part number and the library key go through a normalization pass that handles the most common variations:

  • Remove manufacturer prefixes ("TI-", "MFR:", "PN:")
  • Standardize separators (hyphen, underscore, slash → single hyphen)
  • Strip trailing qualifiers ("-T", "-TL", "-REEL" for tape-and-reel packaging)
  • Uppercase everything
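Here is a minimal sketch of that pre-pass. It assumes the prefix and suffix lists are exactly the examples given above; a real deployment would maintain them per vendor. The name `normalize_part_number` matches the helper called in the matching code, but this body is illustrative.

```python
import re

# Hypothetical lists built from the examples above; extend per vendor
MFR_PREFIXES = ("TI-", "MFR:", "PN:")
PACKAGING_SUFFIXES = ("-REEL", "-TL", "-T")


def normalize_part_number(pn: str) -> str:
    """Canonical form of a part number for exact lookup and fuzzy scoring."""
    s = " ".join(pn.upper().split())      # uppercase, collapse whitespace runs
    s = re.sub(r"[-_/]+", "-", s)         # hyphen/underscore/slash runs -> one hyphen
    for prefix in MFR_PREFIXES:           # drop one leading manufacturer prefix
        if s.startswith(prefix):
            s = s[len(prefix):].lstrip()
            break
    for suffix in PACKAGING_SUFFIXES:     # drop one tape-and-reel qualifier
        if s.endswith(suffix):
            s = s[: -len(suffix)]
            break
    return s.strip("-")


print(normalize_part_number("pn: vishay_10k/0402-reel"))  # VISHAY-10K-0402
```

Separators are standardized before prefixes are stripped, so "TI_" and "TI/" are caught by the single "TI-" rule. Internal spaces are collapsed but kept, because token sort scoring splits on them. Running the same function over every library key when building the lookup dictionary keeps tier 1 a plain dict hit.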

These transformations dramatically improve match rates at tiers 1 and 2, reducing the number of cases that fall through to tier 3 or return no match.

Handling No-Match Cases

When all three tiers fail to find a confident match, the defect record is flagged as "unresolved" and enters a manual review queue. The queue UI shows the AOI part number alongside the top five library candidates ranked by similarity score, with color-coded confidence bands. Engineers confirm or correct the match with one click.
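Here is a sketch of how the queue's top-five list and color bands could be computed. The band edges (85 and 70) are assumptions, since the post does not state its thresholds, and difflib again stands in for RapidFuzz. With RapidFuzz available, `process.extract(..., limit=5)` would take the place of `heapq.nlargest`.

```python
import heapq
from difflib import SequenceMatcher

# Hypothetical band floors, highest first; the post does not give its values
BANDS = ((85.0, "green"), (70.0, "amber"), (0.0, "red"))


def similarity(a: str, b: str) -> float:
    """Token sort similarity, 0-100 (difflib stand-in for RapidFuzz)."""
    sa, sb = " ".join(sorted(a.split())), " ".join(sorted(b.split()))
    return 100.0 * SequenceMatcher(None, sa, sb).ratio()


def band(score: float) -> str:
    # First floor the score clears, checked from the highest band down
    return next(name for floor, name in BANDS if score >= floor)


def review_candidates(aoi_pn: str, library_keys: list[str], k: int = 5) -> list[tuple[str, float, str]]:
    """Top-k library keys for an unresolved part number, best first, with a band each."""
    scored = ((key, similarity(aoi_pn, key)) for key in library_keys)
    top = heapq.nlargest(k, scored, key=lambda item: item[1])
    return [(key, round(score, 1), band(score)) for key, score in top]
```

`heapq.nlargest` avoids sorting the whole library just to show five rows, which matters once the library holds tens of thousands of entries.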

Each confirmed correction is written back to a custom match table — a persistent mapping from {aoi_part_number → library_id} that short-circuits future lookups for the same string. The first time you see "VISHAY-10K-0402", it takes a human to resolve it. Every time after that, the lookup table returns the correct library entry instantly.

Manual review queue with top-5 candidates and confidence scores
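One way to make the custom match table persistent is a small SQLite table consulted before any fuzzy tier runs. The `MatchTable` class, the `custom_match` schema, and the `resolve` wrapper below are hypothetical; `fuzzy` stands for whatever tiered matcher is in use.

```python
import sqlite3
from typing import Callable, Optional


class MatchTable:
    """Persistent {aoi_part_number -> library_id} overrides (hypothetical schema)."""

    def __init__(self, path: str = "match_table.db"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS custom_match ("
            " aoi_part_number TEXT PRIMARY KEY,"
            " library_id TEXT NOT NULL)"
        )

    def get(self, aoi_pn: str) -> Optional[str]:
        row = self.conn.execute(
            "SELECT library_id FROM custom_match WHERE aoi_part_number = ?", (aoi_pn,)
        ).fetchone()
        return row[0] if row else None

    def record(self, aoi_pn: str, library_id: str) -> None:
        # Store an engineer's confirmation; re-confirming the same string overwrites it.
        # Using the connection as a context manager commits the insert.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO custom_match VALUES (?, ?)", (aoi_pn, library_id)
            )


def resolve(aoi_pn: str, table: MatchTable,
            fuzzy: Callable[[str], Optional[str]]) -> Optional[str]:
    # Confirmed corrections short-circuit the fuzzy tiers entirely
    hit = table.get(aoi_pn)
    return hit if hit is not None else fuzzy(aoi_pn)
```

Passing ":memory:" as the path gives a throwaway table for testing. A file path keeps confirmations across runs, which is the property the review workflow depends on.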

The Output: Defect Records With Component Context

Once every AOI defect record is linked to a canonical library entry, the data becomes genuinely useful. Defects can be grouped by component family, package size, or placement head. A defect that was previously just "Tombstone on C0402-100N-10V at R3C5" becomes "Tombstone on CAP|100N|10%|-|0402|10V|X7R, matched to library ID 4821, placed by Head 3 Nozzle A at 14:32:07".
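Once records carry a library entry, grouping is plain counting. This sketch assumes the pipe-delimited entry format from the introduction ("CAP|100N|10%|-|0402|10V|X7R") and reads family from field 0 and package from field 4. Those positions are inferred from that single example, and `group_defects` is a hypothetical helper. Grouping by placement head would require joining placement data, which this sketch does not do.

```python
from collections import Counter


def group_defects(records: list[dict], field: str) -> Counter:
    """Count (field value, defect type) pairs across matched defect records."""
    # Assumed field positions in the pipe-delimited library entry
    index = {"family": 0, "package": 4}[field]
    counts = Counter()
    for rec in records:
        entry = rec.get("library_entry")
        if entry is None:
            continue  # unresolved records are still waiting in the review queue
        counts[(entry.split("|")[index], rec["defect"])] += 1
    return counts
```

Keying on the (value, defect) pair keeps tombstones on 0402 capacitors separate from missing 0402 resistors, which is usually the distinction a process engineer wants first.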

That linkage is what turns AOI output from an inspection log into a feedback loop for production quality improvement. The fuzzy matching infrastructure is the bridge that makes it possible.