The SQL Window Function That Saved Our Defect Attribution

SMT machines log generic errors milliseconds before they log the component that caused them. A forward-fill CTE using SQL window functions bridges that gap — here is exactly how it works.

Published: April 13, 2026
Author: Hrushiekesh Kanjula Reddy
Read time: ~7 min
Category: engineering

#sql #sqlite #window-functions #manufacturing #data-engineering #analytics

SQL window function forward-fill bridging generic errors to component identifiers

Here is a problem that sounds abstract until you hit it in production: your machine logs generic errors. Not "Component C47 failed vacuum pick" — just "Vacuum error on Head 3". The specific component identifier (C47) gets logged in the next event, 50 milliseconds later, when the machine registers what it was trying to pick when the error occurred.

A standard JOIN cannot link these two log entries. They're separate rows. The error row has a NULL component ID. The component row has no error code. A naive query attributes zero errors to specific components — which makes your analytics layer useless for root cause analysis.

The solution is a CTE using SQL window functions to perform a forward-fill — cascading the component identifier from the row where it appears backward to the error row that precedes it. Here is the full implementation.

Understanding the Log Structure

The machine error log table looks like this:

timestamp           | event_type        | component_ref | head_id | error_code
2026-04-13 14:32:01 | VACUUM_ERROR      | NULL          | 3       | E_VAC_LOW
2026-04-13 14:32:01 | COMPONENT_PICK    | C47           | 3       | NULL
2026-04-13 14:32:04 | VACUUM_ERROR      | NULL          | 3       | E_VAC_LOW
2026-04-13 14:32:04 | COMPONENT_PICK    | R12           | 3       | NULL

The VACUUM_ERROR rows have the error code but no component ref. The COMPONENT_PICK rows have the component ref but no error code. They're linked by their shared timestamp and head ID — the error happened while attempting to pick the component logged 50ms later.

Machine log event pairs: VACUUM_ERROR with NULL component followed by COMPONENT_PICK with component ID

The Forward-Fill CTE

The solution uses a Common Table Expression with LEAD() — a window function that looks ahead to the next row in a partition:

WITH enriched_log AS (
    SELECT
        timestamp,
        event_type,
        component_ref,
        head_id,
        error_code,
        -- Look ahead to the next row in this head's timeline
        LEAD(component_ref) OVER (
            PARTITION BY head_id 
            ORDER BY timestamp
        ) AS next_component_ref
    FROM machine_error_log
),
attributed_errors AS (
    SELECT
        timestamp,
        head_id,
        error_code,
        -- Use the component ref from this row if available,
        -- otherwise use the one from the next row
        COALESCE(component_ref, next_component_ref) AS attributed_component
    FROM enriched_log
    WHERE error_code IS NOT NULL
)
SELECT * FROM attributed_errors;

The LEAD() function, partitioned by head_id and ordered by timestamp, returns the value of component_ref from the immediately following row within the same head's event stream. COALESCE then selects the current row's component ref if it exists, and falls back to the next row's value if it doesn't.

LEAD() window function forward-filling component refs across the NULL error rows

The output is a clean table of error events, each attributed to the specific component that was being picked when the error occurred:

timestamp           | head_id | error_code | attributed_component
2026-04-13 14:32:01 | 3       | E_VAC_LOW  | C47
2026-04-13 14:32:04 | 3       | E_VAC_LOW  | R12

Handling Multi-Row Gaps

The LEAD(value, 1) default handles the case where the component ID appears exactly one row after the error. In some machine firmware versions, there are additional intermediate log entries between the error and the component pick — diagnostic codes, head position updates, retry attempts. A single LEAD call misses these.

The generalized solution uses a recursive CTE that cascades the fill forward across multiple NULL rows:

WITH RECURSIVE forward_fill AS (
    -- Base case: rows that have component_ref already
    SELECT 
        rowid, timestamp, head_id, event_type, 
        component_ref, error_code,
        component_ref AS filled_ref
    FROM machine_error_log
    
    UNION ALL
    
    -- Recursive case: carry filled_ref forward for NULL rows
    SELECT 
        mel.rowid, mel.timestamp, mel.head_id, mel.event_type,
        mel.component_ref, mel.error_code,
        COALESCE(mel.component_ref, ff.filled_ref)
    FROM machine_error_log mel
    JOIN forward_fill ff ON mel.rowid = ff.rowid + 1
                        AND mel.head_id = ff.head_id
    WHERE mel.component_ref IS NULL
)
SELECT * FROM forward_fill WHERE error_code IS NOT NULL;

The recursive version handles gaps of arbitrary length by carrying the filled_ref value forward until it encounters a row with an explicit component_ref that resets the chain.

Recursive CTE forward-fill for multi-row gaps in machine event logs

Performance Considerations

Window functions and recursive CTEs have real performance implications at scale. The machine error log for a busy production line grows by ~50,000 rows per shift. Running the attribution query over 30 days of historical data requires scanning several million rows.

Three optimizations keep this manageable:

Index on (head_id, timestamp). This composite index supports the PARTITION BY head_id ORDER BY timestamp clause directly, turning a full table scan into an index range scan.

Materialized intermediate results. For the recursive version, materializing the base case into a temporary table before running the recursion significantly reduces repeated work.

Incremental attribution. Rather than recomputing attribution for all historical data on each query, a background job runs the CTE nightly over only the previous day's logs and writes attributed results to a separate attributed_errors table. Dashboard queries hit the pre-attributed table — fast, simple reads.

What This Unlocks

With reliable defect attribution, queries that were previously impossible become routine. Which component reference has the most vacuum errors this week? Which head-component combination has the highest failure rate? Do errors cluster around specific pick sequences, suggesting a pick order optimization opportunity?

These questions require connecting errors to components. The forward-fill CTE is the infrastructure that makes the connection. It's one of the less glamorous parts of the assembly hub's analytics layer — and one of the most foundational.

← All posts