Running Multiple AI Agents in Parallel Without Destroying Your Codebase

Two AI agents writing to the same file simultaneously is a data corruption event. Here is the file-locking orchestration layer that makes parallel AI development safe and fast.

Published: May 25, 2026
Author: Hrushiekesh Kanjula Reddy
Read time: ~7 min
Category: engineering

#ai #multi-agent #engineering #architecture #langgraph #parallel-execution

Parallel AI agents with file-locking orchestration preventing write collisions

The promise of multi-agent AI development is straightforward: if one AI agent can implement a feature in 20 minutes, two agents working in parallel should be able to implement two features in 20 minutes. The reality is more complicated. Two agents writing to the same codebase simultaneously, with no coordination, will eventually write to the same file at the same time. When that happens, one agent's work overwrites the other's. You lose changes silently, or you get a merge conflict in the middle of an automated pipeline with no human to resolve it.

Parallel AI development is genuinely powerful, but it requires explicit orchestration infrastructure to be safe. This post covers the file-locking protocol that makes it work.

The Collision Problem

Consider two agents working on the assembly hub simultaneously: Agent A is refactoring the BOM normalization engine, and Agent B is adding a new query method to the database access layer. These are independent tasks — they touch different modules, and there's no reason they can't proceed in parallel.

Except that both modules import from utils/constants.py. Agent A decides to add a new regex constant. Agent B decides to add a new SQL template string. Both read constants.py, make their additions, and write the file back. Whichever agent writes second overwrites the first agent's addition. The change is gone. No error. No warning. Just a silent overwrite.

This is not a hypothetical edge case. It's a predictable consequence of two write operations on the same file without coordination. At the scale of a real codebase — where shared utilities, base classes, and configuration files are imported everywhere — collisions happen constantly in naive parallel execution setups.
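The lost update is easy to reproduce without any agents at all. The sketch below is an illustration only, not the assembly hub's code: it uses a temporary file, and the constant names (MAX_RETRIES, PART_NUMBER_RE, SELECT_PARTS) are made up for the example.

```python
import tempfile
from pathlib import Path


def simulate_lost_update() -> str:
    """Two agents read the same file, each adds a line, and both write back."""
    with tempfile.TemporaryDirectory() as tmp:
        constants = Path(tmp) / "constants.py"
        constants.write_text("MAX_RETRIES = 3\n")

        # Both agents read before either one writes.
        snapshot_a = constants.read_text()
        snapshot_b = constants.read_text()

        # Agent A writes its regex constant first.
        constants.write_text(snapshot_a + 'PART_NUMBER_RE = r"[A-Z]+-[0-9]+"\n')
        # Agent B writes second, from a snapshot that predates A's change.
        constants.write_text(snapshot_b + 'SELECT_PARTS = "SELECT * FROM parts"\n')

        return constants.read_text()


final_contents = simulate_lost_update()
print(final_contents)
```

The final file contains Agent B's SQL template but not Agent A's regex constant, and neither write raised an error.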

Silent overwrite: two agents reading then writing the same file with no coordination

The File-Locking Protocol

The solution is a lightweight file-locking system implemented as a central coordination layer. Before any agent writes to a file, it must acquire a lock on that file. If the file is already locked by another agent, the requesting agent receives a LOCKED response and is directed to wait and retry, or to work on a different task.

The lock registry is a simple SQLite table:

CREATE TABLE file_locks (
    file_path TEXT PRIMARY KEY,
    locked_by TEXT NOT NULL,        -- agent identifier
    locked_at TIMESTAMP NOT NULL,
    task_description TEXT,
    expires_at TIMESTAMP            -- safety timeout
);

Lock acquisition is a single atomic INSERT. The PRIMARY KEY constraint on file_path, combined with SQLite's serialization of writers, ensures that two agents cannot both acquire the same lock: the second INSERT fails with a constraint error.

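import sqlite3

# Assumed setup: the database path is a placeholder; each agent process opens its own connection.
conn = sqlite3.connect("locks.db")

def acquire_lock(file_path: str, agent_id: str, task: str, timeout_minutes: int = 30) -> bool:
    """Try to take the lock on file_path; return False if another agent holds it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("""
                INSERT INTO file_locks
                (file_path, locked_by, locked_at, task_description, expires_at)
                VALUES (?, ?, datetime('now'), ?, datetime('now', '+' || ? || ' minutes'))
            """, (file_path, agent_id, task, timeout_minutes))
        return True
    except sqlite3.IntegrityError:
        # PRIMARY KEY conflict: file already locked
        return False

def release_lock(file_path: str, agent_id: str) -> None:
    """Release the lock, but only if this agent is the current holder."""
    with conn:
        conn.execute(
            "DELETE FROM file_locks WHERE file_path = ? AND locked_by = ?",
            (file_path, agent_id)
        )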

An agent that fails to acquire a lock is directed to consult the context archive and pick a different task from the backlog — one that doesn't require the locked file.
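To make the acquire, redirect, and release cycle concrete, here is a minimal end-to-end sketch. It is an illustration under assumptions, not the assembly hub's implementation: two connections to a temporary database stand in for two agent processes, the helpers take the connection as a parameter, and pick_unblocked_task and the backlog format are hypothetical.

```python
import os
import sqlite3
import tempfile

SCHEMA = """
CREATE TABLE IF NOT EXISTS file_locks (
    file_path TEXT PRIMARY KEY,
    locked_by TEXT NOT NULL,
    locked_at TIMESTAMP NOT NULL,
    task_description TEXT,
    expires_at TIMESTAMP
)
"""


def try_lock(conn, file_path, agent_id, task, timeout_minutes=30):
    """Insert a lock row; a PRIMARY KEY conflict means someone else holds it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO file_locks "
                "(file_path, locked_by, locked_at, task_description, expires_at) "
                "VALUES (?, ?, datetime('now'), ?, "
                "datetime('now', '+' || ? || ' minutes'))",
                (file_path, agent_id, task, timeout_minutes),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def unlock(conn, file_path, agent_id):
    with conn:
        conn.execute(
            "DELETE FROM file_locks WHERE file_path = ? AND locked_by = ?",
            (file_path, agent_id),
        )


def pick_unblocked_task(conn, backlog):
    """Return the first backlog task whose files are all unlocked, else None."""
    locked = {row[0] for row in conn.execute("SELECT file_path FROM file_locks")}
    for task_name, files in backlog:
        if locked.isdisjoint(files):
            return task_name
    return None


# Two connections to one database file stand in for two agent processes.
db_path = os.path.join(tempfile.mkdtemp(), "locks.db")
agent_a = sqlite3.connect(db_path)
agent_b = sqlite3.connect(db_path)
agent_a.execute(SCHEMA)
agent_a.commit()

a_got = try_lock(agent_a, "utils/constants.py", "agent-a", "add regex constant")
b_got = try_lock(agent_b, "utils/constants.py", "agent-b", "add SQL template")

# Agent B is blocked, so it looks for work that avoids the locked file.
backlog = [
    ("add SQL template", ["utils/constants.py", "db/access.py"]),
    ("write parser tests", ["tests/test_parser.py"]),
]
next_task = pick_unblocked_task(agent_b, backlog)

# Once Agent A releases, Agent B's retry succeeds.
unlock(agent_a, "utils/constants.py", "agent-a")
b_retry = try_lock(agent_b, "utils/constants.py", "agent-b", "add SQL template")

agent_a.close()
agent_b.close()
```

Agent A's first attempt succeeds, Agent B's fails and it falls through to the parser tests, and Agent B's retry after the release succeeds.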

The Orchestration Layer

Above the lock registry sits an orchestration layer that assigns tasks to agents and manages the parallel execution flow. In the assembly hub's setup, this is implemented using a custom task queue backed by the same SQLite database, with a coordinator process that polls available agents and matches them to unlocked tasks.

Frameworks like LangGraph formalize this pattern — a directed graph where nodes are agent steps and edges represent task dependencies and coordination signals. The file-locking protocol slots naturally into a LangGraph workflow as a "guard" node that precedes any write operation.

LangGraph orchestration with file-lock guard nodes before write operations

The coordination flow for any write operation:

1. Agent proposes a file write to the coordinator
2. Coordinator attempts lock acquisition on behalf of the agent
3. If lock acquired: agent proceeds with write, releases lock on completion
4. If lock blocked: coordinator assigns agent a different task; retries lock acquisition after a configurable delay
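The four steps above can be sketched as a plain-Python guard around the write. This is one possible shape, not the coordinator described in the post: the names acquire_all, release_all, and guarded_write are hypothetical, the timeout is hard-coded to 30 minutes, and locking every file a task needs in a single transaction is an added assumption, chosen so that a blocked task never holds a partial set of locks.

```python
import sqlite3

INSERT_LOCK = """
    INSERT INTO file_locks
    (file_path, locked_by, locked_at, task_description, expires_at)
    VALUES (?, ?, datetime('now'), ?, datetime('now', '+30 minutes'))
"""


def acquire_all(conn, files, agent_id, task):
    """Lock every file for a task, or none of them."""
    try:
        with conn:  # one transaction: any conflict rolls back all inserts
            for path in files:
                conn.execute(INSERT_LOCK, (path, agent_id, task))
        return True
    except sqlite3.IntegrityError:
        return False


def release_all(conn, files, agent_id):
    with conn:
        conn.executemany(
            "DELETE FROM file_locks WHERE file_path = ? AND locked_by = ?",
            [(path, agent_id) for path in files],
        )


def guarded_write(conn, agent_id, task, files, write_fn):
    """Lock, write, release; or report that the agent should be redirected."""
    if not acquire_all(conn, files, agent_id, task):
        return "REDIRECT"  # caller assigns another task and retries later
    try:
        write_fn()
        return "DONE"
    finally:
        release_all(conn, files, agent_id)  # release even if the write fails


def locked_files(conn):
    return sorted(r[0] for r in conn.execute("SELECT file_path FROM file_locks"))


conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE file_locks (
        file_path TEXT PRIMARY KEY,
        locked_by TEXT NOT NULL,
        locked_at TIMESTAMP NOT NULL,
        task_description TEXT,
        expires_at TIMESTAMP
    )
""")

# Agent A already holds heuristics.py.
acquire_all(conn, ["heuristics.py"], "agent-a", "rotation validation")

written = []
blocked = guarded_write(
    conn, "agent-b", "tune heuristics",
    ["rotation_engine.py", "heuristics.py"], lambda: written.append("b"),
)
after_blocked = locked_files(conn)
done = guarded_write(
    conn, "agent-c", "AOI report parser",
    ["parsers/aoi.py"], lambda: written.append("c"),
)
```

Agent B's task is redirected without leaving rotation_engine.py locked, while Agent C's write runs and its lock is released afterwards. The retry delay from step 4 is left to the caller.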

The Context Archive Integration

The file-locking protocol integrates with the context archive from the governance framework. When Agent B cannot acquire a lock on constants.py because Agent A holds it, Agent B's redirect message includes the relevant section of the context archive: "Agent A is currently refactoring the normalization engine constants. Do not modify constants.py until lock released. See context archive section 4.2 for current normalization architecture."

This means agents don't just know that a file is locked — they know why it's locked and what architectural context is relevant to their own work. An agent that understands what Agent A is doing to constants.py can make better decisions about which alternative tasks to pursue, and can avoid making changes that would conflict with Agent A's in-progress work even in files that aren't currently locked.
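A redirect message of this kind can be assembled from the lock row itself. The sketch below is a guess at the shape, not the governance framework's actual code: build_redirect_message is a hypothetical helper, and the archive section is passed in by the caller because the post does not describe how sections are looked up.

```python
import sqlite3


def build_redirect_message(conn, file_path, archive_section=None):
    """Explain who holds a lock and why; return None if the file is unlocked."""
    row = conn.execute(
        "SELECT locked_by, task_description, expires_at "
        "FROM file_locks WHERE file_path = ?",
        (file_path,),
    ).fetchone()
    if row is None:
        return None
    locked_by, task, expires_at = row
    parts = [f"{file_path} is locked by {locked_by}"]
    if task:
        parts.append(f"for task: {task}")
    if expires_at:
        parts.append(f"(lock expires at {expires_at} UTC)")
    parts.append(f"Do not modify {file_path} until the lock is released.")
    if archive_section:
        parts.append(f"See context archive section {archive_section}.")
    return " ".join(parts)


conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE file_locks (
        file_path TEXT PRIMARY KEY,
        locked_by TEXT NOT NULL,
        locked_at TIMESTAMP NOT NULL,
        task_description TEXT,
        expires_at TIMESTAMP
    )
""")
conn.execute(
    "INSERT INTO file_locks VALUES (?, ?, ?, ?, ?)",
    ("utils/constants.py", "agent-a", "2026-05-25 10:00:00",
     "refactor normalization constants", "2026-05-25 10:30:00"),
)
conn.commit()

message = build_redirect_message(conn, "utils/constants.py", archive_section="4.2")
no_lock = build_redirect_message(conn, "parsers/aoi.py")
print(message)
```

This is why the task_description column earns its place in the schema: it is the only part of the lock row that tells a blocked agent what the holder is doing.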

Safety Timeouts and Lock Cleanup

Locks need safety timeouts. Without one, an agent that crashes mid-task or gets stuck in a long reasoning loop holds its locks indefinitely, blocking other agents from making progress on any of the locked files.

The expires_at column in the lock registry serves this purpose. A background process runs every 5 minutes and releases any locks whose expires_at has passed. When a lock expires, the partial work from the crashed agent is flagged for human review — the orchestration layer does not automatically reverse it, because the partial work may be partially correct and worth preserving.
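A minimal sketch of the sweep, assuming it runs against the same file_locks table: release_expired_locks is a hypothetical name, and how the returned rows are flagged for human review is left to the caller because the post does not specify it. With the default 30-minute timeout and a 5-minute sweep interval, a stale lock can block a file for up to 35 minutes.

```python
import sqlite3


def release_expired_locks(conn):
    """Delete locks past expires_at and return them so they can be flagged."""
    with conn:
        expired = conn.execute(
            "SELECT file_path, locked_by, task_description FROM file_locks "
            "WHERE expires_at IS NOT NULL AND expires_at <= datetime('now')"
        ).fetchall()
        # Re-check expiry in the DELETE so a freshly re-acquired lock survives.
        conn.executemany(
            "DELETE FROM file_locks WHERE file_path = ? AND locked_by = ? "
            "AND expires_at <= datetime('now')",
            [(path, agent) for path, agent, _ in expired],
        )
    return expired


conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE file_locks (
        file_path TEXT PRIMARY KEY,
        locked_by TEXT NOT NULL,
        locked_at TIMESTAMP NOT NULL,
        task_description TEXT,
        expires_at TIMESTAMP
    )
""")
# One lock that expired a minute ago, one that is still valid.
conn.execute(
    "INSERT INTO file_locks VALUES (?, ?, datetime('now', '-31 minutes'), ?, "
    "datetime('now', '-1 minutes'))",
    ("parsers/aoi.py", "agent-b", "AOI report parser"),
)
conn.execute(
    "INSERT INTO file_locks VALUES (?, ?, datetime('now'), ?, "
    "datetime('now', '+30 minutes'))",
    ("rotation_engine.py", "agent-a", "rotation validation"),
)
conn.commit()

released = release_expired_locks(conn)
remaining = [r[0] for r in conn.execute("SELECT file_path FROM file_locks")]
```

Returning the expired rows rather than silently deleting them gives the orchestration layer the file, agent, and task it needs to open a review item.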

Lock expiry and cleanup: background process releasing stale locks with partial-work flagging

What Parallel Execution Actually Looks Like

With the locking protocol in place, a typical parallel session on the assembly hub looks like this: Agent A works on the rotation validation improvements (locking rotation_engine.py, heuristics.py). Agent B works on a new AOI report parser (locking parsers/aoi.py, schemas/aoi_schema.py). Agent C works on an analytics query optimization (locking analytics/queries.py, db/views.sql). All three proceed simultaneously, with the coordinator handling any lock conflicts by redistributing tasks.

Wall-clock time for three independent features approaches the time it takes to implement one feature serially, but the theoretical 3x speedup doesn't fully materialize in practice: coordination overhead, the occasional lock conflict, and the planning steps eat some of it. A 2x to 2.5x speedup is consistently achievable.
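As a rough worked example, take the 20-minute feature from the introduction as an illustrative figure (it is not a measurement) and convert the speedup range into wall-clock time. The function name parallel_wall_clock is made up for this sketch.

```python
def parallel_wall_clock(feature_minutes: float, n_features: int, speedup: float) -> float:
    """Wall-clock minutes for n_features, given a speedup over serial execution."""
    serial_minutes = feature_minutes * n_features
    return serial_minutes / speedup


ideal = parallel_wall_clock(20, 3, 3.0)   # perfect 3x: 20 minutes
best = parallel_wall_clock(20, 3, 2.5)    # 24 minutes
worst = parallel_wall_clock(20, 3, 2.0)   # 30 minutes
print(ideal, best, worst)
```

Under those assumptions, three features that would take 60 minutes serially land in roughly 24 to 30 minutes rather than the ideal 20.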

More importantly, the codebase stays coherent. Every write is coordinated. Every conflict is resolved before it becomes a silent data loss. The parallel execution delivers speed without sacrificing the architectural integrity that makes the codebase trustworthy.

That combination — fast and trustworthy — is what makes multi-agent development genuinely useful rather than just impressive to demo.
