Most content about "AI-assisted coding" is about autocomplete. Tab-complete a function, accept a suggestion, move on. That's useful, but it's not what I'm talking about here.
I'm talking about treating your repository as an operating interface for an AI agent — where the agent doesn't just write code, but deploys it, monitors it, tests it, queries live systems, and operates infrastructure. I built a real-time arbitrage bot for Polymarket this way. 223 commits. ~15,000 lines of Python and Rust. Multi-connection WebSocket architecture processing 4,000 messages per second. A custom Rust FFI module for cryptographic signing at 100x the speed of Python. All in about 8 days of active development.
The bot itself is interesting, but that's not the point of this post. The point is what the repository looks like — and why that structure is what made 8 days possible.
A traditional repo has code, tests, and maybe a README. An AI-native repo has three additional layers that turn it into something an agent can operate:
1. Instructions — a CLAUDE.md file that tells the agent how to behave in this codebase.
2. Commands — markdown files in .claude/commands/ that define reusable operations the agent can execute.
3. Tools — MCP servers that give the agent structured access to live systems.
Here's what each looks like in practice.
My CLAUDE.md is just a few lines:
```
All polymarket documentation and api-reference can be found
in folder polymarket-documentation.

Python style guide:
- no relative imports allowed, style is: import module
- private functions start with _; public functions are at top
- no excessive indentation
- async functions when needed
```
That's it. No architecture overview, no module descriptions, no flowcharts. The agent discovers architecture by reading the code — just like a senior developer would. What it needs from me is the stuff it can't infer: where the API docs live, and the handful of style conventions I care about.
I've seen people write 200-line CLAUDE.md files that try to explain everything. That's counterproductive. The agent doesn't need a tour guide. It needs guardrails and pointers.
The .claude/commands/ directory contains 22 markdown files. Each one is a mini-specification for a reusable operation. They fall into four categories:
Deployment — deploy-arbitrage, deploy-monitor, deploy-trade-analysis, stop-arbitrage. These handle SSH, Docker builds, and environment configuration.
Querying — poly-balance, poly-positions, poly-orders, poly-portfolio-value, poly-activity, poly-trades. These call MCP tools to query live Polymarket data.
Testing — poly-test-arbitrage supports four modes: local functional tests, local performance tests, distributed tests with a fake server on a VPS, and full distributed tests with both fake server and bot on separate VPSes.
Analytics — trade-activity, trade-arb-analysis, compare-rn1. These analyze trade timing, arbitrage opportunities, and competitive behavior.
Here's what a command looks like. This is poly-balance.md in its entirety:
```
Call the Polymarket MCP tool `mcp__polymarket-monitor__get_balance`
with account="default" to fetch and display USDC balance and
allowance data.

IMPORTANT: The balance is returned as a raw integer string.
USDC has 6 decimals.
To convert to dollars: divide by 1,000,000 (10^6).
Example: balance "1895522" = 1895522 / 1000000 = $1.895522, about $1.90 USDC
```
Seven lines. The agent knows exactly what to call, how to interpret the result, and what gotcha to watch for (USDC decimals). No ambiguity.
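The decimal conversion that command warns about takes one function in code. A minimal sketch (`Decimal` rather than float, to avoid rounding artifacts on money values):

```python
from decimal import Decimal

def usdc_to_dollars(raw: str) -> Decimal:
    """Convert a raw 6-decimal USDC integer string to dollars."""
    return Decimal(raw) / Decimal(10**6)

print(usdc_to_dollars("1895522"))  # 1.895522
```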
Compare that with the testing command, which handles complex multi-VPS deployment:
```
Parse the arguments from $ARGUMENTS:
- No args or "local": Run local functional tests
- "performance" or "perf": Run local performance tests
- "distributed" or "dist": Run distributed performance tests
- "distributed deploy": Deploy fake server to VPS, then run tests
- "distributed full": Deploy both fake server and bot to VPSes
- "stop": Stop all deployed services
```
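The mode dispatch that command specifies maps naturally onto a small router. A sketch of the logic only, not the agent's actual implementation (the mode names on the right are hypothetical labels):

```python
def dispatch_test_mode(arguments: str) -> str:
    """Map the command's $ARGUMENTS string to a test mode."""
    args = arguments.strip().lower()
    if args in ("", "local"):
        return "local-functional"
    if args in ("performance", "perf"):
        return "local-performance"
    if args == "distributed deploy":
        return "deploy-fake-server-then-test"
    if args == "distributed full":
        return "deploy-fake-server-and-bot"
    if args in ("distributed", "dist"):
        return "distributed-performance"
    if args == "stop":
        return "stop-all"
    raise ValueError(f"unknown test mode: {arguments}")
```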
The key insight: commands are composable. In a single conversation, I can say "check my balance, deploy to prod, tail the logs, wait two minutes, then run the distributed performance test." The agent chains these operations together because each one is self-contained and unambiguous. That's not autocomplete — that's an operating system.
The repo includes two MCP servers that give the agent structured access to live Polymarket data:
polymarket-monitor exposes 11 tools:
| Tool | What it does |
|---|---|
| `get_positions` | Current positions with P&L |
| `get_orders` | Open orders with status |
| `get_balance` | USDC balance and allowance |
| `get_portfolio_value` | Cash + positions + unrealized P&L |
| `search_markets` | Find markets by liquidity, volume, status |
| `search_events` | Query events with tag filtering |
| `cancel_all_orders` | Cancel all open orders |
| `redeem_positions` | Redeem winners, burn losers |
| `merge_balanced_positions` | Convert equal YES+NO shares to USDC |
| `get_activity` | Account activity history |
| `get_trades` | Recent trades |
trade-analysis exposes 2 tools for post-trade analysis: analyze_trade_activity (timeline of nearby trading around my fills) and analyze_arbitrage (arbitrage opportunity analysis over configurable time windows).
Without these MCP servers, the agent can only read and write code. With them, it can answer questions like "what's my portfolio worth right now?", "who else was trading on the same market within 50ms of my fill?", and "merge all my balanced positions into cash."
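The portfolio question reduces to cash plus mark-to-market position value. A sketch of what a `get_portfolio_value`-style computation looks like (my guess at the shape, not the server's actual code — unrealized P&L is implicit because positions are valued at current price, not cost):

```python
def portfolio_value(cash_usd: float, positions: list[dict]) -> float:
    """Cash plus current value of each position (size x current price)."""
    return cash_usd + sum(p["size"] * p["current_price"] for p in positions)

positions = [
    {"size": 100, "current_price": 0.50},  # 100 YES shares at $0.50
    {"size": 50, "current_price": 0.25},   # 50 NO shares at $0.25
]
total = portfolio_value(250.0, positions)  # 250 + 50 + 12.5 = 312.5
```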
The permission boundary matters too. The .claude/settings.local.json whitelists exactly which MCP tools the agent can call:
```json
{
  "permissions": {
    "allow": [
      "mcp__polymarket-monitor__get_positions",
      "mcp__polymarket-monitor__get_balance",
      "mcp__polymarket-monitor__get_portfolio_value",
      "mcp__polymarket-monitor__merge_balanced_positions",
      "mcp__trade-analysis__analyze_trade_activity",
      ...
    ]
  }
}
```
The agent can query and merge — but it can't place orders. That's a deliberate boundary. You want your agent to have eyes and hands, but not the ability to spend money without explicit approval.
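The same file can make that boundary explicit rather than implicit: Claude Code settings also accept a `deny` list alongside `allow`. A sketch (`place_order` is a hypothetical tool name here — in the real repo, order placement simply isn't exposed over MCP at all):

```json
{
  "permissions": {
    "allow": [
      "mcp__polymarket-monitor__get_balance"
    ],
    "deny": [
      "mcp__polymarket-monitor__place_order"
    ]
  }
}
```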
Eight days of AI-assisted development doesn't mean the AI did everything. It means the AI handled the volume while I handled the direction. The clearest example is what I internally called the "GAMECHANGER" — a root cause analysis that changed the entire architecture.
The bot was detecting arbitrage opportunities correctly but failing to fill them 75% of the time. Orders were going "live" (resting on the order book) instead of "matched" (filling immediately). The obvious hypothesis was speed — we needed to detect and execute faster.
That hypothesis was wrong.
The real problem was data freshness. During high-volume spikes, the WebSocket message queue would back up to 2,000+ messages. By the time we processed a price update, it was 500ms stale. We'd place an order at a price that no longer existed, and it would sit on the book competing against orders placed hours ago. We couldn't win on time priority against pre-existing depth.
The fix wasn't faster execution — it was keeping the queue shallow:
```python
# Early exit: skip if no arbitrage possible (eliminates ~99% of checks)
if best_ask + pair_price >= config.ARB_THRESHOLD:
    continue
```

```python
# Check before locking: pure function arb check, then lock only if needed
opportunity = detect_arbitrage(asset_id)  # No lock
if not opportunity:
    return

async with lock:  # Only lock when there's something to execute
    ...
```
The early exit alone eliminated 99% of unnecessary work. Combined with lock-free detection and sampled diagnostics (tracking only 1% of messages instead of all of them), the queue depth dropped from 2,059 to under 100.
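The sampled diagnostics can be as simple as a modulo counter — record stats for one message in a hundred instead of every message. A sketch under that assumption (the real code may sample differently):

```python
class SampledStats:
    """Record queue diagnostics for ~1% of messages instead of all of them."""

    def __init__(self, every: int = 100):
        self.every = every
        self.count = 0
        self.samples = []

    def observe(self, queue_depth: int) -> None:
        self.count += 1
        if self.count % self.every == 0:  # 1 message in every 100
            self.samples.append(queue_depth)

stats = SampledStats()
for depth in range(1000):
    stats.observe(depth)
# 1,000 messages observed, only 10 diagnostic samples recorded
```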
The AI agent implemented all of these fixes. It wrote the code, ran the tests, deployed to the VPS. But it didn't — and couldn't — identify the root cause. The insight that "orders go live because data is stale, not because we're slow" required understanding how order book matching works, what time priority means in a continuous limit order book, and why a 500ms delay is fatal in that context. That's domain knowledge combined with architectural intuition. The AI had neither.
This is the actual division of labor in AI-native development: the human identifies what's wrong and why; the agent implements the fix at speed.
The other human decision that shaped the project was going to Rust for cryptographic signing.
Every order placed on Polymarket requires an EIP-712 signature — ECDSA signing with Keccak256 hashing over ABI-encoded order data. In Python, this takes 4–5ms per signature. That's the entire latency budget for an arbitrage order. Two orders (one YES, one NO) would take 8–10ms just for signing — before any network I/O.
The solution was a Rust module compiled to a Python extension via PyO3. The hot-path function sign_order does everything in Rust: ABI encoding 12 fields into 384 bytes, Keccak256 hashing with a pre-computed ORDER_TYPE_HASH, and secp256k1 ECDSA signing.
```rust
// Pre-computed at compile time — no runtime hashing
const ORDER_TYPE_HASH: [u8; 32] = [
    0xa8, 0x52, 0x56, 0x6c, 0x4e, 0x14, 0xd0, 0x08,
    0x69, 0xb6, 0xdb, 0x02, 0x20, 0x88, 0x8a, 0x90,
    // ...
];

#[pyfunction]
fn sign_order(
    private_key_hex: &str,
    domain_separator: &[u8],
    salt: u64,
    maker: &str, signer: &str, taker: &str,
    token_id: &str, maker_amount: &str, taker_amount: &str,
    expiration: u64, nonce: u64, fee_rate_bps: u64,
    side: u8, signature_type: u8,
) -> PyResult<String> {
    let encoded = encode_order(salt, maker, signer, taker, ...)?;
    let mut hasher = Keccak256::new();
    hasher.update(ORDER_TYPE_HASH);
    hasher.update(&encoded);
    let struct_hash = hasher.finalize();
    sign_typed_data_to_hex(private_key_hex, domain_separator, &struct_hash)
}
```
Result: 0.04ms per signature — roughly 100x faster than Python. No GIL contention. The signing is now invisible in the latency budget.
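The 384-byte figure follows directly from static ABI encoding: each of the 12 order fields occupies one 32-byte word. A minimal Python sketch with placeholder values (addresses and string amounts simplified to integers for illustration):

```python
def abi_word(value: int) -> bytes:
    """One static ABI slot: big-endian integer, left-padded to 32 bytes."""
    return value.to_bytes(32, "big")

# 12 fields: salt, maker, signer, taker, tokenId, makerAmount,
# takerAmount, expiration, nonce, feeRateBps, side, signatureType
order_fields = [0] * 12  # placeholder values
encoded = b"".join(abi_word(f) for f in order_fields)
print(len(encoded))  # 384 = 12 fields x 32 bytes
```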
The AI agent wrote most of this Rust code. It handled the PyO3 bindings, the ABI encoding, the test vectors. But the decision to go to Rust — that was mine. I profiled the hot path, identified signing as the bottleneck, evaluated alternatives (C extension, pre-signing, batching), and decided Rust + PyO3 was the right tradeoff. The agent executed the decision at speed. It produced ~540 lines of well-tested Rust in maybe an hour.
Here's what the repository looks like after 8 days of AI-native development:
| Metric | Value |
|---|---|
| Commits | 223 |
| Python modules | 20 (arbitrage) + 8 (monitoring) + 7 (analysis) |
| Rust code | 540 lines (cryptographic signer) |
| CLI commands | 22 |
| MCP tools | 13 (11 monitor + 2 analysis) |
| WebSocket throughput | ~4,000 messages/second |
| Parallel workers | 8 |
| Market coverage | 2,000+ assets across multiple WebSocket connections |
| Integration test modes | 4 (local, performance, distributed, full distributed) |
| Deployment targets | 2 VPS instances via Docker |
This isn't a toy project. It has integration tests, Docker deployment, PowerShell scripts for cross-platform builds, rotating log files, dry-run mode, spend limiters, clock synchronization via NTP, and position recovery from incomplete fills.
Could I have built this without AI agents? Probably. In 6–8 weeks. The velocity difference isn't 2x or 3x — it's closer to an order of magnitude for this type of project, because so much of the work is integration plumbing (WebSocket protocol handling, Docker configuration, deployment scripts, API client wrappers) that an agent handles effortlessly once you give it the right context.
Invest in commands early. The first thing I did on day one was create deployment and querying commands. Every subsequent day, those commands saved time. By day three, I had a development loop where I could say "deploy, test, check results, iterate" in a single conversation. The compound returns are real.
MCP servers change the category of work an agent can do. Without MCP, the agent writes code and runs scripts. With MCP, it operates systems. It can check portfolio value, analyze trade timing, merge positions — all without me writing throwaway scripts or switching to a browser. The agent becomes an operator, not just a programmer.
Keep instructions minimal. Your CLAUDE.md should contain what the agent can't figure out on its own — style conventions, where docs live, project-specific gotchas. Everything else, let it discover by reading code. Overly detailed instructions become stale and misleading.
The human's job shifts from typing to architecture. I wrote maybe 5% of the code in this repo by hand. But I made 100% of the architectural decisions: the queue backlog root cause, the Rust FFI boundary, the WebSocket partitioning strategy, the position-aware arbitrage design. The agent executes faster than I can type. My value is in knowing what to build and why — not in writing the code.
That last point is the one that matters most. AI-native development doesn't make the developer less important. It makes the developer's judgment more important, because the bottleneck is no longer implementation speed — it's architectural clarity. If you know exactly what you want, an AI agent can build it in hours. If you don't, no amount of AI assistance will save you.
The repository is the proof. Not the bot — the repository.