How Crypto Exchange Matching Engines Actually Work
The Piece of Software That Makes or Breaks Your Exchange
Every exchange founder wants to talk about the UI. The slick TradingView charts, the dark mode toggle, the mobile app. And sure, those things matter for user acquisition. But if you want to know what actually determines whether your exchange survives its first real trading day, look at the matching engine.
The matching engine is the core piece of software that takes incoming buy and sell orders, figures out which ones can be filled against each other, and executes trades. That sounds simple. It’s not. A poorly designed matching engine will silently drop orders under load, create phantom fills, produce incorrect balances, and generate the kind of accounting nightmares that make auditors quit. I’ve debugged enough of these to know the pain firsthand.
What surprises people is how deceptively small the core logic appears. The basic algorithm for matching a buy order against existing sell orders can be written in under 100 lines of code. But the difference between that toy implementation and a production-grade trading engine is about 50,000 lines of edge case handling, persistence logic, recovery mechanisms, and performance optimization.
Let me walk through what’s actually happening inside a modern matching engine and why each design decision matters.
What a Matching Engine Actually Does
Forget the textbook definition for a moment. Here’s what a matching engine does in practice, every single millisecond of every trading day:
1. Accepts an incoming order — a buy or sell request with a type (limit, market, stop), a price (maybe), a quantity, and a trader ID.
2. Validates the order — checks that the price meets tick size requirements, the quantity meets minimum lot size, the trader has sufficient balance, and the order doesn’t violate any exchange rules.
3. Attempts to match — scans the opposite side of the order book for compatible resting orders. A buy at $50,000 can match against any sell at $50,000 or lower.
4. Executes trades — for each match, it creates a trade record, updates both traders’ balances, and adjusts the order book.
5. Handles the remainder — if the incoming order isn’t fully filled, it either rests in the order book (limit orders) or gets cancelled (market orders with no remaining liquidity).
6. Publishes events — broadcasts trade executions, order book updates, and balance changes to the rest of the system.
Steps 1 through 6 need to happen atomically. Not “eventually consistent,” not “we’ll reconcile later.” Atomically. If step 4 succeeds but step 6 fails, you’ve got silent fills that the trader never sees. If step 3 succeeds but step 4 partially fails, you’ve got phantom liquidity on the book. Both situations destroy trust faster than any hack.
This is why the matching engine is almost always a single-threaded, sequential processor for any given trading pair. Concurrency here isn’t your friend — it’s a liability. More on that when we talk about scaling.
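The loop at the heart of steps 3 through 5 can be sketched in a few lines. To be clear, this is exactly the kind of toy the article warns about: the names are illustrative and a deque of aggregated price levels stands in for a real book structure. A limit buy walks the asks, fills what it can, and returns the unfilled remainder.

```python
from collections import deque

def match_buy(limit_price, qty, asks, trades):
    """Match an incoming limit buy against resting asks; return leftover qty."""
    # Asks are (price, qty) pairs sorted lowest-first, so asks[0] is the best ask.
    while qty > 0 and asks and asks[0][0] <= limit_price:
        ask_price, ask_qty = asks[0]
        fill = min(qty, ask_qty)            # execute a trade (step 4)...
        trades.append((ask_price, fill))    # ...and record it for publication (step 6)
        qty -= fill
        if fill == ask_qty:
            asks.popleft()                  # price level fully consumed
        else:
            asks[0] = (ask_price, ask_qty - fill)
    return qty                              # remainder rests or cancels (step 5)

asks = deque([(50_000, 2), (50_010, 1)])
trades = []
leftover = match_buy(50_005, 3, asks, trades)   # fills 2 at $50,000; 1 unit left over
```

Everything this toy omits — balance checks, event publication, persistence — is precisely the 50,000 lines discussed above.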
Order Book Data Structures: Where Performance Lives or Dies
The order book is the data structure that holds all resting limit orders, organized by price level. It has two sides: bids (buy orders) and asks (sell orders). Bids are sorted highest-to-lowest. Asks are sorted lowest-to-highest. The best bid and best ask — the prices at the top of each side — define the spread.
Choosing the right data structure for this is one of the most consequential architecture decisions you’ll make. Here are the real options:
Sorted Arrays or Linked Lists
The simplest approach. You maintain a sorted list of price levels, and each level contains a queue of orders.
Pros: Dead simple to implement. Cache-friendly memory layout. Fast iteration when you need to walk the book.
Cons: Insertion is O(n) where n is the number of price levels. For a pair like BTC/USDT with thousands of active price levels, this gets slow. Deletion is also O(n) due to shifting.
Verdict: Fine for a prototype. Unacceptable for production with serious volume.
Balanced Binary Search Trees (Red-Black Trees, AVL Trees)
This is what most production matching engines use. A self-balancing BST gives you O(log n) insertion, deletion, and lookup. You can also efficiently find the best bid or best ask by maintaining pointers to the min and max nodes.
Pros: O(log n) for all critical operations. Well-understood algorithms. Easy to maintain best-bid and best-ask pointers.
Cons: Pointer-heavy structures can cause cache misses. Tree rotations during rebalancing add constant-factor overhead.
Verdict: The industry standard for good reason. This is what you should use unless you have very specific requirements that push you elsewhere.
Hash Maps with Skip Lists
Some high-frequency trading shops use a hash map keyed by price level combined with a skip list or similar structure for ordering. This gives you O(1) lookup for a specific price level and O(log n) to find a price’s position in sorted order.
Pros: Blazing fast for the common case of adding an order to an existing price level.
Cons: More complex to implement correctly. The skip list component still has O(log n) insertion for new price levels.
Verdict: Worth considering if your profiling shows that most orders land on existing price levels (which is actually true for heavily traded pairs).
The Real-World Hybrid
Here’s what most serious implementations actually look like: a balanced BST for the price-level index, with each price level containing a doubly-linked list (FIFO queue) of individual orders. You also maintain a hash map from order ID to order node, so cancellations are O(1) lookup plus O(1) removal from the linked list.
That hash map for order cancellation is crucial. On any active exchange, cancellation volume is typically 10-50x the trade execution volume. Market makers are constantly placing and cancelling orders. If your cancellation path is slow, your entire engine bogs down.
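The hybrid layout is easy to see in miniature. In this sketch (all names are illustrative), a sorted list stands in for the balanced BST over price levels, each level is an OrderedDict used as a FIFO queue, and the order-ID index makes cancellation a hash lookup plus an O(1) unlink:

```python
from collections import OrderedDict
import bisect

class BookSide:
    def __init__(self):
        self.prices = []   # sorted price levels (stand-in for the BST)
        self.levels = {}   # price -> OrderedDict(order_id -> qty), FIFO per level
        self.index = {}    # order_id -> price, for O(1) cancellation lookup

    def add(self, order_id, price, qty):
        if price not in self.levels:
            bisect.insort(self.prices, price)   # O(log n) in a real BST
            self.levels[price] = OrderedDict()
        self.levels[price][order_id] = qty      # appends at the back of the queue
        self.index[order_id] = price

    def cancel(self, order_id):
        price = self.index.pop(order_id)
        level = self.levels[price]
        del level[order_id]                     # O(1) unlink from the FIFO queue
        if not level:                           # drop the now-empty price level
            del self.levels[price]
            self.prices.remove(price)           # O(log n) in a real BST

asks = BookSide()
asks.add("a1", 50_000, 1)
asks.add("a2", 50_000, 2)   # joins the same level, queued behind a1
asks.add("a3", 50_010, 1)
asks.cancel("a2")           # hash lookup, no book scan
```

The point of the shape is visible even in the toy: the cancellation path never touches the price index unless a level empties, which is what keeps market-maker churn cheap.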
Price-Time Priority: The Fairness Contract
Price-time priority, also called FIFO matching, is the standard algorithm used by virtually every legitimate exchange. The rule is simple:
- Orders are first prioritized by price. A buy order at $50,001 gets filled before a buy at $50,000.
- At the same price level, orders are prioritized by time. The order that arrived first gets filled first.
This sounds obvious, but it has profound implications. Price-time priority is fundamentally a fairness guarantee. It means that if you placed your order first at a given price, nobody can jump the queue by submitting the same price later. Your position in the queue is your property.
Why does this matter so much? Because without strict price-time priority, you open the door to a whole class of manipulative behaviors. An exchange operator could theoretically reorder the queue to favor certain market makers. A trader with a faster connection could cancel and re-place orders to jump ahead. These aren’t hypothetical scenarios — they’re things that have happened on shady exchanges, and they’re exactly why regulators care about matching engine fairness.
Some exchanges have experimented with alternative priority schemes — pro-rata matching (used in some futures markets), where fills at a price level are distributed proportionally across all resting orders, or size-time priority, where larger orders get preference. But for a standard spot trading exchange, price-time FIFO is the standard, and deviating from it requires a very good reason plus clear disclosure to your traders.
The implementation detail that trips people up: your timestamp resolution matters. If your clock resolution is only milliseconds, two orders arriving within the same millisecond get arbitrary ordering. Production engines use microsecond or nanosecond timestamps. And the timestamp needs to be assigned at the moment the order enters the matching engine’s processing queue, not when it hits your API gateway.
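Price-time priority reduces to a two-part sort key. A minimal sketch using a heap, where a monotonic sequence number assigned at the engine's queue stands in for the microsecond timestamp (a real book uses the richer structures discussed above, but the ordering rule is the same):

```python
import heapq

def bid_key(price, seq):
    # Negate price so the highest bid pops first; ties at the same price
    # fall through to arrival order (the engine-assigned sequence number).
    return (-price, seq)

bids = []
heapq.heappush(bids, bid_key(50_000, 1))   # first to arrive at 50,000
heapq.heappush(bids, bid_key(50_001, 2))   # better price, arrived later
heapq.heappush(bids, bid_key(50_000, 3))   # same price, arrived later

order = [heapq.heappop(bids) for _ in range(3)]
# Best price first; within a price, earliest arrival first.
```

The better-priced order jumps ahead regardless of arrival time, but at an equal price the earlier order always wins — which is the fairness contract in one tuple.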
Latency and Throughput: What “Fast” Actually Means
People throw around performance numbers for matching engines without context, so let me put some real benchmarks on the table.
Tier 1 exchanges (Binance, Coinbase, Kraken): These engines process individual orders in single-digit microseconds. Their published numbers typically claim 100,000+ orders per second per trading pair, with matching latency under 5 microseconds. At this level, you’re writing in C++ or Rust, using lock-free data structures, pinning threads to CPU cores, and bypassing the kernel’s network stack.
Tier 2 exchanges (mid-size, established): Order processing in the 50-500 microsecond range. Throughput of 10,000-50,000 orders per second. Usually written in Java, Go, or C++. This is where the majority of legitimate exchanges operate, and honestly, it’s more than enough for most use cases.
Tier 3 exchanges (startups, smaller platforms): Processing in the 1-10 millisecond range. Throughput of 1,000-10,000 orders per second. Often written in Python, PHP, Node.js, or Java. Perfectly adequate for an exchange doing under $10M daily volume.
Here’s the thing nobody wants to say out loud: for 95% of new exchanges, Tier 3 performance is completely fine. If your exchange processes 1,000 trades per day — which would be a very healthy launch — you need roughly 0.01 orders per second average throughput. Even accounting for peak bursts at 100x the average, you need maybe 1 order per second. A matching engine written in Python running on a single core can handle that without breaking a sweat.
The performance question isn’t about your launch day. It’s about what happens when you grow. Can your engine scale from 1,000 orders per second to 100,000 without a full rewrite? That’s the architecture question that actually matters.
This is a big part of why choosing proven exchange software matters. The Codono crypto exchange platform ships with a matching engine that’s already been load-tested at production scale, so you’re not gambling on your own implementation surviving its first traffic spike.
Order Types: More Complex Than They Look
A limit order and a market order are straightforward. But the moment you add stop orders, things get interesting, and once you add OCO (one-cancels-other) orders, your state management complexity roughly doubles.
Limit Orders
The bread and butter. “Buy 1 BTC at $49,500 or better.” The order rests on the book until it’s filled, cancelled, or expires. Implementation is straightforward — validate, attempt to match against the opposite side, rest any unfilled quantity.
Market Orders
“Buy 1 BTC at whatever the current price is.” The order walks the opposite side of the book, eating through price levels until it’s fully filled or there’s no more liquidity. The critical implementation detail: you need a price protection mechanism. A market buy in a thin book shouldn’t fill at $999,999 because some joker placed a sell order there. Most engines implement a maximum slippage parameter or an implicit limit price based on a percentage deviation from the last traded price.
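One common shape for that protection can be sketched as follows. The names and the 1% cap are illustrative, and prices are kept as scaled integers so the bound is exact: the engine derives an implicit limit a fixed number of basis points away from the last traded price.

```python
def implicit_limit_cents(side, last_price_cents, max_slippage_bps):
    """Convert a market order into an implicit limit price (integer cents)."""
    delta = last_price_cents * max_slippage_bps // 10_000
    if side == "buy":
        return last_price_cents + delta
    return last_price_cents - delta

# Last trade at $50,000.00 with a 100 bps (1%) cap: a market buy refuses to
# fill above $50,500.00, no matter what is resting on the book.
cap = implicit_limit_cents("buy", 5_000_000, 100)   # -> 5_050_000 cents
```

The $999,999 joker order from the example above simply never matches: the walk down the book stops at the implicit limit and the remainder is cancelled.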
Stop-Loss and Stop-Limit Orders
This is where matching engines get genuinely tricky. A stop order is a conditional order: “When the market price reaches $48,000, submit a market sell order.” The stop order itself doesn’t sit on the visible order book. It sits in a separate data structure — the stop book — and is triggered when a trade executes at or through the stop price.
The tricky part is cascading stops. If a trade at $48,000 triggers a stop sell, and that stop sell’s execution pushes the price down to $47,500, which triggers more stops, which push the price down further… you can get a cascade that drains the order book in milliseconds. This is actually what happened during several historical flash crashes. Your engine needs circuit breakers — maximum price movement per time window, minimum order book depth checks, and the ability to pause matching if things go haywire.
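A minimal version of the price-movement breaker looks like this. The 5% threshold and the names are illustrative, and a timer (not shown) is assumed to call reset_window periodically:

```python
class CircuitBreaker:
    def __init__(self, max_move_pct):
        self.max_move_pct = max_move_pct
        self.window_open_price = None

    def on_trade(self, price):
        """Return True if matching should halt for this window."""
        if self.window_open_price is None:
            self.window_open_price = price      # first trade opens the window
            return False
        move_pct = abs(price - self.window_open_price) / self.window_open_price * 100
        return move_pct > self.max_move_pct

    def reset_window(self):
        self.window_open_price = None           # invoked by a timer each window

cb = CircuitBreaker(max_move_pct=5)
cb.on_trade(50_000)          # opens the window at $50,000
ok = cb.on_trade(48_000)     # 4% down: keep matching
halt = cb.on_trade(47_000)   # 6% down: halt and investigate
```

A production breaker would also check remaining book depth before letting a stop-triggered order match, but the halt decision sits in the same place: inside the matching loop, before the next fill.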
OCO (One-Cancels-Other) Orders
An OCO links two orders together: typically a take-profit limit order and a stop-loss. When either one fills, the other is automatically cancelled. This requires maintaining a relationship graph between orders, and the cancellation must happen atomically with the fill. If the stop-loss triggers and fills, but the take-profit cancellation is delayed by even a few milliseconds, you can end up with both orders filling — giving the trader a position they didn’t want.
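The linkage itself is small; the hard part is making the two state changes indivisible. A sketch with illustrative names, relying on the engine's single-threaded processing (discussed earlier) so that no other order can be matched between the fill and the sibling's cancellation:

```python
class OcoPair:
    """Links two legs so each can find its sibling in O(1)."""
    def __init__(self, take_profit_id, stop_loss_id):
        self.legs = {take_profit_id: stop_loss_id,
                     stop_loss_id: take_profit_id}

def on_fill(order_id, oco, open_orders):
    open_orders.discard(order_id)        # record the fill...
    sibling = oco.legs.get(order_id)
    if sibling is not None:
        open_orders.discard(sibling)     # ...and cancel the sibling in the same step

open_orders = {"tp-1", "sl-1", "other"}
on_fill("sl-1", OcoPair("tp-1", "sl-1"), open_orders)
# Both OCO legs are gone; unrelated orders are untouched.
```

Because both removals happen inside one sequential processing step, there is no window in which the surviving leg could fill — which is exactly the failure mode described above.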
Codono’s engine handles these advanced order types natively, including support for futures trading order types like reduce-only and post-only orders that add yet another layer of complexity.
Edge Cases That Will Wreck Your Exchange
This is the section that separates engineers who’ve shipped matching engines from those who’ve read about them. Here are the edge cases that have caused real production incidents:
Self-Trade Prevention (STP)
What happens when a trader has a resting sell order at $50,000 and submits a buy order at $50,000? Without STP, the trader trades with themselves — paying fees on both sides for no economic purpose. Market makers do this accidentally all the time when their algorithms have multiple strategies running simultaneously.
You need an STP policy. The three common options:
- Cancel newest: The incoming order that would self-trade is cancelled.
- Cancel oldest: The resting order is cancelled, and the incoming order continues matching.
- Cancel both: Both orders are cancelled.
Most exchanges default to cancel-newest, but you should make it configurable per trader. Market makers have strong opinions about this.
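The three policies reduce to a small decision function (names illustrative), which the engine consults at the moment it detects that both sides of a potential match belong to the same trader:

```python
def self_trade_prevention(policy, incoming_id, resting_id):
    """Return the set of order IDs to cancel instead of matching."""
    if policy == "cancel_newest":
        return {incoming_id}
    if policy == "cancel_oldest":
        return {resting_id}
    if policy == "cancel_both":
        return {incoming_id, resting_id}
    raise ValueError(f"unknown STP policy: {policy}")

cancelled = self_trade_prevention("cancel_newest", "new-1", "old-1")
```

Under cancel-oldest, note that the incoming order keeps matching after the resting order is removed — the policy choice changes the engine's control flow, not just which ID gets cancelled.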
Dust Orders
What do you do with an order for 0.00000001 BTC? Or a fill that results in a remainder of 0.000000003 BTC? These dust amounts are too small to trade, too small to withdraw, but they clog up your order book and database.
You need minimum order quantities, minimum notional values (the price times the quantity must exceed some threshold, typically a few dollars), and a dust collection mechanism that periodically sweeps sub-minimum balances.
Tick Size and Lot Size Enforcement
Prices must be in multiples of the tick size (say, $0.01 for BTC/USDT). Quantities must be in multiples of the lot size (say, 0.00001 BTC). If you don’t enforce these at the matching engine level, you get orders at prices like $50,000.003 that don’t align with any other orders and sit on the book forever. Worse, you get decimal precision issues that cause balance discrepancies.
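A sketch of that validation, using Python's Decimal so the modulo test is exact (the tick and lot sizes are the illustrative values from the text):

```python
from decimal import Decimal

TICK = Decimal("0.01")       # price increment for the pair
LOT = Decimal("0.00001")     # quantity increment

def valid_order(price, qty):
    """Reject prices and quantities that don't align to the pair's grid."""
    price, qty = Decimal(price), Decimal(qty)
    return price % TICK == 0 and qty % LOT == 0

ok = valid_order("50000.01", "0.00002")
bad = valid_order("50000.003", "0.00002")   # the off-tick price from the text
```

Note that prices arrive as strings, not floats: converting a float like 50000.003 to Decimal would faithfully preserve its binary rounding error, defeating the check.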
Overflow and Precision
This one has bitten more exchanges than any other. If you’re using floating-point arithmetic for prices and quantities, stop. Right now. Floating-point math produces rounding errors that, across millions of trades, compound into real balance discrepancies. A production matching engine must use either fixed-point integer arithmetic (representing $50,000.00 as the integer 5000000) or arbitrary-precision decimal libraries. There is no third option.
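Two lines demonstrate the problem and two more the fix. The cent scaling here is illustrative; a real engine picks a scale to match each asset's precision:

```python
float_total = 0.1 + 0.2          # binary floats cannot represent 0.1 exactly
exact = float_total == 0.3       # False: the error is tiny, but it is real

SCALE = 100                      # dollars -> integer cents
price = 50_000 * SCALE           # $50,000.00 stored as the integer 5,000,000
fee = price * 10 // 10_000       # a 10 bps fee computed exactly: 5,000 cents
```

Across millions of trades, those float errors accumulate in one direction often enough to show up as real balance drift; the integer path has nothing to accumulate.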
Post-Matching Balance Checks
Even with pre-trade balance validation, you need post-trade balance assertions. After every trade execution, verify that the sum of all user balances plus exchange fees equals the expected total. If this invariant breaks, halt matching and investigate. This is your last line of defense against bugs that would otherwise slowly drain user funds.
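The check itself is a one-liner plus a hard stop. A sketch with illustrative numbers, in integer base units per the fixed-point rule above:

```python
def check_invariant(balances, fees_collected, expected_total):
    """Halt matching if the system's total units are no longer conserved."""
    actual = sum(balances.values()) + fees_collected
    if actual != expected_total:
        raise RuntimeError(
            f"balance invariant broken: {actual} != {expected_total}; halt matching")

# After a trade that moved 100 units from alice to bob with a 2-unit fee,
# the books must still sum to the 2,000 units that entered the system.
balances = {"alice": 900, "bob": 1_098}
check_invariant(balances, fees_collected=2, expected_total=2_000)   # passes silently
```

Running the full sum after every trade may be too slow on a busy pair; a common compromise is to assert per-trade conservation (buyer delta + seller delta + fee == 0) on the hot path and run the global sum on a timer.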
Scaling Strategies: The Single-Threaded Dilemma
Remember when I said matching engines are almost always single-threaded per trading pair? This creates an obvious scaling challenge. A single thread on modern hardware can handle maybe 100,000-500,000 simple operations per second. For most exchanges, that’s fine. But what if it’s not?
Vertical Scaling
Throw faster hardware at it. Higher clock speed CPUs, more L3 cache, faster memory. This is the first thing to try and it’s often sufficient. Modern high-frequency trading engines run on custom hardware with FPGA-based network cards that bypass the CPU entirely for basic matching operations. But for a typical exchange, a beefy server with a 5+ GHz processor and 128GB of RAM will handle an enormous amount of volume.
Horizontal Scaling by Trading Pair
The simplest horizontal scaling strategy is to run one matching engine instance per trading pair. BTC/USDT gets its own engine. ETH/USDT gets its own engine. They’re completely independent — no shared state, no coordination needed. This scales linearly with the number of trading pairs.
The challenge is balance management. If a trader has 10 BTC and places orders across 5 BTC trading pairs, each matching engine needs to know the available balance. The standard approach is a centralized balance service that each engine checks before accepting an order. This balance service becomes your new bottleneck, but it’s much simpler to scale because it’s doing simple read-write operations, not complex matching logic.
Sharding Within a Trading Pair
For the very highest volume pairs, you might need to shard the order book itself. One approach is to shard by price range — one engine handles prices from $49,000-$50,000, another handles $50,000-$51,000. But cross-shard matching (a market order that crosses price ranges) is fiendishly complex.
Honestly? Unless you’re processing Binance-level volume, you don’t need this. And if you are processing Binance-level volume, you’ve got bigger architectural concerns to deal with first.
For most exchange operators, the smart play is to use liquidity aggregation to supplement your order book depth rather than trying to shard a single pair’s matching engine.
Memory vs. Disk: The Persistence Question
A matching engine’s order book lives in memory. It has to — disk I/O adds milliseconds of latency that are completely unacceptable for real-time matching. But what happens when the engine crashes? You can’t just lose all resting orders.
The standard architecture uses three layers:
Layer 1: In-memory order book. This is the hot path. All matching happens here. Zero disk I/O during normal operation.
Layer 2: Write-ahead log (WAL). Every incoming order and every matching event is appended to a sequential log on disk before the engine processes it. If the engine crashes, you replay the log from the last checkpoint to reconstruct the order book. Sequential writes are fast — an NVMe drive can sustain 500,000+ sequential write operations per second, which is more than enough.
Layer 3: Periodic snapshots. Every N seconds (or N events), the engine writes a complete snapshot of the order book to disk. This limits how far back you need to replay the WAL after a crash. Without snapshots, a crash after 24 hours of operation might require replaying millions of events.
The WAL is the critical piece. It gives you durability without sacrificing latency. The trade-off is that you’re writing to disk asynchronously — there’s a tiny window (microseconds) between when a trade is matched in memory and when it’s persisted to disk. If the machine loses power in that exact window, you lose those events. For most exchanges, this risk is acceptable. For those that need absolute zero-loss guarantees, you add synchronous replication to a standby machine before acknowledging the trade.
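The write path is short enough to sketch. The JSON-lines format is illustrative (production engines typically use a compact binary encoding), and the fsync call marks exactly where the trade-off lives: calling it per event gives full durability, while the asynchronous variant described above batches or skips it, which is where the tiny loss window comes from.

```python
import json
import os
import tempfile

def append_event(log, event):
    """Append one event to the WAL before the engine applies it."""
    log.write((json.dumps(event) + "\n").encode())
    log.flush()
    os.fsync(log.fileno())    # durable-per-event; relax this for lower latency

def replay(path):
    """Reconstruct state after a crash by re-reading the log."""
    with open(path, "rb") as f:
        return [json.loads(line) for line in f]

path = os.path.join(tempfile.mkdtemp(), "engine.wal")
with open(path, "ab") as log:
    append_event(log, {"type": "order", "id": 1, "price": 5_000_000, "qty": 1})
    append_event(log, {"type": "trade", "maker": 1, "taker": 2, "qty": 1})

recovered = replay(path)      # feed these back through the engine on restart
```

Snapshots slot in on top of this: record the event offset alongside each snapshot, and replay only the log suffix written after it.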
How Codono Handles All of This
Building a matching engine from scratch means solving every one of the problems described above. The data structures, the priority logic, the edge cases, the persistence, the scaling — all of it. And then maintaining it forever, because markets evolve and new order types emerge.
This is exactly why we built Codono’s trading engine the way we did. The Codono exchange software includes a production-grade matching engine that handles:
- Price-time priority FIFO matching with microsecond timestamp resolution
- All standard order types including limit, market, stop-loss, stop-limit, and OCO
- Self-trade prevention with configurable policies per API key
- Fixed-point arithmetic throughout the entire trade pipeline — no floating-point anywhere near balance calculations
- Write-ahead logging with automatic checkpoint and recovery
- Horizontal scaling with independent engine instances per trading pair
- Built-in circuit breakers for flash crash protection
The engine also integrates directly with our liquidity engine, so even if your organic order book is thin, incoming orders can match against aggregated external liquidity. This is particularly important for new exchanges where building organic liquidity takes time.
For operators who need programmatic access, the API layer provides low-latency WebSocket feeds for order book updates and trade execution reports, so market makers can interact with the engine efficiently.
The Bottom Line
A matching engine is deceptively simple in concept and brutally difficult in execution. The core algorithm fits in your head. But the production concerns — persistence, recovery, fairness, precision, edge cases, performance under load — represent years of engineering effort.
My honest advice: unless matching engine development is your core competency and competitive advantage, don’t build one from scratch. Use battle-tested software, and focus your engineering resources on the things that actually differentiate your exchange — the user experience, the coin listings, the regulatory compliance, the marketing. The matching engine should be a solved problem that you deploy, not a science project that consumes your runway.
The exchanges that succeed aren’t the ones with the fanciest matching engines. They’re the ones that made smart build-vs-buy decisions, launched quickly, and iterated on what matters to their users. Your matching engine needs to be correct, fast enough, and reliable. It doesn’t need to be a work of art. It needs to work.