An Order Management System (OMS) isn’t glamorous software work—until you’ve watched a simple “buy” turn into five mismatched records, a delayed fulfillment, and a support ticket that starts with “we’re seeing differences.” Then the OMS suddenly becomes the thing keeping your trading desk, warehouse, customer service, and (yes) your reconciliation team from starting a small fire.
If you’re considering building your own OMS, this guide walks through the practical side: what the system must do, the architectural choices people actually end up making, and how to build it without accidentally creating a second job as a part-time database administrator and part-time detective.
What an OMS does in real trading and fulfillment workflows
A trading-style OMS typically sits between order intake and everything downstream: execution channels, inventory/fulfillment, and settlement-related recordkeeping. Even if you only trade a handful of instruments, the “order” you see in a UI is often a wrapper around multiple internal states.
At a basic level, an OMS handles:
– Order lifecycle: creation, validation, routing, state transitions (new/accepted/partially filled/filled/canceled/rejected).
– Customer and order data: account, instrument, quantities, pricing terms, time-in-force, and tolerances.
– Working orders: orders that remain active and can receive partial fills or modify/cancel events.
– Integrations: broker/exchange feeds, payment/ledger partners, warehouse or fulfillment systems, and risk checks.
– Reconciliation: aligning your internal view with executions, cancellations, refunds, and inventory movements.
In practice, OMS features show up as chores your operations team shouldn’t be doing manually. The OMS becomes the system of record for order state and the translator between “business intent” and “what actually happened.”
Order state isn’t just a status label
Most teams underestimate how much state matters. “Filled” is not always a single event. You may receive multiple partial fills, and later an adjustment or correction. You also need to decide whether you treat certain adjustments as new states, events that amend history, or “late-arriving truths.”
If you don’t model this carefully, your OMS becomes a place where data goes to rot. If you do model it carefully, it becomes boring—in the best way.
Why build your own OMS instead of buying
Buying an OMS can make sense if you have a standard workflow, limited custom execution logic, and budget for implementation plus ongoing vendor costs. Building your own can make sense when your workflows are unusual or you need control over cost, latency, data ownership, or integration depth.
Common reasons teams build:
Unique routing and business rules
Your routing might depend on internal strategy rules, trading hours, specific liquidity selection logic, custom partial-fill behavior, or complex cancel/replace policies. Some off-the-shelf OMS tools can be extended, but you end up paying with time and constraints.
Complex reconciliation and internal positions
If your internal accounting model is distinct—perhaps you manage inventory-like units for derivatives, or you do multi-leg reconciliation—custom systems can keep data consistent without forcing awkward mappings.
Latency and event volume constraints
Real-time feeds can be unforgiving. If you’re processing high-frequency execution updates, you’ll care about event throughput, batching strategy, and data modeling decisions. Your own design can be faster, but only if you’re deliberate.
Data ownership and audit requirements
A custom OMS can be designed to retain full event histories and provide audit trails aligned with your compliance needs, rather than translating from a vendor’s internal format.
What you must decide before writing code
Before you start building screens or queues, make a small list of decisions that will affect everything later. This is where most teams either save time or create chaos.
Define the “order” you manage
Write a short, concrete definition of an order in your world. Is it:
– a customer order (intent),
– an exchange order (execution instruction),
– an internal working order,
– or a composite that spawns multiple routing/execution legs?
Your system should track relationships: parent/child orders, replacement history, and the mapping between intent and executions.
Choose your source of truth model
There are two common patterns:
1) State-based storage: keep the latest snapshot of order state and mutate it.
2) Event-sourced storage: store the event stream and derive the state.
Event sourcing can be powerful for audit and reconciliation because you can replay history. It also introduces complexity in projections and operational tooling. State-based designs are simpler but can be harder to audit if you overwrite important facts.
Most handmade OMS projects land somewhere between: store an immutable event log for the history, while also maintaining “current state” tables for fast reads.
Decide what “idempotent” means in your system
External systems resend messages. Network connections drop. You’ll see duplicate execution reports, repeated cancel acknowledgements, or delayed events arriving out of order. You need a policy for duplicates.
A practical approach is to include a unique external reference (or derive one from message fields) and enforce deduplication at the ingestion layer. Your downstream state machine should accept events idempotently: applying the same event twice shouldn’t change the result.
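As a minimal sketch of that policy, here is what deduplication at the ingestion layer can look like. The class and field names (`EventIngestor`, `exec_id`, `seq`) are illustrative assumptions, not a prescribed schema; in production the `seen_keys` set would be a unique-keyed database table and the log an append-only store.

```python
import hashlib

class EventIngestor:
    """Sketch of deduplication at the ingestion layer (names are illustrative)."""
    def __init__(self):
        self.seen_keys = set()   # in production: a unique-keyed DB table
        self.event_log = []      # in production: an append-only store

    def idempotency_key(self, msg: dict) -> str:
        # Prefer an explicit external reference; otherwise derive one
        # deterministically from stable message fields.
        if "exec_id" in msg:
            return f"exec:{msg['exec_id']}"
        raw = f"{msg['order_id']}|{msg['type']}|{msg.get('seq', '')}"
        return hashlib.sha256(raw.encode()).hexdigest()

    def ingest(self, msg: dict) -> bool:
        key = self.idempotency_key(msg)
        if key in self.seen_keys:
            return False         # duplicate: drop before it reaches the state machine
        self.seen_keys.add(key)
        self.event_log.append(msg)
        return True
```

The point of deriving the key deterministically is that the same external message always maps to the same key, so a resend is recognized no matter when it arrives.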
Pick your consistency strategy
Consistency is not just a buzzword. You need to decide the level of correctness you require at each step:
– UI display correctness (eventually consistent is fine most of the time)
– reconciliation correctness (must match broker/exchange truth)
– inventory correctness (may require stricter guarantees)
You’ll also choose between synchronous checks (slower, simpler to reason about) and asynchronous processing (faster, more careful engineering required).
Core OMS components you’ll end up building
A custom OMS usually decomposes into a few internal services or modules. If you’re small, you might combine them into one service at first. If you later split them, you’ll want clean interfaces.
Order intake and validation
This module accepts new orders (from UI, API, trading strategy, or a partner system). It validates:
– input schema and required fields,
– account/permission,
– symbol/instrument mapping,
– quantity and price formats,
– risk pre-check fields (even if risk checks are separate).
It should also normalize the order into a canonical internal format so downstream systems don’t each invent their own interpretation.
Routing and execution instruction generation
Routing determines where an order goes: which broker, which venue, which execution type, and whether you break orders into multiple legs.
Even if you use a single broker, routing still matters because you may choose different time-in-force, order types (limit/market/stop), handling of partial fills, and cancel/replace sequences.
External integrations layer
You’ll deal with at least:
– order acknowledgements from counterparties,
– execution reports or trade confirmations,
– cancels and replace acknowledgements,
– account/position updates (sometimes separate),
– and in some cases corporate actions or reference data updates.
This layer must translate external message formats into your internal event types and store enough data to reconcile later.
Order state machine (the part that prevents messes)
If there’s a “center of gravity” in the OMS, it’s the order state machine. It defines:
– which transitions are valid,
– how partial fills update state,
– how cancellations behave for working vs partially filled orders,
– how late-arriving events are handled.
Create the state machine early, test it heavily, and treat it like production math: boring, correct, and defended by tests.
Position and inventory coupling
Depending on your product type, the OMS may update internal positions when fills occur. If you’re trading instruments without physical inventory, “positions” are accounting objects. If you’re trading physical goods, you’ll also affect warehouse stock and orders to pick/pack/ship.
You can either:
– update positions synchronously when fills are confirmed, or
– publish “fill events” and let a portfolio/accounting service consume them.
Both patterns work; the main difference is how you handle failure and reconciliation if one side lags.
Reconciliation and audit reporting
Reconciliation is where many OMS projects get stuck, usually because the initial build ignored it. Your OMS needs a way to answer questions like:
– Did every accepted order get a corresponding execution/cancel path?
– Are internal fill records identical to broker confirmations?
– Were there missing messages during outages?
– What’s the difference between “expected” and “actual” for each day or batch?
This part usually becomes a set of reconciliation jobs plus operational dashboards for differences.
Data model: the boring foundation that matters
Your database schema will determine whether your OMS is easy to query or a misery museum.
Separate immutable events from mutable state
A practical model often includes:
– events table (immutable): ingestion events, state transitions, external confirmations, internal actions.
– orders table (mutable snapshot): current state, last known quantities, timestamps.
– fills (or executions) table: each trade/fill record with unique IDs and references.
– relationships: mapping parent order to child orders, replacements, and legs.
– idempotency_keys: records used to deduplicate incoming messages and avoid double-processing.
You don’t need perfect normalization everywhere, but keep the domain boundaries clear so reconciliation queries are possible without joining 27 tables and praying.
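To make the separation concrete, here is one possible minimal schema, shown via SQLite for brevity. Table and column names are assumptions for illustration; the structural point is the split between the immutable event log and the mutable snapshot, plus unique external execution IDs.

```python
import sqlite3

# Illustrative schema sketch, not a prescribed standard.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (
    event_id     INTEGER PRIMARY KEY,
    order_id     TEXT NOT NULL,
    event_type   TEXT NOT NULL,
    payload      TEXT NOT NULL,      -- raw external message, stored verbatim
    ingested_at  TEXT NOT NULL
);                                   -- immutable: INSERT only, never UPDATE

CREATE TABLE orders (
    order_id     TEXT PRIMARY KEY,
    status       TEXT NOT NULL,
    qty_ordered  REAL NOT NULL,
    qty_filled   REAL NOT NULL DEFAULT 0,
    updated_at   TEXT NOT NULL
);                                   -- mutable snapshot for fast reads

CREATE TABLE fills (
    fill_id          TEXT PRIMARY KEY,  -- internal fill ID
    external_exec_id TEXT UNIQUE,       -- broker/exchange execution ID
    order_id         TEXT NOT NULL REFERENCES orders(order_id),
    qty              REAL NOT NULL,
    price            REAL NOT NULL
);

CREATE TABLE idempotency_keys (
    key          TEXT PRIMARY KEY,   -- deduplicate incoming messages
    event_id     INTEGER REFERENCES events(event_id)
);
""")
```

The `UNIQUE` constraint on `external_exec_id` is doing real work here: it turns "we double-booked a fill" from a reconciliation surprise into an immediate insert failure.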
Use unique identifiers consistently
You need unique IDs for:
– internal order IDs,
– external broker order IDs,
– internal execution/fill IDs,
– external execution/trade IDs,
– and any grouping IDs for strategy sessions or batch orders.
One of the most common OMS bugs is accidental collision or inconsistent mapping—like treating broker_order_id A as equivalent to broker_order_id B after a replace cycle. Decide early: which ID is canonical at each stage?
Store timestamps with intent
Store multiple time fields:
– ingestion time (when you received the message),
– event time (when the external system says it happened),
– processing time (when your OMS handled it),
– and domain times (like trade date, effective settlement date).
Then don’t “eyeball” reconciliation. Use the fields you stored.
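A small sketch of what "storing time fields with intent" buys you. The record shape and field names here are assumptions for illustration; the payoff is that feed lag becomes a computed value you can monitor, not something you squint at in logs.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class TimedEvent:
    """Illustrative event record carrying each time field separately."""
    event_time: datetime       # when the external system says it happened
    ingestion_time: datetime   # when we received the message
    processing_time: datetime  # when our OMS applied it
    trade_date: str            # domain time, e.g. "2024-03-15"

def feed_lag(e: TimedEvent) -> float:
    """Seconds between external occurrence and our receipt: a direct
    monitoring/reconciliation signal derived from stored fields."""
    return (e.ingestion_time - e.event_time).total_seconds()
```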
Designing the order state machine
This section can save you months later. The safest OMS designs treat state transitions as rules, not ad hoc updates.
Example state set for a typical limit order
Your states might include:
– NEW (internal created, not acknowledged)
– SENT (instruction sent to broker)
– ACKNOWLEDGED (broker accepted; working)
– PARTIALLY_FILLED (one or more fills received, remaining quantity working)
– FILLED (fully filled)
– CANCEL_SENT
– CANCELLED
– REJECTED
– EXPIRED (if time-in-force ends)
– REPLACED (or track replacement lineage as history rather than as a state)
You might not need all of these on day one, but the underlying concepts—acknowledgement, partial fill, cancel/replace—almost always appear.
Transition rules you should write down
Write down rules such as:
– A fill event can only reduce remaining quantity; it should never increase it.
– A cancel acknowledgement applies only while the order is still working; if partial fills have already occurred, the cancel affects only the remaining quantity, never the filled portion.
– Late events can arrive after cancellation; your reconciliation logic must treat them consistently: either they’re ignored if they violate order lifecycle, or they produce an “exception status” for investigation.
Then implement state transitions as pure functions where possible—input: current state + event; output: new state + side effects. This makes testing less painful.
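A minimal sketch of a pure transition for fill events, assuming the state set above. The types and field names are illustrative; the point is the shape: current state plus event in, new state out, with forbidden transitions rejected explicitly rather than silently absorbed.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class OrderState:
    status: str          # e.g. "ACKNOWLEDGED", "PARTIALLY_FILLED", "FILLED"
    qty_ordered: float
    qty_filled: float

def apply_fill(state: OrderState, fill_qty: float) -> OrderState:
    """Pure transition: current state + fill event -> new state.
    Raises on transitions the rules above forbid."""
    if state.status not in ("ACKNOWLEDGED", "PARTIALLY_FILLED"):
        raise ValueError(f"fill not valid in state {state.status}")
    new_filled = state.qty_filled + fill_qty
    if new_filled > state.qty_ordered:
        raise ValueError("fill would exceed ordered quantity")
    status = "FILLED" if new_filled == state.qty_ordered else "PARTIALLY_FILLED"
    return replace(state, status=status, qty_filled=new_filled)
```

Because the function takes and returns plain values with no I/O, a test is just a sequence of events and an expected final state.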
Handling out-of-order events
You’ll eventually see this. Broker feed messages may arrive out of order due to network delays.
A workable strategy:
– Keep an “event ordering” attribute when provided.
– If not provided, rely on a combination of external IDs plus quantity deltas to enforce consistency.
– For events that conflict with current known state, flag an exception and record it rather than silently rewriting history.
Silence is where reconciliation nightmares are born.
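One way to make that policy explicit in code. The status and event names are illustrative assumptions; what matters is that every late-arriving event gets a deliberate verdict, and nothing conflicting is applied silently.

```python
def classify_late_event(current_status: str, event_type: str) -> str:
    """Conservative policy sketch for events that conflict with the known
    lifecycle: never rewrite history silently, flag for investigation."""
    TERMINAL = {"FILLED", "CANCELLED", "REJECTED", "EXPIRED"}
    if current_status not in TERMINAL:
        return "apply"        # normal path: order is still live
    if event_type == "fill":
        # A fill after a terminal state may be a real late fill: investigate.
        return "exception"
    if event_type == "cancel_ack" and current_status == "CANCELLED":
        return "ignore"       # harmless duplicate acknowledgement
    return "exception"        # anything else needs human eyes
```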
Routing and order instruction handling
Routing isn’t just “send to broker X.” It’s the policy for what you instruct the broker to do, and how you react to acknowledgements and fills.
Order types and time-in-force
Your OMS should model:
– limit vs market vs stop (or other types your strategy uses),
– time-in-force (day, GTC, IOC, FOK depending on your region),
– handling for partial fills (if you want to cancel remaining after first fill, that’s a policy).
Many brokers treat these differently; your OMS needs to translate your intent into broker instructions and then interpret execution results correctly.
Cancel/replace strategy
Cancel/replace is common when strategy updates price. But it’s also where you can accidentally overtrade.
A safe policy includes:
– tracking replacement lineage (old order vs new order),
– preventing multiple replaces in a short window without acknowledgement (unless you’re intentionally doing staged replacements),
– and verifying that replacement acknowledgements reference the expected prior order.
Also: if cancel/replace fails, your OMS must choose whether to retry, downgrade, or mark for exception.
Integrations: feeds, webhooks, and the “message plumbing” problem
Most custom OMS pain isn’t in the business logic—it’s in the integration plumbing. You will write adapters, and you will debug them at 2 a.m. because of one unexpected field change. That’s just the job.
Ingestion pipeline
A clean ingestion pipeline typically does:
1) Receive message (HTTP/WebSocket/stream).
2) Validate format and signatures (if applicable).
3) Normalize into internal event structure.
4) Deduplicate using idempotency keys.
5) Persist event to event log.
6) Publish internal event to state machine and downstream consumers.
The important bit is steps 4 and 5: if you don’t persist raw events reliably before processing, you’ll lose audit trails when something breaks.
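The six steps above can be compressed into one sketch. Storage is in-memory here and the message shape is assumed for illustration; the non-negotiable part is the ordering inside `handle`: deduplicate, persist, and only then publish.

```python
import hashlib
import json

class Pipeline:
    """The ingestion steps as one sketch; persist-before-publish is the point."""
    def __init__(self, publish):
        self.seen = set()        # step 4 state: dedup keys
        self.event_log = []      # step 5 state: durable event log
        self.publish = publish   # step 6: downstream consumers

    def handle(self, raw: str) -> bool:
        msg = json.loads(raw)                            # 1-2: receive + validate
        event = {"type": msg["type"],
                 "order_id": msg["order_id"],
                 "raw": raw}                             # 3: normalize, keep raw
        key = hashlib.sha256(raw.encode()).hexdigest()   # 4: dedup key
        if key in self.seen:
            return False
        self.seen.add(key)
        self.event_log.append(event)                     # 5: persist FIRST
        self.publish(event)                              # 6: then publish
        return True
```

If the process dies between steps 5 and 6, you can republish from the log; if it died between 6 and 5 (the reversed order), the event would be gone with no audit trail.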
Retries and backoff
When a downstream step fails (e.g., the broker API is temporarily unreachable), you need retries. But don’t retry blindly.
– Use idempotency keys so retries do not create duplicate internal actions.
– Add backoff and circuit breakers for external calls.
– Track retry counts and mark events for manual review after thresholds.
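A sketch of exponential backoff with jitter, under the assumption that the callable being retried is idempotent (e.g., keyed by an idempotency key) so retries cannot create duplicate actions. The thresholds are illustrative defaults, not recommendations.

```python
import random
import time

def call_with_backoff(fn, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Retry an external call with exponential backoff and jitter.
    `fn` must be idempotent; thresholds are illustrative."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # exhausted: caller should mark for manual review
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))  # jitter avoids thundering herds
```

Injecting `sleep` as a parameter keeps this testable without real waiting, which is exactly the kind of small design choice that makes OMS plumbing maintainable.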
Schema versioning for external messages
If you depend on broker message formats, plan for schema drift. Keep mappings versioned. When a field changes meaning or name, you want to handle it predictably rather than breaking production.
Risk checks: do you put them in the OMS or beside it?
Risk is often adjacent to OMS, but you have to decide where it lives. If you embed risk decisions into OMS, you centralize enforcement. If you separate risk as its own service, you keep OMS simpler and risk logic independently testable.
A practical compromise:
– OMS handles order-level validation and basic constraints,
– risk service handles portfolio limits, margin checks, and strategy constraints,
– OMS calls risk service (synchronously or near-synchronously) for orders that require enforcement.
If your risk checks are synchronous, latency increases. If asynchronous, you need a mechanism to “accept order” and later block or cancel it—which can complicate states and reconciliation.
Payments and settlement: keep expectations realistic
Some builders include payments in their OMS. Others keep it separate and only record intent. If you trade instruments where “payment” is effectively handled elsewhere, you still may have to manage settlement confirmations or ledger effects.
Whatever you do, separate concerns:
– execution events from broker/exchange,
– trade accounting in your ledger/portfolio system,
– settlement events from clearing/settlement partners.
Trying to force all settlement correctness into OMS tends to turn a clean system into a distributed monolith. It can be done, but it usually isn’t worth the headache for early versions.
Build vs buy: when a custom OMS still benefits from product components
Even if you build the core OMS, you may still use existing components:
– message broker (Kafka-like approach),
– database replication and backups,
– metrics instrumentation,
– authentication libraries,
– and UI component frameworks.
You don’t lose points for using tools. You lose points when you pretend those tools remove the need for domain correctness.
Security and operational hygiene
OMS systems are target-rich environments. They hold credentials, account mapping tables, and trade activity. Treat them like production money software, because that’s what they are.
Authentication and authorization
You need:
– role-based access to order management endpoints,
– audit logs for manual overrides,
– and strong API authentication for partners.
If there’s any admin UI allowing manual cancel/replace, add thorough audit and require operator identity.
Audit trails
Audit trails should answer:
– who initiated an action,
– what data they used,
– what external requests were made,
– and what internal events resulted.
A common failure mode: teams log “operator pressed button,” but not the exact payload. When reconciliation fails, you need payload-level details.
Secrets management
Keep broker API credentials in a secrets manager, not environment variables checked into config repos. This matters less for a demo and a lot for a system that’s expected to run without heroics.
Testing an OMS: what to test first
Testing is the part you do early if you like sleeping.
Unit tests for state machine transitions
Write unit tests that feed event sequences into the state machine and assert:
– final state,
– quantities filled/remaining,
– transition validity,
– and exception flags for conflicting events.
This catches logic bugs fast.
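As a concrete shape for such tests, here is a sketch. The tiny `apply_event` function is a stand-in for your real state machine, and the state/event representations are assumptions for illustration; the testing pattern (feed a sequence, assert final state, quantities, and exception flags) is what carries over.

```python
def apply_event(state, event):
    """Minimal stand-in transition function for the tests below.
    state is (status, qty_ordered, qty_filled); events are dicts."""
    status, ordered, filled = state
    if event["type"] == "fill":
        filled += event["qty"]
        if filled > ordered:
            return ("EXCEPTION", ordered, filled)  # conflicting event: flag it
        status = "FILLED" if filled == ordered else "PARTIALLY_FILLED"
    elif event["type"] == "cancel_ack" and status != "FILLED":
        status = "CANCELLED"
    return (status, ordered, filled)

def run(events, start=("ACKNOWLEDGED", 100, 0)):
    """Fold an event sequence through the state machine."""
    state = start
    for e in events:
        state = apply_event(state, e)
    return state

# Feed event sequences; assert final state, quantities, and exception flags.
assert run([{"type": "fill", "qty": 40}]) == ("PARTIALLY_FILLED", 100, 40)
assert run([{"type": "fill", "qty": 40},
            {"type": "fill", "qty": 60}]) == ("FILLED", 100, 100)
assert run([{"type": "fill", "qty": 120}])[0] == "EXCEPTION"
```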
Integration tests for broker adapters
Simulate external messages from brokers/exchanges and validate:
– normalization correctness,
– deduplication behavior,
– mapping to internal IDs,
– and correct persistence of raw events.
A good adapter test suite often prevents “one field changed” emergencies.
Replay tests using recorded feed data
Once you have recorded messages from a real environment, replay them against a staging OMS. This helps validate that exception handling and order transitions behave as expected over messy real-world event streams.
Load tests that reflect event patterns
OMS load isn’t just about CPU. It’s about:
– event ingestion throughput,
– database write performance for event logs,
– and queue lag under bursts (e.g., market open).
Test for bursts and recovery after external outages.
Deployment strategy: keep it boring in production
OMS systems need reliable deploys. A bad deploy can cause state mismatches or duplicate actions.
Idempotent processing for safe rollouts
If your ingestion and processing are idempotent, you can redeploy or reprocess safely. If they aren’t, redeploys become scary.
This leads back to deduplication and immutable event logs. They aren’t just nice-to-haves.
Backfills without breaking history
When you correct mappings or fix a bug in projections, you’ll want to backfill derived state. Use separate projection processes and track versioning of projections.
Avoid rewriting immutable event logs unless you have a clear governance plan.
Monitoring and operational dashboards
An OMS is only “working” if you can tell when it isn’t.
Metrics that matter
Look for:
– ingestion rate and ingestion errors,
– processing lag (time from event ingestion to state update),
– counts of exceptions and rejected events,
– broker API error rates and timeout counts,
– reconciliation mismatch counts.
Your dashboards should make it obvious whether you have a logic issue, an integration failure, or a data problem.
Operational runbooks
Write down what to do when exceptions happen. For example:
– If an order is stuck in ACKNOWLEDGED with no fills after a defined time window, who checks what?
– If you detect reconciliation mismatch for a specific order, how do you confirm whether it’s a missing event vs a mapping issue?
Runbooks also keep you from relying on memory during stressful incidents.
Reconciliation: the part that turns “pretty orders” into “trusted truth”
If you build your own OMS, reconciliation is where you prove it works.
Define reconciliation granularity
Do you reconcile:
– by order ID,
– by execution fill ID,
– by account and trade date,
– or by batch/trading session?
Most systems do multiple layers. Start with fill-level reconciliation because that’s the hardest part to correct later.
Your reconciliation queries should be explainable
A reconciliation report should answer the “why” without too much interpretive work. If the report requires reading 12 event types with unclear mapping, you’ll struggle to trust it.
Store enough linkage data: internal vs external IDs, quantities, prices, and status timestamps.
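A sketch of explainable fill-level reconciliation, assuming both sides can be keyed by execution ID with (quantity, price) values. The function name and shapes are illustrative; the design point is a per-ID verdict you can read, not a single pass/fail flag.

```python
def reconcile_fills(internal: dict, external: dict) -> dict:
    """Fill-level reconciliation sketch. Both dicts map execution ID to
    (qty, price). Returns a per-ID verdict, explainable on its face."""
    report = {}
    for exec_id in internal.keys() | external.keys():
        ours, theirs = internal.get(exec_id), external.get(exec_id)
        if ours is None:
            report[exec_id] = "missing_internal"   # we never booked it
        elif theirs is None:
            report[exec_id] = "missing_external"   # broker never confirmed it
        elif ours != theirs:
            report[exec_id] = f"mismatch: {ours} vs {theirs}"
        else:
            report[exec_id] = "ok"
    return report
```

Each verdict maps directly to an investigation path: `missing_external` means check for a lost confirmation or outage window; `missing_internal` means check ingestion and ID mapping.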
Exception handling and manual review workflow
Not every discrepancy is a lost trade. Sometimes it’s:
– a delayed cancel acknowledgement,
– a correction message,
– or a mapping mismatch between symbols/instruments.
So your OMS needs an exception status and a way to gather evidence for manual review, including raw events and the external message references.
Small-scope OMS projects that don’t drown you
Building everything at once is a common failure mode. A better approach is incremental scope with a stable architecture beneath it.
MVP scope that still teaches you the real lessons
A reasonable MVP often includes:
– order intake + persistence,
– integration with a single broker endpoint (or sandbox),
– state machine for basic lifecycle (ack, partial fills, filled, canceled),
– event log with deduplication,
– and reconciliation against executions.
You can postpone fancy routing and risk checks. But don’t postpone state correctness.
Stage risk, routing, and settlement after you can reconcile executions
If your system cannot reconcile fills, it doesn’t matter how pretty the UI looks. Start with execution truth, then expand.
Common failure modes in custom OMS builds
Here are the issues that show up across projects, regardless of language or vendor.
Overwriting state instead of preserving history
If you store only the latest snapshot and forget the raw events, reconciliation becomes a guessing game. Preserve raw message data and the event timeline.
Buying time with “best guess” ID mapping
If you map symbols or order IDs using heuristics, you will eventually break during edge cases: corporate actions, symbol changes, replacement cycles, or partial cancels. Use deterministic mappings and design for corrections.
No strategy for duplicates and retries
External systems resend messages. Internal services retry requests. If your system doesn’t enforce idempotency, you’ll double-send and create real operational damage.
State machine treated like a set of if-statements
It will work once, then fail on the second weird event. State machines should be explicit and testable.
Reconciliation bolted on late
Reconciliation isn’t a feature you add at the end. It’s a requirement that informs data model choices from day one.
Example architecture for a DIY OMS (reference blueprint)
This is not the only way, but it reflects what tends to work for teams building from scratch.
Recommended internal modules
– Order Service: intake, validation, creates internal order and parent/child relationships.
– Routing/Execution Adapter: translates internal orders to broker instructions; handles instruction acknowledgements.
– Event Ingestion Service: receives external messages, normalizes, deduplicates, persists event log.
– State Machine Processor: consumes event log events and updates order state snapshots.
– Fill/Execution Processor: stores execution records and updates positions/trade accounting triggers.
– Reconciliation Service: compares internal records to external truth and creates exception records.
If you’re early, you can combine these into one service. But keep interfaces clean so you can split later without rewriting everything.
Data flow pattern
Order intake starts at Order Service. External messages arrive at Event Ingestion. Both write to persistent storage (event log first). State machine updates snapshots. Reconciliation reads from both internal and stored external messages.
This pattern tends to reduce “what the system thought happened” versus “what actually happened” gaps.
Operational playbook: what you should be able to answer
A good OMS should let you answer questions quickly during normal operations and during incidents.
You should be able to query:
– For a given internal order ID, what broker order ID(s) were used?
– What events occurred in what order?
– How many fills were received and at what timestamps?
– Did a cancel occur before or after a partial fill?
– If something is off, what evidence exists (raw external messages, exception flags)?
If those answers require deep engineering involvement every time, your OMS is not done. You’ve built a data store with a UI, not an operational system.
Cost and staffing considerations
Custom OMS builds often look cheap at the start and expensive later, because the real work appears after you handle the first edge case.
Engineering time sinks
– State machine correctness and testing
– Broker/exchange integration quirks
– Deduplication and event ordering
– Reconciliation tooling and exception workflows
– Operational monitoring and alerting
– Secure operations (credentials, audits)
If you’re a small team, the “hidden” time is often in production support and reconciliation logic improvements. Budget for that.
Trading and compliance time sink
If you’re under regulatory requirements, you’ll also spend time on audit logs, retention policies, and evidence generation. OMS state and event history become your proof kit.
This is not the part where you want to move fast and break things. The market doesn’t care that you were busy.
Frequently overlooked design questions
These questions show up during the build reviews that everyone hopes they can skip.
How long do you retain event logs?
At least through reconciliation windows and audit requirements. If you might need to reconstruct history months later, retention needs to be planned now.
Do you support order modifications beyond cancel/replace?
Some brokers let you amend certain fields, others require cancel/replace. Your OMS should either support the subset you can reconcile or standardize modifications as cancel/replace internally.
How do you handle corporate actions and instrument changes?
If your instruments can change (symbol mapping, splits, dividends), your OMS must define whether it stores normalized instrument data at order time, and how that affects reconciliation.
What’s your policy for “stuck” orders?
Define timeouts and exception criteria. If a working order doesn’t get updates, do you query broker status? Do you create an exception? What actions are safe?
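A sketch of one such policy: a periodic job that flags working orders with no updates inside a timeout window. The timeout value, status names, and record shape are illustrative policy choices, not a standard.

```python
from datetime import datetime, timedelta, timezone

def find_stuck_orders(orders, now, working_timeout=timedelta(minutes=5)):
    """Flag working orders with no updates within the timeout window.
    `orders` is an iterable of (order_id, status, last_update) tuples;
    the timeout and status set are illustrative policy."""
    WORKING = {"SENT", "ACKNOWLEDGED", "PARTIALLY_FILLED", "CANCEL_SENT"}
    return [oid for oid, status, last_update in orders
            if status in WORKING and now - last_update > working_timeout]
```

Flagging is the safe default action; whether the next step is querying broker status or raising an exception record belongs in the runbook, not hardcoded here.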
A realistic roadmap for building your own OMS
Here’s a practical sequence that keeps you from building the wrong thing first.
Phase 1: Execution-truth foundation
– Persist orders and external messages.
– Build the state machine for basic lifecycle.
– Store fills/executions with correct IDs.
– Provide minimal reconciliation report comparing internal fills to external confirmations.
Phase 2: Operational correctness
– Add strong idempotency and retry handling.
– Expand exception handling and manual review workflow.
– Add monitoring for ingestion errors, processing lag, and reconciliation mismatches.
Phase 3: Routing and scaling
– Add routing rules and cancel/replace policies.
– Improve event throughput and database performance.
– Add backfill/replay tooling for projections.
Phase 4: Risk, positions, and settlement coupling
– Add risk checks to order intake and/or before execution.
– Integrate with portfolio/position accounting.
– Add settlement or ledger reconciliation if required by your workflow.
The biggest advantage of this sequencing is that you prove correctness early, when debugging is still manageable.
When to stop building and reassess
Since you’re deliberately building, it helps to know the signals for “maybe it’s time to reassess”:
– You’re spending more time fixing integration edge cases than improving state correctness.
– Reconciliation reporting keeps needing manual engineering hand-holding.
– Your team grows, but onboarding new engineers to the OMS takes unusually long.
– You can’t confidently answer “what happened” for any order within a reasonable time window.
Those are not signs you failed. They’re signs you should evaluate whether the remaining features are worth your engineering cost versus adopting an existing platform for parts that aren’t your differentiator.
Summary: build trust first, features second
Making your own OMS is mostly a discipline exercise: define the domain model, handle messy events, preserve audit history, and enforce a clear state machine. Everything else—UI, fancy routing, deeper risk integration—comes after you can reliably reconstruct what happened to any order and reconcile executions against external truth.
If you do it right, the OMS becomes the boring middle layer that makes your system trustworthy. And honestly, in finance systems, boring is a compliment.