Backtesting Software

Backtesting turns vague ideas about “what might work” into something you can measure. The trick is that most people don’t need more finance theory—they need a repeatable system that tests strategies the same way every time. Building your own backtesting software is one of those projects that sounds bigger than it is, until you realize which parts you keep messing up. The good news: once you get the structure right, everything gets easier, including debugging and future improvements.

This article walks through how to design and implement backtesting software from a practical angle: data handling, order simulation, execution assumptions, metrics, and how to avoid the classic traps that cause backtests to lie.

What “Backtesting Software” Actually Means

Backtesting is not “run a strategy and plot an equity curve.” A backtester is closer to a small simulation engine. It takes market data plus a set of trading rules, then produces outputs that must be internally consistent.

At minimum, your software needs:

  • A data layer to load price/volume and corporate actions (if applicable)
  • A strategy layer to generate signals and decide orders
  • An execution layer to turn orders into fills (with slippage/fees)
  • A portfolio/accounting layer to track cash, positions, P&L, and constraints
  • A results layer to compute metrics and diagnostics

If any of those parts are sloppy, your strategy can look brilliant while quietly cheating with hindsight.

Different Levels of Backtesting

You can build backtesting tools in layers. A simple implementation tests rules on bar data with end-of-bar fills. A more serious tool simulates intrabar execution, variable spreads, and event-driven order book behavior.

Here’s a useful way to think about it:

  • Bar close backtest: signals on bar close; orders fill at next bar open (or close)
  • Bar-to-bar intraday: signals derived from within-bar fields; fills use next available timestamp
  • Event-driven: timestamped ticks/events; order book or simplified fill model

You can ship something useful quickly with bar close. But as soon as you trade fast or depend on price levels inside the bar, you’ll need more realism.

Decide the Scope Before You Write Code

The scope decides everything: your data requirements, your simulation precision, and how much time you’ll burn debugging edge cases.

Pick the Markets and Instruments

Are you backtesting equities, crypto, futures, FX? Each has practical quirks:

  • Equities: dividends, splits, borrow rates (if shorting), trading halts
  • Futures: roll dates and contract switches
  • FX: bid/ask and time-zone specifics
  • Crypto: often continuous but still needs exchange quirks (heavily depends on your data source)

Even if you don’t model corporate actions perfectly at first, decide how you’ll handle them. “Ignore it” can work for some testing windows, then explode unexpectedly later.

Pick Frequencies and Time Zones

Backtesting is full of off-by-one mistakes. The simplest safe rule is: store timestamps in UTC internally. Then convert to local market time only for display and sessions.

Bar size matters too:

  • Daily: easier accounting, coarser execution assumptions
  • Hourly: more noise, more missing bars, more market microstructure crud
  • Minute/tick: your performance budget matters, and “signal generation timing” becomes critical

Set an Execution Model (And Document It)

Without an explicit execution model, your backtester becomes a story generator. Decide what your engine assumes:

  • Order type: market, limit, stop (or only market to start)
  • Fill timing: at next bar open, within-bar, or at timestamp match
  • Fill price: use mid, bid, ask, last, VWAP, or a custom function
  • Slippage: fixed basis points, spread-based, or size-based
  • Fees: per trade, per share, per notional, plus exchange rebates if you want to be fancy

Write those assumptions down where you can find them later. Your future self will conveniently forget them.

Architecture: A Backtester That Doesn’t Turn Into Spaghetti

You want modules that can be swapped without rewriting everything. The classic trap is building a single script that loads data, runs signals, simulates fills, and plots results—then you can’t change anything without breaking five other things.

A clean approach is a simple layered architecture.

Core Modules

A practical module breakdown:

  • DataLoader: reads data, returns bars/ticks in chronological order
  • IndicatorEngine: computes moving averages, RSI, etc., on a rolling window
  • Strategy: receives market updates, outputs intents (buy/sell, sizes, order types)
  • ExecutionSimulator: converts intents into fills based on your model
  • Portfolio: updates positions/cash from fills, enforces constraints
  • MetricsReporter: computes statistics and generates reports

You can implement this in many languages. The important part is separation of concerns.

Event Loop: The Heart of the System

Whether you backtest bars or ticks, you usually run a loop that consumes market events and lets the other modules react.

A bar-driven loop often looks like:

  • Load next bar/timestamp
  • Update indicator state
  • Call strategy to generate orders for the next execution point
  • Simulate fills at the chosen timing
  • Update portfolio
  • Record outputs for metrics

If you’re time-limited and want minimal complexity, start with this bar-driven loop and a single execution rule: market orders fill at next bar open with slippage and fees.
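The loop above can be sketched in a few lines. This is a minimal illustration, not a full engine: it assumes bar objects with only `open` and `close` fields, a hypothetical `signal_fn` that returns "buy", "sell", or None, and the single execution rule just described (market orders fill at next bar open, with fixed slippage and a flat fee).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Bar:
    ts: int        # bar timestamp (e.g. epoch seconds, UTC)
    open: float
    close: float

def run_backtest(bars, signal_fn, cash=10_000.0, slippage=0.01, fee=1.0):
    """Bar-driven loop: signals on bar close, market fills at next bar open."""
    shares, pending, equity_curve = 0, None, []
    for i, bar in enumerate(bars):
        # 1) Fill the order decided on the previous bar at this bar's open.
        if pending == "buy" and shares == 0:
            fill = bar.open + slippage          # pay slippage on buys
            shares = int(cash // fill)
            cash -= shares * fill + fee
        elif pending == "sell" and shares > 0:
            fill = bar.open - slippage          # give up slippage on sells
            cash += shares * fill - fee
            shares = 0
        # 2) Decide on this bar's close; the order executes next bar.
        pending = signal_fn(bars[: i + 1])
        # 3) Mark equity at the close and record it for metrics.
        equity_curve.append(cash + shares * bar.close)
    return cash, shares, equity_curve
```

Note the ordering inside the loop: fills from the previous decision happen before the new decision, which is exactly what keeps the engine from trading on the close it just saw.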

Data Handling: The Part That Causes Most Headaches

Backtesting lives and dies on data quality. It’s common to “successfully backtest” a strategy that trades ghosts because of timestamp mismatches, missing bars, or corporate action surprises.

Data Format and Storage

You’ll likely ingest data from CSV, a database, or an API, then store it locally for fast repeated runs.

For performance and sanity, store data in a columnar or indexed format if possible. Your program will repeatedly scan time ranges and need speed.

At the data level, you want consistent columns depending on asset class and frequency. For bar-based backtests, a typical schema is:

  • timestamp (UTC)
  • open, high, low, close
  • volume (if available)
  • vwap (if you have it and want it)
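A minimal loader for that schema might look like the sketch below. The column names are an assumed convention, and timestamps are assumed to be ISO-8601 with an explicit UTC offset; adapt both to whatever your data source actually emits.

```python
import csv
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class OHLCVBar:
    timestamp: datetime   # always stored in UTC
    open: float
    high: float
    low: float
    close: float
    volume: float

def parse_bars(rows):
    """Build chronologically sorted bars from dict rows (e.g. csv.DictReader)."""
    bars = [
        OHLCVBar(
            datetime.fromisoformat(r["timestamp"]).astimezone(timezone.utc),
            float(r["open"]), float(r["high"]),
            float(r["low"]), float(r["close"]), float(r["volume"]),
        )
        for r in rows
    ]
    bars.sort(key=lambda b: b.timestamp)   # never trust source ordering
    return bars

def load_bars(path):
    """CSV wrapper around parse_bars; the column schema is an assumption."""
    with open(path, newline="") as f:
        return parse_bars(csv.DictReader(f))
```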

For intraday, you also need awareness of trading sessions and missing data. Crypto markets trade nearly continuously, which simplifies session handling, but exchanges still have gaps.

Aligning Data Feeds

If you trade multiple instruments or use benchmark data (like SPY, or a risk-free rate proxy), you must align timestamps. Decide on one of these approaches:

  • Strict alignment: require all instruments on each time step—drops data otherwise
  • Last known price: forward-fill for missing bars—can create stale price artifacts
  • Event-driven per instrument: simulation steps depend on actual events per symbol

Strict alignment is more honest but often reduces sample size. Forward-filling is acceptable for some use cases, but label it loudly in your reports.
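The forward-fill option can be sketched as below, assuming each symbol's prices arrive as a simple `{timestamp: price}` map. The `None` entries before a symbol's first observation are deliberate: they make missing history visible instead of silently inventing prices.

```python
def forward_fill_align(series_by_symbol, timeline):
    """Align each symbol's {timestamp: price} map onto a shared timeline.

    Missing bars are forward-filled with the last known price; entries
    before a symbol's first observation stay None. Stale fills are real
    artifacts -- flag them in reports rather than hiding them.
    """
    aligned = {}
    for symbol, series in series_by_symbol.items():
        last, row = None, []
        for ts in timeline:
            if ts in series:
                last = series[ts]
            row.append(last)
        aligned[symbol] = row
    return aligned
```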

Corporate Actions and Continuity

Splits and dividends distort historical prices. If you’re using adjusted prices, be consistent: do you adjust to total return, or do you simulate dividends explicitly?

Two common approaches:

  • Use adjusted OHLC: prices already reflect splits/dividends
  • Use raw OHLC plus corporate actions: you simulate dividend payments and adjust holdings

Pick one at the start. Mixing adjusted prices with explicit dividend simulation double-counts dividends: the backtest wins money in code while the real portfolio loses it in finance, which sounds funny until it’s your portfolio.

Strategy Interface: Make It Boring and Repeatable

Your strategy module should be easy to read and easy to test. A strategy typically:

  • computes signals using indicators and context
  • decides order intent: side, quantity, order type, and maybe price offset
  • submits orders to the execution simulator

Timing: Signal vs Execution

This is where backtests cheat accidentally. For bar-driven systems:

  • When does your strategy “see” the bar?
  • When are orders filled?
  • Are you using close-to-close returns but filling at close?

A good rule: strategy decisions at time t should only use data available up to time t. Then orders are filled at time t + Δ based on your execution model.

If you allow strategy to “see” the bar close and then fill at the same close, you’re assuming you know the close before it happens. That might be “fine” for daily bars with delayed signals, but you need to be consistent.
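The t versus t + Δ rule is easiest to enforce with an explicit one-bar shift, as in this small sketch (a vectorized return calculation rather than an event loop; the function names are illustrative):

```python
def shift_signals(signals):
    """Delay each signal one bar: a decision made at bar t acts at bar t+1."""
    return [0] + signals[:-1]

def strategy_returns(bar_returns, signals):
    """Per-bar P&L where the position held during bar t was decided at t-1."""
    positions = shift_signals(signals)
    return [p * r for p, r in zip(positions, bar_returns)]
```

Removing that shift is the single most common way a vectorized backtest accidentally trades on information it does not have yet.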

Position Sizing Basics

Backtesting software needs to support position sizing logic that can depend on:

  • fixed shares
  • fixed notional
  • percentage of equity
  • volatility targeting (optional)

Keep sizing logic inside strategy, but keep portfolio accounting inside portfolio. If you mix them, debugging becomes a choose-your-own-adventure novel where every page is an exception.
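The first three sizing modes above fit one shared signature, so a strategy can swap them without code changes. A rough sketch (integer share counts and no lot-size rules are simplifying assumptions):

```python
def size_fixed_shares(n_shares, price, equity):
    """Always trade the same share count, regardless of price or equity."""
    return n_shares

def size_fixed_notional(notional, price, equity):
    """Spend roughly the same dollar amount per trade."""
    return int(notional // price)

def size_pct_equity(pct, price, equity):
    """Risk a fixed fraction of current equity per position."""
    return int((equity * pct) // price)
```

Keeping all three behind the same `(param, price, equity)` shape is a cheap way to make sizing a config option instead of a code branch.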

Order and Fill Simulation: How Trades Actually Happen

Real execution is complicated. Your backtester should be simple enough to run many times, but honest enough to catch major errors.

Start With Market Orders

Market orders are the easiest baseline. Your execution simulator can apply:

  • mid/bid/ask selection based on side
  • slippage model
  • fees
  • partial fills if you want realism

For bar-based data, a common assumption is:

  • buy orders fill at next bar open + slippage
  • sell orders fill at next bar open – slippage

This is not “true,” but it’s at least coherent and reproducible.
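That assumption can be written as one small function. This sketch uses basis-point slippage and a flat per-trade fee; both are assumed conventions, not the only reasonable ones.

```python
def market_fill(side, next_open, qty, slippage_bps=5, fee_per_trade=1.0):
    """Fill a market order at next bar open, worse by slippage_bps for the trader.

    Returns (fill_price, cash_delta); cash_delta is negative for buys.
    """
    slip = next_open * slippage_bps / 10_000
    price = next_open + slip if side == "buy" else next_open - slip
    if side == "buy":
        cash_delta = -(price * qty) - fee_per_trade
    else:
        cash_delta = price * qty - fee_per_trade
    return price, cash_delta
```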

Slippage and Spread Models

Slippage can be more than one thing. You might model it as:

  • fixed number of basis points
  • proportional to spread
  • size-based: larger orders pay more

If you don’t have bid/ask data, you can still approximate. The most common mistake is pretending you don’t need spread/slippage because your strategy looks good. That’s how performance metrics quietly evaporate when you go live.

Limit Orders (Optional, But Useful)

Limit orders are where you have to decide how you model whether an order would have been filled within a bar.

Two bar-based approaches:

  • Touch model: if limit price lies between low and high for the bar, assume fill
  • Path model: assume an order was hit based on intra-bar order of prices; requires finer data

Without tick-level or at least open-high-low-close with careful ordering assumptions, limit order backtests often become wishful thinking.

If you add limit orders later, keep your assumptions explicit and optionally run sensitivity tests.
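The touch model reduces to a single comparison against the bar's range. A hedged sketch, with the optimism stated in the code itself:

```python
def limit_touch_fill(side, limit_price, bar_low, bar_high):
    """Touch model: assume a resting limit fills if price traded through it.

    Optimistic by construction -- touching the level does not guarantee a
    fill live, so pair this with sensitivity tests. Returns the fill price
    or None.
    """
    if side == "buy":
        # A buy limit fills if the bar traded at or below the limit.
        return limit_price if bar_low <= limit_price else None
    # A sell limit fills if the bar traded at or above the limit.
    return limit_price if bar_high >= limit_price else None
```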

Portfolio Accounting: The Math That Must Not Lie

Your portfolio class should be the authority on:

  • cash balance
  • positions per symbol
  • entry price and realized/unrealized P&L (if you track it)
  • commissions and fees
  • constraints like leverage, margin, and shorting

Shares vs Contracts vs Notional

For equities and crypto, shares or units are straightforward. For futures, contracts and multipliers matter.

A robust design keeps a symbol metadata record with:

  • multiplier
  • minimum tick size (if relevant)
  • fee structure

Even if you ignore tick/multiplier at first, structure your code so you don’t have to rewrite later.

Realized vs Unrealized P&L

Backtesting reports often confuse these. You typically want:

  • realized P&L from executed trades
  • unrealized P&L from current mark-to-market
  • total equity = cash + marked positions

Make sure your equity curve uses the same marking convention every time (close price, mid price, etc.). If you change the marking method, your metrics also change.
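The equity identity is worth encoding as a single function so there is exactly one place the convention can drift. A minimal sketch, assuming positions and marks are plain dicts keyed by symbol:

```python
def total_equity(cash, positions, marks):
    """Total equity = cash + mark-to-market value of open positions.

    `marks` must use one convention (e.g. bar close) on every call;
    short positions carry negative quantities and subtract naturally.
    """
    return cash + sum(qty * marks[sym] for sym, qty in positions.items())
```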

Performance Metrics: Beyond the Equity Curve

Once your backtester runs end-to-end, metrics decide whether you trust the results.

Basic Metrics You’ll Actually Use

A typical reporting set includes:

  • total return
  • annualized return (based on time span)
  • max drawdown
  • volatility of returns
  • Sharpe ratio and/or Sortino ratio

Be careful with Sharpe: it depends on whether returns are daily, hourly, or per bar. Your backtester should compute the correct scaling.
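The scaling rule is just a square-root-of-frequency factor applied to the per-bar Sharpe. A sketch, assuming simple per-bar returns and a per-bar risk-free rate (the `bars_per_year` value, e.g. 252 for daily bars, is the caller's responsibility):

```python
from math import sqrt
from statistics import mean, stdev

def sharpe(per_bar_returns, bars_per_year, rf_per_bar=0.0):
    """Annualized Sharpe from per-bar returns.

    Scaling must match bar frequency: pass 252 for daily bars,
    roughly 252 * 6.5 for hourly US-equity bars, and so on.
    """
    excess = [r - rf_per_bar for r in per_bar_returns]
    sd = stdev(excess)                 # sample standard deviation
    if sd == 0:
        return float("nan")            # flat returns: Sharpe is undefined
    return mean(excess) / sd * sqrt(bars_per_year)
```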

Trade-Level Diagnostics

Equity curves can look healthy while trades are terrible. Track:

  • trade count
  • average win/loss
  • win rate
  • profit factor
  • holding time distribution (bar count)

These help you figure out whether the strategy works because it catches a regime, or because it takes a very particular set of trades.
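Several of these diagnostics fall out of the per-trade P&L list directly. A small sketch (the dict keys are an assumed report layout):

```python
def trade_stats(trade_pnls):
    """Win rate and profit factor from per-trade realized P&L."""
    wins = [p for p in trade_pnls if p > 0]
    losses = [p for p in trade_pnls if p < 0]
    gross_loss = -sum(losses)
    return {
        "count": len(trade_pnls),
        "win_rate": len(wins) / len(trade_pnls) if trade_pnls else 0.0,
        # Profit factor: gross profit / gross loss; inf when there are no losers.
        "profit_factor": sum(wins) / gross_loss if gross_loss else float("inf"),
    }
```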

Exposure and Turnover

Two strategies with identical returns can have wildly different behavior depending on exposure and transaction cost sensitivity.

Track:

  • gross exposure over time
  • position turnover (how often you trade)
  • average holding duration

Turnover matters because your slippage and fees scale with activity. A backtest that ignores costs usually falls apart the moment it meets a real broker.

Avoid the Classic Backtesting Traps

If you build your own backtester, you can accidentally create backtests that work only because your system is too smart. Here are the common traps and how to prevent them.

Look-Ahead Bias

Look-ahead bias happens when the strategy uses data that would not have been known at decision time. Examples:

  • using the current bar close to decide trade at the same close
  • including future data in indicator calculations due to incorrect indexing
  • using resampled data aligned incorrectly to timestamps

Prevention techniques:

  • make time indexing explicit in indicator calculations
  • unit test on a small dataset where you can hand-check results
  • log which bar index each signal uses

Survivorship Bias

If you backtest today’s index constituents over historical periods, you remove bankrupt and delisted companies from the dataset. The backtest then looks cleaner than reality.

Prevention:

  • use full historical survivorship-free datasets if available
  • or, if not, at least acknowledge limitations

Data Snooping and Overfitting

When you test lots of parameter combinations, you begin to fit noise. Your backtester makes this too easy.

You can’t fully prevent it, but you can reduce it:

  • reserve a validation period you don’t touch during development
  • run parameter sweeps on training data only
  • test on a separate out-of-sample period

A simple but effective practice: run your full pipeline once per strategy variation automatically and store the config used, together with metrics and logs.

Inconsistent Execution Assumptions

Changing fill timing or price conventions between experiments can create apparent improvement that is just a model change.

Prevention:

  • freeze execution assumptions in config files
  • include the config hash in your results output
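A config hash is a few lines if your config is JSON-serializable (an assumption; anything non-serializable needs a custom encoder). Sorting the keys is what makes the hash stable across dict orderings:

```python
import hashlib
import json

def config_hash(config):
    """Stable short hash of run assumptions, for stamping result files.

    Keys are sorted so the same config always hashes identically,
    regardless of insertion order.
    """
    blob = json.dumps(config, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:12]
```

Stamping this into every results file means two equity curves can always be traced back to whether they were produced under the same assumptions.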

Step-by-Step Build Plan (A Practical Order)

You can build backtesting software without spending six months on it. A useful order is: make it run, then make it correct, then make it fast.

Phase 1: Minimal Backtester That Trades

Start with:

  • single instrument
  • bar data only
  • market orders only
  • strategy emits buy/sell based on a simple indicator

In this phase, you care about correctness of accounting and timing more than performance.

What to verify:

  • cash decreases/increases correctly after trades
  • positions update correctly
  • equity curve equals cash + market value
  • no trades occur before enough data exists for indicators

Phase 2: Add Fees and Slippage

Costs often decide whether performance survives contact with reality. Add:

  • per-trade fee
  • slippage model

Then rerun the same strategy and check whether performance degrades plausibly. If costs cause catastrophic collapse, either the strategy is cost-sensitive (bad sign) or your execution model is inconsistent (a bug).

Phase 3: Multi-Instrument Support

Add symbol loop or event-driven updates. Decide:

  • same timestamps across symbols, or per-symbol timestamps
  • portfolio equity calculation across holdings

Then handle trade sizing per symbol with portfolio constraints.

Phase 4: Order Types (Limits/Stops) if Needed

Only after market orders work reliably. Limit orders are often where correctness gets tricky, especially with bar data.

A good intermediate move:

  • support limit orders using a touch rule (within-bar low/high)
  • label it clearly and consider sensitivity tests

Phase 5: Performance and Usability

Once it’s correct, make it faster and easier:

  • optimize indicator computations with rolling windows
  • cache computed features
  • parallelize parameter sweeps (careful with shared state)
  • build a report generator that stores results and logs

Performance improvements matter when you run thousands of backtests for parameter selection and robustness checks.

Implementation Patterns That Reduce Errors

You can dramatically reduce bugs by choosing a few disciplined patterns.

Use Immutable Market Data Objects

Store bar/tick objects as immutable records. Strategy and indicators should read them, not modify them. Accidental mutation creates ghost bugs you can’t reproduce.
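In Python, `@dataclass(frozen=True)` gives you this cheaply: any attempted assignment raises instead of silently corrupting history. A small sketch (the `Tick` record and the helper are illustrative):

```python
from dataclasses import FrozenInstanceError, dataclass

@dataclass(frozen=True)
class Tick:
    ts: int
    price: float

def is_immutable(obj, field, value):
    """True if assigning to `field` raises, i.e. the record is immutable."""
    try:
        setattr(obj, field, value)
        return False
    except FrozenInstanceError:
        return True
```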

Keep Strategy Stateless (Where Possible)

Strategies usually need state, but keep it localized. A rule of thumb:

  • indicator state lives in indicator engine
  • strategy state holds only what it truly needs (like last signal)

You can also implement strategies as pure functions over rolling data windows, but that’s harder when signals depend on trade history.

Make Execution Deterministic

Avoid randomness unless you’re doing Monte Carlo simulation explicitly. Determinism makes debugging possible.

If you include random slippage or partial fills, seed your random generator and store the seed in results.
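A local, explicitly seeded generator keeps random slippage reproducible without touching the global `random` state. A sketch, with an assumed 0–5 bps slippage range:

```python
import random

def random_slippage_bps(seed, n):
    """Draw n reproducible slippage values; store `seed` with the run's results.

    Uses a local Random instance so parallel runs don't share RNG state.
    """
    rng = random.Random(seed)
    return [rng.uniform(0.0, 5.0) for _ in range(n)]
```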

Log Inputs and Decisions at Debug Level

When results look wrong, you’ll want to inspect:

  • the signal time index used by strategy
  • the order parameters issued
  • the bar used for fill price
  • the exact fill calculation

A standard practice is to produce a detailed trade log with timestamps and prices.

Validation: Prove Your Backtester Isn’t Just Lucky

A backtesting engine should be tested as much as your strategy.

Unit Tests for Accounting

Write tests for portfolio updates:

  • buy then sell: realized P&L equals expected
  • partial fills: positions and cash update correctly
  • fees and slippage: compute exactly the same values you set in config

If you can’t calculate it on paper for a tiny scenario, your code is too magical.
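Here is what such a paper-checkable test can look like: one accounting rule, one scenario you can verify by hand. The `apply_fill` function is a stand-in for whatever your portfolio layer actually exposes.

```python
def apply_fill(cash, shares, side, qty, price, fee):
    """Single cash/position accounting rule, shared by engine and tests."""
    if side == "buy":
        return cash - qty * price - fee, shares + qty
    return cash + qty * price - fee, shares - qty

def test_buy_then_sell_realized_pnl():
    # Paper scenario: buy 10 @ 100, sell 10 @ 110, $1 fee each way.
    cash, shares = apply_fill(10_000.0, 0, "buy", 10, 100.0, 1.0)
    assert (cash, shares) == (8_999.0, 10)
    cash, shares = apply_fill(cash, shares, "sell", 10, 110.0, 1.0)
    assert shares == 0
    assert cash - 10_000.0 == 98.0    # 10 * (110 - 100) - 2 in fees
```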

Reconcile Against a Known Reference

If you use a third-party backtesting library (not a requirement, but helpful), you can compare results under identical assumptions. If you don’t want external dependencies, create a tiny synthetic dataset and fully simulate it manually.

Sanity Checks on Outputs

A few checks that catch broken timing:

  • no trades after the end date
  • equity never becomes NaN or absurdly negative unless your leverage rules allow it
  • trade timestamps always occur at or after the decision point
  • indicator warm-up: first signal happens only after window size

Features Worth Adding Later (But Don’t Ignore Them)

Some features are tempting early because they look like progress. Others are traps. Here are ones that are worth planning but should come in after you have a correct baseline.

Corporate Actions and Dividend Accounting

If you trade equities, add dividend handling when you start using longer time windows. Even a simple cash dividend model helps.

Trading Constraints

Real strategies run into constraints:

  • max position size
  • min order size
  • max leverage
  • short sale rules

Implement constraints in portfolio or execution layers so strategies can’t accidentally bypass them.

Slippage Sensitivity Runs

Instead of picking one slippage number and praying, run sensitivity tests:

  • slippage low/medium/high
  • fee tiers

If the strategy only works under low slippage, it’s not dead, but it’s riding a narrow road.

How to Organize Experiments and Results

You can write the best backtester in the world and still lose track of experiments. Results organization is part of correctness, not just convenience.

Store Configuration With Every Run

Each run should store:

  • strategy parameters
  • data source and time range
  • execution assumptions (fill model, slippage, fees)
  • indicator settings

A simple config file per run, plus metadata in a results folder, goes a long way.

Keep a Trade Log and a Performance Summary

Your trade log is the forensic report. Your summary provides the quick scan. Don’t mix them.

If your system always outputs:

  • CSV or JSON trade log
  • CSV/JSON daily or per-bar portfolio equity history
  • metrics summary (Sharpe, drawdown, return)

you’ll be able to debug faster later.

Example: Designing a Backtest Engine for a Simple Strategy

Let’s use a straightforward momentum crossover as a running example. The strategy idea:

  • compute a fast moving average and slow moving average
  • when fast crosses above slow, go long
  • when fast crosses below slow, go flat (or short, if you want extra trouble)

In a bar-based backtester:

  • At each bar close, you compute the crossover based on the latest indicator values
  • If you decide to trade, you submit an order to be filled at the next bar open

Your execution simulator uses:

  • fill price = next open price + slippage (for buy) or – slippage (for sell)
  • apply fees per trade

Your portfolio updates after fills:

  • buy: reduce cash, increase shares
  • sell: increase cash, decrease shares
  • equity each bar: cash + shares * close price (or next open; pick one and stay consistent)

This design keeps timing clean and makes it possible to debug whether your performance differences are due to the strategy or due to your execution model.
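The crossover decision itself compares the last two completed indicator values, which is what keeps it free of look-ahead. A sketch (the function assumes at least two values per series and returns an order intent, not a fill):

```python
def crossover_signal(fast_vals, slow_vals):
    """'buy' when fast crosses above slow, 'sell' when it crosses below.

    Compares only completed values at t-1 and t, so the decision uses
    nothing the strategy could not have seen at the bar close.
    """
    f_prev, f_now = fast_vals[-2], fast_vals[-1]
    s_prev, s_now = slow_vals[-2], slow_vals[-1]
    if f_prev <= s_prev and f_now > s_now:
        return "buy"
    if f_prev >= s_prev and f_now < s_now:
        return "sell"
    return None
```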

Backtesting Pitfalls That Show Up as “Good Results”

Here are a few situations where backtest performance can look great for reasons that won’t survive a live environment.

Overfitting Through Feature Leakage

If you compute features that accidentally use future data (like using the full dataset’s normalization statistics, or misaligned rolling windows), your model trains on information it wouldn’t have.

Fix:

  • use rolling windows that only depend on past data
  • fit normalization per training period only
  • include careful alignment tests

Too-Lucky Entry Timing

A strategy that enters right on breakouts can look incredible if your fill model is optimistic (e.g., filling at the breakout level when the breakout would have happened before you actually placed the order).

Fix:

  • use realistic fill timing (next bar open in bar-based systems)
  • optionally add a directional slippage based on recent volatility

Transaction Costs Ignored Until It’s Late

It’s common to develop the strategy without costs, then add costs later. The trap is that the “best” parameter set changes once costs are considered.

Fix:

  • include costs from the start
  • treat fee/slippage as part of the experiment assumptions

Performance Tricks: Making It Run Fast Enough

Once you run parameter sweeps, speed matters.

Rolling Indicators Without Recomputing Everything

Indicators should be computed incrementally rather than recalculated from scratch each bar. For moving averages, maintain a rolling sum. For RSI, track rolling gains/losses.
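For a moving average, the rolling-sum version looks like this sketch: one addition, at most one subtraction, and a constant-size buffer per bar. Returning `None` during warm-up also enforces the "no signals before the window fills" sanity check from earlier.

```python
from collections import deque

class RollingMean:
    """O(1)-per-bar moving average: maintain a rolling sum, not a recompute."""

    def __init__(self, window):
        self.window = window
        self.buf = deque()
        self.total = 0.0

    def update(self, price):
        self.buf.append(price)
        self.total += price
        if len(self.buf) > self.window:
            self.total -= self.buf.popleft()   # drop the value leaving the window
        # None until the window is warm -- no early signals.
        return self.total / self.window if len(self.buf) == self.window else None

def rolling_means(prices, window):
    """Convenience wrapper: one RollingMean driven over a price list."""
    rm = RollingMean(window)
    return [rm.update(p) for p in prices]
```

One caveat with long-running rolling sums on floats: tiny drift can accumulate over millions of updates, so very long runs may want a periodic exact recompute.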

Cache Feature Computations

If your strategy uses the same set of indicators across many parameter combinations, cache indicator outputs where possible.

However, cache carefully. If caching depends on parameters (like window sizes), the cache key must include those parameter values.

Parallel Parameter Sweeps

You can run multiple backtests concurrently if:

  • each run uses its own instance of the backtest engine
  • data loading is thread/process safe

Be cautious with shared global state like random seeds, configuration objects, or cached indicator buffers.

What to Output: Reports That Help You Decide

A good backtester produces reports that answer: “Would I trade this?” and “Why is it behaving the way it is?”

Suggested Report Sections

In practice, your report can include:

  • run config summary (data range, strategy params, execution model)
  • performance summary metrics
  • equity curve and drawdown curve (plot files)
  • trade distribution (wins/losses, holding time)
  • exposure/turnover summary

Even without fancy visuals, producing a tabular metrics summary helps you compare runs quickly.

Extending to More Realism Without Breaking Everything

You’ll likely start with bar-based trading, then want more granularity. When you extend realism, do it in controlled increments.

From Bars to Intraday Bars

Intraday bars require:

  • session handling (market open/close)
  • more careful timestamp alignment
  • more attention to missing bars

Your execution timing should become more granular as well. Instead of “next bar open,” you may use next available timestamp in your data stream.

From Intraday Bars to Ticks

Tick backtests need event handling and fast data reading. Your order fill model changes significantly.

You also have to decide:

  • do you simulate order book levels or use simplified bid/ask?
  • how do you handle gaps and out-of-order ticks?

If you keep your architecture clean, these changes are mostly swapping out the data layer and execution layer, not rewriting your strategy logic.

Common Technology Choices (Without Starting a Flame War)

Language and tooling are personal choices. What matters is correctness, speed, and maintainability.

Python for Prototyping

Python is popular because it’s fast to write and easy to iterate on. For performance, use vectorized calculations where appropriate and keep hot loops minimal or compiled.

Compiled Languages for Speed

For heavy tick-level workloads, languages like C++ or Rust can help, but complexity rises. Most people don’t need that until their backtester is already correct and fast enough to handle the data.

Use a Backtest Config System

Whatever you use, keep backtest runs driven by configuration files. It makes experiments reproducible and prevents “I changed one thing and forgot what it was” situations.

Minimal Backtesting Template (Conceptual, Not Copy/Paste)

Here’s a conceptual walkthrough of a backtest run flow you can map directly to your code.

Initialization

  • Load market data for the time range
  • Create portfolio with initial cash
  • Initialize indicator state and strategy instance
  • Load execution model parameters (fees, slippage, fill timing)

Main Simulation Loop

For each bar i:

  • Update indicator values based on the new bar
  • Call the strategy with current bar index and updated indicator outputs
  • Strategy may submit one or more orders (or choose none)
  • Execution simulator produces fills at the defined next execution time
  • Portfolio updates cash and positions using fill results
  • Equity is marked to a chosen price (close or mid) and stored

If your backtester supports multiple instruments, a “bar” may really be a time step at which you process whichever symbols have a bar at that timestamp.

Finalize

  • Compute metrics from equity curve
  • Compute trade stats
  • Write results to disk (summary + logs + equity history)

Real-World Use Cases for Building Your Own Backtester

Most people start this project for one of these reasons.

Custom Execution Assumptions

Maybe you trade using a broker API that fills differently than “fill at next open.” Or you want to simulate a particular venue fee schedule. If your strategy needs a specific execution model, a custom backtester beats adapting a generic one.

Integrating Multiple Data Sources

You might want to trade based on a composite signal: price plus a fundamental update schedule, or a volatility forecast from another pipeline. A custom engine lets you integrate that cleanly.

Research Discipline

Sometimes you build your own backtester because you don’t trust the existing tools. That’s valid—especially if you’ve seen backtests that “work” only when someone forgets about execution timing.

A custom build lets you enforce discipline across the whole pipeline.

FAQ: Questions People Ask After Their First Backtest

Why does my strategy look profitable but I can’t replicate it live?

Usually it’s timing and costs. Common culprits:

  • signal uses close, fill also uses close
  • slippage and fees were too optimistic
  • limit order assumptions were too generous
  • live data differs (missing bars, different currency conversions, different sessions)

What’s the fastest way to find bugs in a backtester?

Use a tiny dataset (like 30-200 bars), set a strategy that makes a known sequence of trades, and compare:

  • order timestamps
  • fill prices
  • cash/position math after every trade

If you can’t reconcile it step-by-step, your backtest engine will never be trustworthy.

Should I start with a library or write from scratch?

It depends. If you only need basic bar backtesting, a library can accelerate research. If you know your execution assumptions are unusual, or you want absolute control over internals, writing from scratch pays off.

Either way, you still need to validate timing, data alignment, and costs.

Wrapping It Up

Making your own backtesting software is less about building a fancy charting tool and more about building a simulation engine you can trust. With the right architecture—separating data, strategy, execution, portfolio accounting, and reporting—you can iterate quickly without turning the project into a fragile mess.

Start small: bar data, market orders, deterministic execution. Then add realism carefully: fees, slippage, multiple symbols, and optionally more order types. Along the way, keep your assumptions explicit and your validation strict. When your backtester produces believable results under multiple cost and timing settings, you’re not just getting a good equity curve—you’re building a tool that can survive contact with reality.