Your brokerage can do a lot, but it still won’t match your habits, your data preferences, or your tolerance for tinkering. If you’ve ever stared at a stock screener and thought, “This is close… but not quite,” building your own stock screening software starts to look less like a hobby and more like a practical upgrade.
This article walks through how to design and build a screener that actually fits your process: from choosing data and defining filters, to handling corporate actions, backtesting with sane assumptions, and packaging the result so it stays usable when life happens (it will).
What a stock screener really is (and what it isn’t)
A stock screener is not magic, and it doesn’t “find winners” the way people like to brag on the internet. At its core, a screener is a pipeline:
- Collect a universe of securities
- Gather the fields you plan to filter on (price, fundamentals, ratios, technical indicators, events)
- Apply boolean rules and scoring logic
- Sort and present results
A good screener is mostly about data hygiene and query logic. A great screener is mostly about workflow: how quickly you can iterate, how repeatable your experiments are, and how clearly you understand what each screen implies.
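The four-stage pipeline above can be sketched in a few lines. Everything here (symbols, field values, thresholds) is made up for illustration; a real version swaps the hard-coded dict for your data layer:

```python
# Minimal screener pipeline: universe -> fields -> rules -> ranked output.
# The field values below are hard-coded stand-ins for a real data layer.

def run_screen(universe, fields, rules, sort_key):
    """Apply boolean rules to each symbol, then sort survivors."""
    passed = []
    for symbol in universe:
        row = fields[symbol]
        if all(rule(row) for rule in rules):
            passed.append((symbol, row))
    return sorted(passed, key=lambda item: sort_key(item[1]), reverse=True)

universe = ["AAA", "BBB", "CCC"]
fields = {
    "AAA": {"pe": 12.0, "avg_volume": 2_000_000},
    "BBB": {"pe": 45.0, "avg_volume": 5_000_000},
    "CCC": {"pe": 9.0, "avg_volume": 150_000},
}
rules = [
    lambda r: r["avg_volume"] > 500_000,   # liquidity
    lambda r: 5 <= r["pe"] <= 25,          # valuation band
]
results = run_screen(universe, fields, rules, sort_key=lambda r: -r["pe"])
print([sym for sym, _ in results])  # AAA passes; BBB fails PE; CCC fails volume
```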
Define your screening goal before writing code
If you start coding first, you’ll end up with a fast machine that can’t answer your actual question. Start with a plain-English goal.
Trading-style screens: the “right now” question
If your screening targets short-term trading, your filters often emphasize:
– Liquidity (spreads, volume, average daily range)
– Price behavior (gaps, relative strength, trend)
– Risk control (volatility caps, range limits)
The gotcha: you’re filtering on data that changes fast. Your software needs reliable timestamps and consistent lookback windows.
Investing-style screens: the “next few quarters” question
If you’re screening for longer-horizon investing, you’ll care about:
– Fundamentals (valuation metrics, earnings stability, leverage)
– Growth and quality measures (revenue trends, margins, cash flow behavior)
– Event-aware constraints (dilution, debt maturities, earnings calendar)
The gotcha: accounting data comes with restatements, delayed reporting, and assumptions that you must encode consistently.
A practical way to write a screen
Write it like a recipe. For example:
– Universe: US stocks with $X+ average volume
– Filter: positive operating cash flow, debt-to-equity under Y, trailing PE between A and B
– Add a scoring step: prefer stable margins and improving free cash flow
– Output: top 50 by score, with a “why it passed” explanation
This becomes your spec. Your code can follow it without guesswork.
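Translated into a config file, that recipe might look like the following. All names and thresholds are placeholders, with X, Y, A, and B mapped to example values:

```yaml
screen:
  name: value_quality_v1
  universe:
    market: US
    min_avg_dollar_volume: 1000000   # the "$X+" from the spec
  filters:
    operating_cash_flow: {min: 0}
    debt_to_equity: {max: 1.0}       # "Y"
    pe_trailing: {min: 5, max: 25}   # "A" and "B"
  scoring:
    margin_stability: {weight: 0.5}
    fcf_trend: {weight: 0.5}
  output:
    top_n: 50
    explain: true
```

A file like this is what you version and diff between experiments, rather than editing thresholds scattered through code.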
Choose your data sources (and accept tradeoffs)
Building a screener is largely about selecting data you can rely on. Most people underestimate this part, then pay for it later in debugging hell.
Market price and volume data
You’ll usually need:
– Daily OHLCV candles (open/high/low/close/volume)
– Corporate action adjustments (splits, dividends if you care about total return)
– Accurate trading calendars (holidays, early closes)
If you don’t handle splits correctly, your “historical” technical indicators will occasionally look like they’re drunk. Not subtle.
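As a concrete sketch of one common convention, back-adjusting closes before each split date: a 2-for-1 split halves the raw price, so unadjusted history shows a phantom 50% drop unless pre-split bars are scaled down. Dates and ratios here are invented:

```python
# Back-adjust a daily close series for splits so indicators see a
# continuous price path. Factors apply to all bars BEFORE the split date.

def back_adjust_closes(closes, splits):
    """closes: list of (date, close); splits: {effective_date: ratio}, e.g. 2.0 for 2-for-1."""
    adjusted = []
    for date, close in closes:
        factor = 1.0
        for split_date, ratio in splits.items():
            if date < split_date:
                factor /= ratio  # pre-split prices scaled down
        adjusted.append((date, close * factor))
    return adjusted

closes = [("2024-01-02", 100.0), ("2024-01-03", 102.0), ("2024-01-04", 51.0)]
splits = {"2024-01-04": 2.0}  # 2-for-1 effective this day
print(back_adjust_closes(closes, splits))
# Pre-split closes become 50.0 and 51.0, continuous with the post-split 51.0.
```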
Fundamental and financial statement data
Fundamentals can come from:
– Financial statement line items and derived ratios
– Earnings and guidance fields (if available)
– Shares outstanding and diluted shares for per-share calculations
Expect that statements may be updated. Your screener should store raw source data and the computed metrics version so you can reproduce results.
Corporate actions and dividends
Even if your screen is mostly fundamentals and ratios, corporate actions affect:
– Price history
– Per-share computations
– Total return calculations (if you use them)
At minimum, store the adjustment factor used for price series.
Data latency and survivorship bias
This is where many screeners get quietly dishonest. If your data only includes currently-listed companies, you risk survivorship bias. If your fundamentals “arrive” later than they did historically, you risk look-ahead bias.
You can’t always fully eliminate bias, but you need to understand what you’re doing:
– Are your data snapshots point-in-time or update-in-place?
– Can you query “as of” dates for fundamentals?
– Are delisted names included historically, or not?
A sane approach: if you can’t get point-in-time, don’t pretend you’re doing strict backtesting. You can still screen, but label it accordingly.
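A minimal point-in-time lookup, assuming each fundamentals row is stored with the date it actually became available (field names here are illustrative):

```python
# Point-in-time lookup: return the most recent fundamental row whose
# availability (filing) date is on or before the screening date.

def fundamentals_as_of(rows, as_of):
    """rows: dicts with 'filing_date' plus metric fields; dates as ISO strings."""
    visible = [r for r in rows if r["filing_date"] <= as_of]
    if not visible:
        return None
    return max(visible, key=lambda r: r["filing_date"])

rows = [
    {"period_end": "2023-12-31", "filing_date": "2024-02-15", "eps": 1.10},
    {"period_end": "2024-03-31", "filing_date": "2024-05-10", "eps": 1.25},
]
# Screening on 2024-04-01: Q1 numbers exist but were NOT yet filed.
print(fundamentals_as_of(rows, "2024-04-01")["eps"])  # 1.1, not 1.25
```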
Pick the architecture: simple first, scalable later
A screener can be a weekend project or a small software product. Either way, aim for separation of concerns:
– Data layer: fetch, clean, store
– Computation layer: compute indicators and metrics
– Screening layer: apply rules and scoring
– Presentation layer: show results (CLI, web app, dashboard)
– Reproducibility layer: log config, versions, and “as-of” dates
Local script vs service
If you’re validating ideas, a local script works:
– Cron it daily
– Output CSV and charts
– Iterate on code
Once you’re running screens regularly, a service starts to help:
– Caches computed metrics
– Runs on schedule
– Provides consistent outputs via a UI or API
You don’t need enterprise architecture. But you do need a structure that doesn’t collapse at month two.
Language choices
Common stacks:
– Python: fast iteration, many finance libraries, good for ETL and ML-adjacent work
– JavaScript/TypeScript: good for web dashboards; pair with Python for data
– C#/Java: mature ecosystems, less common in retail finance projects
Whatever you choose, keep the data and computation logic deterministic. That’s the part that stops bugs from multiplying.
Design your data model for sanity
Stock screening breaks when your data model doesn’t reflect time.
Store raw time series and computed features separately
A practical pattern:
– Raw prices: unchanged base dataset (or versioned datasets)
– Raw fundamentals: stored with report dates and filing dates if possible
– Features: computed from raw data, stored with the calculation version and as-of date
This makes debugging possible. When your PE filter behaves strangely, you can inspect the feature inputs and recompute with a new version.
Use stable identifiers for securities
Ticker symbols change, and corporate actions create new listings. Use a stable identifier if you can (exchange instrument ID, security ID, or an internal mapping that you keep updated).
At minimum, keep a historical mapping table that tells you:
– symbol at time t
– security ID at time t
– reason for change, if known
Calendar handling: trading days are not calendar days
If your indicator uses a “20-day moving average,” you need trading-day windows. Don’t treat it as “20 calendar days” unless you intentionally do so.
Store or compute a trading calendar and align your series.
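A small sketch of why alignment matters: if the series is indexed by trading days, counting rows gives the correct window automatically, even across holidays and long weekends:

```python
# A "20-day" moving average should mean 20 TRADING days. Counting rows
# of a trading-day-indexed series makes that automatic.

def trailing_mean(values, window):
    """Mean of the last `window` observations (trading days, not calendar days)."""
    if len(values) < window:
        return None  # not enough history; don't silently shrink the window
    return sum(values[-window:]) / window

# Five trading days spanning a long weekend: the calendar gap doesn't
# matter, because we count rows, not calendar days.
trading_days = ["2024-05-23", "2024-05-24", "2024-05-28", "2024-05-29", "2024-05-30"]
closes = [10.0, 11.0, 12.0, 13.0, 14.0]
print(trailing_mean(closes, 5))  # 12.0
```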
Implement indicator and metric pipelines
A screener usually needs both technical indicators and fundamental metrics.
Technical indicators: consistent definitions matter
Technical indicators depend on definitions:
– RSI: which smoothing method?
– Moving averages: simple vs exponential
– ATR: how you compute true range
– MACD: standard settings or your own
If you don’t standardize definitions, you’ll end up changing the meaning of your screen halfway through your project. That creates the kind of confusion that feels personal.
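As one example of pinning a definition down: here is RSI with Wilder's smoothing stated explicitly, so a later rewrite can't silently change the screen's meaning (a simple-average variant gives visibly different values on the same data):

```python
# RSI with Wilder's smoothing, the smoothing choice made explicit.

def rsi_wilder(closes, period=14):
    if len(closes) <= period:
        return None
    gains, losses = [], []
    for prev, cur in zip(closes, closes[1:]):
        change = cur - prev
        gains.append(max(change, 0.0))
        losses.append(max(-change, 0.0))
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    for g, l in zip(gains[period:], losses[period:]):
        avg_gain = (avg_gain * (period - 1) + g) / period  # Wilder smoothing
        avg_loss = (avg_loss * (period - 1) + l) / period
    if avg_loss == 0:
        return 100.0
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)

# Monotonically rising series: no losses, so RSI pins at 100.
print(rsi_wilder([float(i) for i in range(1, 17)]))  # 100.0
```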
Fundamental ratios: derive from base statements
Instead of pulling “free cash flow” from one provider and “operating cash flow” from another, consider deriving metrics from consistent inputs:
– Free cash flow from operating cash flow and capex
– Gross margin from gross profit and revenue
– Debt ratios from consistent balance sheet measures
If you must use multiple sources, version your derived metrics and document assumptions.
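A sketch of the derive-from-base-inputs approach; the function names and the capex sign convention are illustrative choices, not a standard:

```python
# Derive metrics from consistent base inputs rather than mixing providers.

def free_cash_flow(op_cash_flow, capex):
    """FCF = operating cash flow minus capex (capex as a positive outflow)."""
    return op_cash_flow - capex

def gross_margin(gross_profit, revenue):
    """Gross margin as a fraction; missing when revenue is zero."""
    return gross_profit / revenue if revenue else None

print(free_cash_flow(500.0, 120.0))           # 380.0
print(round(gross_margin(400.0, 1000.0), 2))  # 0.4
```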
Feature computation should be repeatable
Add the following metadata to each computed feature batch:
– calculation version (a hash of your feature code or config)
– as-of date
– universe definition used
– data snapshot identifiers (if your source supports it)
You’ll thank yourself the first time you re-run and wonder, “Why did my results shift?”
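One way to generate the calculation version is to hash the feature config itself, so any config change produces a new version automatically (the config keys here are invented):

```python
# Stamp each feature batch with a calculation version derived from its
# config, so "why did results shift?" has a checkable answer.
import hashlib
import json

def calc_version(config):
    """Stable short hash of the feature config (sorted keys => deterministic)."""
    blob = json.dumps(config, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:12]

batch_meta = {
    "calc_version": calc_version({"rsi_period": 14, "ma_window": 20}),
    "as_of": "2024-06-03",
    "universe": "us_liquid_v2",
}
print(batch_meta["calc_version"])
```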
Write the screening logic: rules, thresholds, and scoring
Your screening core is either:
– boolean filtering (keep or drop)
– scoring and ranking (rank by some function)
– both
Boolean screens are easy to test
Example logic:
– Pass liquidity if avg_volume_20d > threshold
– Pass valuation if pe_ratio between low and high
– Pass quality if gross_margin_avg_8q > threshold
Keep each rule separate and label it. When a ticker fails, you want to know which rule killed it.
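A sketch of labeled rules, where the output records exactly which rule killed a ticker; rule names and thresholds are illustrative:

```python
# Named rules so the output can say which rule a ticker failed.

RULES = {
    "liquidity": lambda r: r["avg_volume_20d"] > 500_000,
    "valuation": lambda r: 5 <= r["pe_ratio"] <= 25,
    "quality":   lambda r: r["gross_margin_avg_8q"] > 0.30,
}

def evaluate(row):
    """Return (passed_all, {rule_name: passed}) for one security."""
    outcomes = {name: bool(rule(row)) for name, rule in RULES.items()}
    return all(outcomes.values()), outcomes

row = {"avg_volume_20d": 900_000, "pe_ratio": 40.0, "gross_margin_avg_8q": 0.45}
ok, outcomes = evaluate(row)
print(ok, [n for n, passed in outcomes.items() if not passed])  # False ['valuation']
```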
Scoring adds nuance, but you must normalize
A scoring function might combine:
– value score
– growth score
– momentum score
– risk penalty
But those components have different scales. If you just sum raw metrics, one component dominates. Standard practice:
– transform each component into comparable ranges (z-scores, percentiles, or capped scoring)
– set weights explicitly
– document why those weights exist
This isn’t about “optimization theater.” It’s about interpretability.
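A minimal percentile-based scorer along these lines; weights and metrics are illustrative, and tie handling is ignored for brevity:

```python
# Rank each metric into percentiles across the universe, then combine
# with explicit weights so no raw scale dominates.

def percentile_ranks(values):
    """Map each value to its percentile in [0, 1] (higher value = higher rank)."""
    order = sorted(values)
    n = len(values)
    return [order.index(v) / (n - 1) for v in values]

def score(universe_metrics, weights):
    """universe_metrics: {metric: [value per symbol]}; weights: {metric: w}."""
    n = len(next(iter(universe_metrics.values())))
    totals = [0.0] * n
    for metric, values in universe_metrics.items():
        for i, pct in enumerate(percentile_ranks(values)):
            totals[i] += weights[metric] * pct
    return totals

metrics = {
    "value":    [0.9, 0.2, 0.5],   # already oriented "higher is better"
    "momentum": [0.1, 0.8, 0.5],
}
weights = {"value": 0.6, "momentum": 0.4}
print(score(metrics, weights))
```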
Beware “ratio traps”
Ratios can explode when denominators get small:
– PE when earnings are near zero
– debt-to-equity when equity is tiny
– margin ratios when revenue dips during downturns
You should add guardrails:
– minimum denominator thresholds
– winsorization (cap extreme values) if you use statistical scaling
– fallback logic (e.g., if denominator < X, treat ratio as missing)
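The denominator guardrail can be as small as this:

```python
# Guardrail for ratio traps: treat the ratio as missing when the
# denominator is too small to be meaningful.

def safe_ratio(numerator, denominator, min_denominator):
    if denominator is None or abs(denominator) < min_denominator:
        return None  # downstream rules must handle missing explicitly
    return numerator / denominator

print(safe_ratio(100.0, 0.001, min_denominator=0.01))  # None: near-zero equity
print(safe_ratio(100.0, 50.0, min_denominator=0.01))   # 2.0
```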
Output design: make results usable, not just correct
A screener that outputs only a ticker list is okay for early experiments. A useful screener explains itself.
Explain “why it passed”
For each company, show:
– the values of key metrics used in filters
– which boolean rules passed
– the score breakdown if you use weights
This helps you debug screens and prevents the classic mistake: thinking a stock passed because of one factor when it actually scraped by on another.
Include data freshness and as-of timestamps
If you run screens daily, the output should contain:
– as-of date for market data
– reporting window for fundamentals
– calculation version
Without this, “today’s output” becomes a vague rumor instead of a reproducible artifact.
Export formats
Common options:
– CSV for quick analysis
– JSON for API/dashboard integration
– Parquet for internal data science storage
If you plan to backtest, store full feature snapshots so you can reconstruct what your model “saw.”
Backtesting and screening: don’t mix them like they’re the same job
Screening finds candidates. Backtesting evaluates whether the candidates produce returns under assumptions. They’re related but not identical.
Basic backtesting loop
A simplified approach:
1. At each rebalancing date, run the screen using data available at that date.
2. Create a portfolio from the selected set (equal weight is easy to start).
3. Simulate forward returns over the holding period.
4. Track performance metrics and turnover.
The two common failures:
– using future data in fundamentals
– applying rebalancing without transaction costs
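A toy version of that loop, with equal weighting and a flat cost charged on turnover. The screen and returns here are hard-coded fiction, standing in for as-of data access:

```python
# Simplified rebalance loop: screen with as-of data, hold equal-weight,
# charge a flat cost proportional to turnover.

def backtest(dates, screen, forward_return, cost_per_turnover=0.001):
    equity, prev = 1.0, set()
    for date in dates:
        picks = set(screen(date))  # must use only data available at `date`
        if picks:
            period_ret = sum(forward_return(s, date) for s in picks) / len(picks)
        else:
            period_ret = 0.0
        # Fraction of the combined holdings that changed this rebalance.
        turnover = len(picks ^ prev) / max(len(picks | prev), 1)
        equity *= (1.0 + period_ret - cost_per_turnover * turnover)
        prev = picks
    return equity

dates = ["2024-01", "2024-02"]
screen = lambda d: ["AAA", "BBB"] if d == "2024-01" else ["AAA", "CCC"]
forward_return = lambda s, d: 0.02  # every pick returns 2% per period
print(round(backtest(dates, screen, forward_return), 6))
```

Even this crude cost term shows the pattern: the cost-free result is strictly better, which is exactly the overestimate the article warns about.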
Transaction costs and slippage
If you simulate only raw returns, you’ll overestimate. Even simple assumptions help:
– commission or fee model
– estimated slippage based on liquidity
– turnover-based costs
You don’t need a perfect market model; you do need a costs model so results don’t look like a free lunch.
Turnover and re-screen frequency
If your screen changes often, your turnover increases. High turnover can drag returns more than you’d expect.
A practical compromise:
– run screens daily but rebalance weekly or monthly
– or run incremental updates while keeping a stable portfolio
Program structure that won’t annoy you later
You’re building software, not just a script. A bit of structure now prevents “rewrite it all” later.
Configuration files over hardcoded values
Keep thresholds, weights, and universe definitions in a config:
– YAML/JSON/TOML works
– include validation defaults
– version your screen config
That way, you can compare two screens without changing code for every experiment.
Testing: build small tests for each step
At minimum, test:
– indicator calculations on known series
– ratio computations with edge cases
– filter logic for missing values
– data alignment (no off-by-one window bugs)
A screener is basically a data pipeline with math inside. Tests are your seatbelt.
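A seatbelt test for window alignment might look like this, using a hypothetical rolling-max feature; the `n - window + 1` check catches most off-by-one bugs:

```python
# Seatbelt tests: known inputs, known answers, and an explicit
# alignment check for windowed features.

def rolling_max(values, window):
    """Max over each full trailing window; output aligns with the input tail."""
    return [max(values[i - window + 1:i + 1]) for i in range(window - 1, len(values))]

def test_rolling_max():
    out = rolling_max([1, 3, 2, 5, 4], 3)
    assert out == [3, 5, 5]          # three full windows from five points
    assert len(out) == 5 - 3 + 1     # alignment: n - window + 1 outputs

test_rolling_max()
print("pipeline math tests passed")
```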
Logging and audit trails
Log:
– start/end time
– universe size
– missing data rates per metric
– how many tickers passed each rule
– output file and feature batch IDs
If something breaks, logs are how you avoid spending an evening guessing.
Handling missing data without fooling yourself
Most screening mistakes come from missing values treated as zero or treated inconsistently.
Choose a policy per metric
For each metric, decide:
– strict missing: fail the rule if metric is missing
– permissive missing: skip the rule or use fallback logic
– impute: fill with a statistical estimate (risky unless you’re careful)
For fundamentals, “missing” is often meaningful. Illiquid or newly listed companies won’t have the same coverage.
A screen should reflect your intent. If you meant to screen only mature issuers, failing on missing fundamentals is fine.
Prevent accidental implicit casts
In Python and many other languages, a missing value can behave oddly in comparisons. Treat missing values explicitly. A rule like “pe_ratio < 15” should not silently treat a missing PE as a pass.
That’s how incorrect results become confident results.
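A sketch of an explicit missing-value policy for that exact rule; the `missing_fails` flag is an illustrative way to encode strict vs permissive handling:

```python
# Missing values must fail or pass rules by explicit policy, never by
# accident of how None or NaN happens to compare.
import math

def passes_pe_cap(pe, cap=15.0, missing_fails=True):
    if pe is None or (isinstance(pe, float) and math.isnan(pe)):
        return not missing_fails  # explicit policy, not an implicit cast
    return pe < cap

print(passes_pe_cap(12.0))          # True
print(passes_pe_cap(None))          # False under the strict policy
print(passes_pe_cap(float("nan")))  # False: NaN comparisons are always False,
                                    # but here the decision is deliberate
```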
Performance: caching and incremental updates
The difference between “runs in 30 seconds” and “runs overnight” comes down to how you compute. Some indicators are expensive if you recompute everything each run.
Cache computed features
Computed features should be stored:
– by security and date range
– with version and calculation parameters
– so you can update only the newest portion
Example: if you append one trading day, you shouldn’t recompute a 200-day RSI for every day from scratch.
Use vectorized computations where possible
If you use Python, vectorized operations (pandas/numpy) often help. In other languages, array operations and careful batching matter.
The goal: keep the computation stage predictable, not fragile.
Security, privacy, and operational hygiene
If your screener runs on a machine with credentials (API keys, brokerage keys), treat it like production, because at some point you’ll forget it’s running.
Store secrets properly
Use environment variables or a secrets manager. Don’t commit keys into source control. This is not “paranoia,” it’s basic adulthood.
Validate outputs
Sanity checks:
– number of tickers in universe
– distributions of key computed metrics
– top results are not obviously broken (negative volume, absurd ratios)
Even a few sanity checks can save a week when the data provider changes a column name.
User interfaces: CLI, web dashboard, or spreadsheet output
You don’t need to build a fancy app unless you want to. Most people end up with a hybrid.
CLI output is good for fast iteration
Example workflow:
– run `screen --config value_qof_score.yaml --asof 2026-04-10`
– export CSV
– analyze in a notebook or spreadsheet
This keeps iteration tight.
A small dashboard helps when you’re reviewing repeatedly
If you review results daily:
– show filter metrics
– show score breakdown
– show charts for a handful of top names
A dashboard reduces the pain of re-downloading data and re-formatting it every day.
Spreadsheet exports are “good enough” until they aren’t
Spreadsheet exports are fine for early research. They become a bottleneck if you want repeatability and history tracking.
If you go spreadsheet-style, store your inputs and computed metrics alongside outputs.
Common pitfalls that wreck screeners
These problems aren’t rare; they’re practically a subscription service.
Look-ahead bias from fundamentals
If you use “latest quarterly numbers” without ensuring they were available at the screening date, you might be using information that the market couldn’t have known yet.
If your fundamentals dataset isn’t point-in-time, label your backtesting as “indicative,” not strict.
Survivorship bias in the universe
If you screen only currently-listed companies, your backtest ignores delisted losers. That inflates results.
If you can’t include delisted names, treat performance estimates as optimistic.
Corporate action mismatches
If your price series is adjusted and your volume isn’t handled consistently, indicators can shift. Make sure adjustments apply where needed.
Inconsistent units and scaling
Watch out for:
– revenue reported in thousands vs millions
– market cap in dollars vs millions
– debt as short-term vs total depending on provider
If a ratio suddenly becomes 0.0003, it’s usually not “the market.” It’s math using the wrong units.
Example project plan: from zero to usable screener
Here’s a realistic progression that won’t turn into an endless rewrite.
Phase 1: Minimal screener (1–2 weeks)
Goal: produce a repeatable CSV of tickers that pass a couple of rules.
– Choose a small universe and define 3–6 metrics
– Implement price feature(s) (e.g., moving average, average volume)
– Implement 3 fundamental fields (e.g., PE, ROE, operating cash flow)
– Build filtering and sort output
– Ensure reproducible outputs with as-of dates and versioned config
Keep it simple. If the output works, you’re already ahead of most people.
Phase 2: Add “why it passed” and scoring (1–2 weeks)
– Add per-rule pass/fail explanations
– Add percentile-based scoring
– Add missing-data policy per metric
– Add export of feature snapshots
Now you can diagnose behavior quickly.
Phase 3: Backtesting loop (2–4 weeks)
– Define rebalance frequency and holding period
– Simulate returns with transaction costs
– Track turnover and basic performance metrics
– Build a small report summarizing results
You’ll learn more here than you expected, mostly about how forgiving the market is and how unforgiving your assumptions are.
Phase 4: Make it run reliably (ongoing)
– incremental updates and caching
– scheduled runs
– alerting on data gaps or provider outages
– regression tests for indicator changes
This is where it stops being a “project” and becomes a tool.
Quality control: verify each part like you mean it
Even simple screens need verification.
Cross-check computed indicators
Compare your indicator outputs to a trusted reference for a handful of symbols. Use the same settings and confirm that:
– moving averages line up
– RSI and volatility are consistent
– the indicator values at the boundaries (first window) behave as expected
Then repeat after you update code.
Cross-check derived ratios
For fundamentals-derived ratios:
– confirm that you use the right time window (TTM vs quarterly)
– confirm whether you use diluted shares
– confirm whether you scale values
Ratios are sensitive. One unit mismatch and everything is toast.
Validate universe logic
Check that:
– your universe includes the expected number of securities
– filters don’t accidentally drop everything due to missing coverage
– delisted names aren’t excluded if you plan to backtest
Extending the screener: events, alerts, and portfolio workflows
Once the screener works, you’ll want it to do more than a daily CSV.
Event-aware screens
If you screen around earnings or guidance:
– require “next earnings date” fields
– decide how to treat missing earnings dates
– consider that pre/post earnings price action can distort indicators
Event fields also help you avoid screens that accidentally hold stocks during binary events you didn’t want.
Alerting and watchlists
Some practical patterns:
– alert when a stock crosses a threshold after your screen runs
– alert when a stock enters the top N by score
– alert when a stock fails a previously satisfied condition
Alerting can turn your screener into a proactive tool, not just a report generator.
Integrate with paper trading or brokerage
If you paper trade:
– keep order sizing separate from screening logic
– log decisions and rationale
– compare screened vs executed universe (slippage and fills matter)
You’ll quickly learn which parts of the system were “theory” and which parts survive reality.
Tradeoffs you can’t avoid (so choose them deliberately)
Every screener involves compromises.
Speed vs accuracy
Recomputing everything each time is consistent but slow. Incrementally updating is faster but demands careful correctness.
A balanced approach:
– cache features
– recompute only what changed
– run periodic full recalculations to detect drift
Coverage vs strictness
Strict screens that fail on missing data can leave you with a narrow set of large, stable companies. Permissive screens can include more candidates but might include messy data.
Pick the strictness that matches your strategy’s requirements.
Simple rules vs complex models
It’s tempting to add machine learning. Sometimes that’s useful, but your base system should be strong first:
– stable data pipelines
– stable features
– stable screening logic
A model layered on top of shaky data is like building a skyscraper on sand, except the sand is your “just trust me” dataset.
What to measure so you know it’s working
Even without full backtesting, measure operational metrics:
– how many tickers pass each rule
– missing data rates
– runtime
– output stability (are results wildly different day to day?)
If your outputs change too much without a reasonable reason, it might be your data pipeline, not “market regime change.”
Common “first screen” ideas people actually use
You don’t need a hundred indicators. A few well-chosen filters often beat a complicated mess.
Liquidity and tradability filters
A starting point:
– minimum average volume
– minimum price
– exclude extreme microcaps
This protects your future self from trying to trade something that can’t be traded.
Valuation plus quality pairing
Example conceptual pairing:
– valuation range filter
– plus profitability or cash flow stability filter
The aim is not to catch magic. It’s to avoid obvious low-quality traps.
Trend support for timing
A simple trend filter can improve signal-to-noise:
– price above a moving average
– momentum is positive over a chosen lookback
This is often less “forecasting” and more “don’t fight the current.”
Building it as a product: versioning and reproducibility
Once your screener becomes part of your workflow, reproducibility matters.
Version your screen configuration
Every time you change thresholds or logic:
– increment a version
– save config and computed metrics version
– store outputs with timestamps
This allows you to compare results across experiments without “vibes.”
Document assumptions and definitions
In a README or internal docs, capture:
– definitions for each metric (TTM vs quarterly)
– indicator parameter settings
– missing-data policy
– as-of behavior for fundamentals
This is boring work, but boring is good. Boring doesn’t surprise you.
Putting it all together: a sensible end-state
A homemade stock screening software that you’ll actually keep using usually ends up with these characteristics:
– You can run it repeatedly with the same config and get consistent outputs.
– You can explain why a stock passed.
– You understand the as-of dates and data availability assumptions.
– You can update indicators and thresholds without breaking everything.
– You can export results for analysis without copy-paste chaos.
If you get those things right, you aren’t just building a screener. You’re building a controlled process for generating candidates.
Next steps depending on your current skill level
If you’re an experienced developer, you’ll probably jump straight into the architecture and data pipelines. If you’re newer, focus on a smaller scope:
– start with only daily price features
– add a few fundamentals
– write filters and export results
– then add backtesting once the screen is stable
The most important thing is to avoid “big bang” development. Build the pipeline, prove it works, then extend.
One last thing: keep the screener honest
Your own screener is still a tool. It can’t correct for bad assumptions, survivorship bias, or missing point-in-time data. But it can be honest in the way it’s designed: clear definitions, explicit as-of behavior, and reproducible computations.
That honesty is what lets you iterate in real time and makes your results worth discussing with your own brain, instead of arguing with your own spreadsheet.