Forex Data Cleaning: Handling Missing Bars and Bad Ticks
The phrase "garbage in, garbage out" was practically invented for algorithmic trading. You can have the most sophisticated neural network in the world, but if your forex data cleaning process is non-existent, your backtests will be a lie. Historical data is rarely perfect. Even the best feeds can have occasional gaps, bad ticks, or "fat-finger" errors that create artificial price spikes.
Identifying Common Data Issues
Before you run a single strategy, you need to audit your dataset. The three main enemies are missing bars, duplicate timestamps, and outliers. Missing bars are common during low-liquidity periods (like the Sunday open) or during server outages. Outliers, or "bad ticks," are often caused by feed errors where the price jumps to an impossible level for a single millisecond before returning to normal.
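A quick audit can surface the first two issues before any strategy code runs. The sketch below, using a small hypothetical set of 1-minute bars, counts duplicate timestamps and finds missing minutes by comparing against a complete 1-minute index:

```python
import pandas as pd

# Hypothetical sample: 1-minute bars with one duplicate and one gap
ts = pd.to_datetime([
    '2024-01-02 00:00', '2024-01-02 00:01', '2024-01-02 00:01',  # duplicate
    '2024-01-02 00:04',                                          # 00:02-00:03 missing
])
df = pd.DataFrame({'timestamp': ts, 'Close': [1.1010, 1.1012, 1.1012, 1.1015]})

# 1. Count duplicate timestamps (first occurrence is not counted)
dupes = df['timestamp'].duplicated().sum()

# 2. Find missing bars by diffing against a complete 1-minute index
full = pd.date_range(df['timestamp'].min(), df['timestamp'].max(), freq='1min')
missing = full.difference(df['timestamp'])

print(dupes)         # 1
print(len(missing))  # 2
```

On real data you would run the same two checks per trading session, since gaps over the weekend close are expected rather than errors.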
When you use the 25 years of data from historicalforexprices.com, you are starting with a high-quality foundation across 66 currency pairs. However, it is still best practice to run a forex data cleaning script to ensure your specific environment handles the data correctly. For instance, some platforms will reject or mishandle a candle whose "high" is lower than its "open," since such a bar is internally inconsistent.
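One way to catch internally inconsistent candles, sketched here on hypothetical bars, is to check that each bar's high and low actually bound its open and close:

```python
import pandas as pd

# Hypothetical bars; the second one is inconsistent (High < Open)
df = pd.DataFrame({
    'Open':  [1.1000, 1.1020],
    'High':  [1.1015, 1.1010],
    'Low':   [1.0995, 1.1005],
    'Close': [1.1012, 1.1008],
})

# A valid candle needs High >= max(Open, Close) and Low <= min(Open, Close)
valid = (
    (df['High'] >= df[['Open', 'Close']].max(axis=1)) &
    (df['Low'] <= df[['Open', 'Close']].min(axis=1))
)
print(valid.tolist())  # [True, False]
```

Rows that fail this check can be dropped or repaired (for example, by resetting High/Low to the open-close envelope), depending on how conservative you want to be.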
Techniques for Cleaning Data
How you handle a gap depends on your strategy. If you are missing one minute of data, interpolation (filling the gap with the average of the surrounding bars) is usually fine. If you are missing three hours, it is better to leave the gap or mark it as "invalid" rather than making up price action that never happened. For bad ticks, a simple "z-score" or standard deviation filter can identify prices that are statistically impossible given the recent volatility.
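The z-score idea can be sketched as follows, using a hypothetical close series with one obvious spike. The volatility baseline here is taken from the preceding bars only, so the spike cannot inflate its own denominator:

```python
import pandas as pd

# Hypothetical 1-minute closes with one obvious bad tick at index 4
close = pd.Series([1.1000, 1.1003, 1.0998, 1.1002, 1.2500, 1.1001, 1.1004])

returns = close.pct_change()

# Volatility estimate from *previous* bars only, via shift(1), so the
# spike does not inflate its own baseline
vol = returns.rolling(window=3, min_periods=2).std().shift(1)

z = (returns / vol).abs()
bad = z > 3  # flag returns more than 3 standard deviations out

print(list(bad[bad].index))  # [4]
```

Note that the reversion bar right after the spike is not flagged here, because the spike itself inflates the rolling volatility; a production filter would also need to handle that follow-on bar.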
Here is a simple Python example using pandas for forex data cleaning:
import pandas as pd

def clean_forex_data(df):
    # Remove duplicate timestamps, keeping the first occurrence
    df = df.drop_duplicates(subset=['timestamp'], keep='first')

    # Reindex to a regular 1-minute grid so missing bars show up as NaN rows
    df['timestamp'] = pd.to_datetime(df['timestamp'])
    df = df.set_index('timestamp').sort_index()
    df = df.asfreq('1min')

    # Interpolate the close across gaps, then derive the other fields
    # from it for the synthetic bars
    df['Close'] = df['Close'].interpolate(method='linear')
    df['Open'] = df['Open'].fillna(df['Close'])
    df['High'] = df['High'].fillna(df['Close'])
    df['Low'] = df['Low'].fillna(df['Close'])

    # Simple outlier filter: a move of more than 2% in one minute is
    # almost certainly a bad tick. Keep the first row, whose return is NaN.
    move = df['Close'].pct_change().abs()
    df = df[(move < 0.02) | move.isna()]
    return df
Why Clean Data Matters for Backtesting
A single bad tick can trigger a phantom stop-loss or take-profit in your backtester. If your system sees a 500-pip spike that never happened in reality, your "winning" strategy might actually be a loser. This is why forex data cleaning is arguably the most important step in the development pipeline. A common rule of thumb is that practitioners spend roughly 80% of their time preparing data and only 20% building models.
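A toy illustration of the point, with made-up numbers: a naive bar-based backtester checks each bar's low against the stop level, so a single bad tick is enough to close the position:

```python
# Toy illustration: a naive bar-based stop check fires on a bad tick.
# All prices here are hypothetical.
bars = [
    {'High': 1.1010, 'Low': 1.1000},
    {'High': 1.1012, 'Low': 1.0500},  # bad tick: Low spikes ~500 pips
    {'High': 1.1015, 'Low': 1.1005},
]

stop_loss = 1.0950  # long position, stop well below the real price action

stopped_out = any(bar['Low'] <= stop_loss for bar in bars)
print(stopped_out)  # True, purely because of the bad tick
```

On clean data the stop would never have been touched, so the backtest records a loss that could not have occurred in the live market.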
By sourcing your data from historicalforexprices.com, you get 25 years of data for 66 currency pairs that have already been through a rigorous quality control process. This saves you hours of manual labor. However, always remember to verify. A clean, reliable dataset is the only way to build the confidence required to trade large sizes in the live market. Don't let a bad tick ruin your career.