Data Quality Layer

Clean forex data is the product

HistoricalFX turns raw historical price archives into validation-ready datasets for backtesting, Python research, CSV workflows, and MetaTrader imports. The commercial value is not just access to prices; it is the reduction in cleaning, conversion, and data-quality work that traders usually have to do themselves.

74 verified symbols in the current R2 release
7 timeframes generated consistently
300.4M rows audited in the current R2 release

Manifest-backed proof layer

The business is being built around auditable data releases. The current manifest ties the paid R2 bundle to the local release-quality report, so validation claims can be proven release by release instead of hand-waved.

74 delivered symbols
518 delivered Parquet files
300.4M delivered rows
499 files flagged for review

This manifest describes the current uploaded paid R2 bundle and the local release-quality audit. Coverage varies by symbol; use pair-level reports before making strategy assumptions.

Last manifest generated: 2026-06-06T00:57:17.159Z

Validation checks

The goal is to make historical FX data boring: predictable columns, consistent timestamps, reproducible exports, and fewer hidden assumptions in every backtest.

Timestamp continuity

Every symbol and timeframe is checked for missing bars, duplicate timestamps, non-monotonic order, and weekend/session artifacts.
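A minimal sketch of what such a continuity check can look like in Python, assuming one symbol and timeframe at a time, naive UTC timestamps in file order, and Saturday treated as a closed market. The function name check_timestamps and the weekend rule are illustrative assumptions, not HistoricalFX's actual validator; real session handling is usually more detailed.

```python
from datetime import datetime, timedelta


def check_timestamps(ts, step=timedelta(minutes=1)):
    """Hypothetical continuity check for one symbol/timeframe.

    ts: list of naive UTC datetimes in file order.
    Returns duplicates, out-of-order stamps, weekday gaps, and Saturday bars.
    """
    report = {"duplicates": [], "out_of_order": [], "gaps": [], "weekend_bars": []}
    for i, t in enumerate(ts):
        if t.weekday() == 5:  # assumption: any Saturday bar is an artifact
            report["weekend_bars"].append(t)
        if i == 0:
            continue
        prev = ts[i - 1]
        if t == prev:
            report["duplicates"].append(t)
        elif t < prev:
            report["out_of_order"].append(t)
        elif t - prev > step:
            # A gap that spans a Saturday is treated as the normal weekend close.
            days = (t.date() - prev.date()).days
            spans_weekend = any(
                (prev.date() + timedelta(days=d)).weekday() == 5 for d in range(days + 1)
            )
            if not spans_weekend:
                report["gaps"].append((prev, t))
    return report


bars = [datetime(2024, 1, 2, 10, 0), datetime(2024, 1, 2, 10, 1),
        datetime(2024, 1, 2, 10, 1), datetime(2024, 1, 2, 10, 4)]
report = check_timestamps(bars)
# report["duplicates"] holds the repeated 10:01 stamp; report["gaps"] holds (10:01, 10:04)
```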

OHLC integrity

Bars are rejected or flagged when their open/high/low/close relationships are impossible (for example, a high below the open or close, or a low above them), malformed, or outside expected market structure.
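A sketch of the impossible-bar rules, assuming a simple Bar tuple with open, high, low, close, and volume fields (hypothetical names). It covers only the hard constraints; judging whether a bar fits expected market structure needs more context than a single bar.

```python
import math
from typing import List, NamedTuple


class Bar(NamedTuple):
    open: float
    high: float
    low: float
    close: float
    volume: float


def ohlc_problems(bar: Bar) -> List[str]:
    """Return the reasons a bar is impossible; an empty list means it passes."""
    prices = (bar.open, bar.high, bar.low, bar.close)
    if any(math.isnan(p) or p <= 0 for p in prices):
        return ["non-positive or NaN price"]  # nothing else is meaningful after this
    problems = []
    if bar.high < max(bar.open, bar.close):
        problems.append("high below open/close")
    if bar.low > min(bar.open, bar.close):
        problems.append("low above open/close")
    if bar.high < bar.low:
        problems.append("high below low")
    if bar.volume < 0:
        problems.append("negative volume")
    return problems
```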

Bad tick screening

Outlier moves are detected against neighboring bars and higher-timeframe ranges so obvious feed spikes do not poison a backtest.
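One common way to screen spikes against neighboring bars is a Hampel-style filter: compare each close with the median of the bars around it and flag it when the distance exceeds a multiple of a robust spread estimate. The sketch below assumes that approach; its window size, threshold, and 1-basis-point floor are illustrative defaults, not the product's settings, and it does not implement the higher-timeframe range comparison.

```python
from statistics import median
from typing import List, Sequence


def flag_spikes(closes: Sequence[float], half_window: int = 3,
                n_sigmas: float = 5.0, rel_floor: float = 1e-4) -> List[int]:
    """Hampel-style screen: flag closes far from the median of their neighbors.

    The scale is 1.4826 * MAD of the neighbors (a robust standard-deviation
    estimate), floored at rel_floor * price so flat stretches do not flag
    ordinary one-tick moves.
    """
    flagged = []
    for i, x in enumerate(closes):
        lo, hi = max(0, i - half_window), min(len(closes), i + half_window + 1)
        neighbors = list(closes[lo:i]) + list(closes[i + 1:hi])  # exclude the bar itself
        if len(neighbors) < 2:
            continue
        center = median(neighbors)
        mad = median(abs(v - center) for v in neighbors)
        scale = max(1.4826 * mad, rel_floor * abs(center))
        if abs(x - center) > n_sigmas * scale:
            flagged.append(i)
    return flagged
```

Using the median rather than the mean keeps a single bad print from dragging the reference level toward itself.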

Timeframe reconciliation

M5, M15, H1, H4, daily, and weekly files are generated from the same normalized minute base to keep timeframes internally consistent.
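A sketch of deriving higher timeframes from minute bars, assuming sorted (timestamp, open, high, low, close, volume) tuples and buckets anchored at 00:00 of each day (an assumption; daily and weekly session boundaries vary by data source). The open comes from the first bar in each bucket, the close from the last, the high and low from the extremes, and volume is summed. The function name resample is hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# (timestamp, open, high, low, close, volume); naive UTC timestamps assumed.
BarRow = Tuple[datetime, float, float, float, float, float]


def resample(bars: List[BarRow], minutes: int) -> List[BarRow]:
    """Aggregate time-sorted bars into buckets of `minutes`, anchored at 00:00.

    Works for M5 (5), M15 (15), H1 (60), H4 (240) and D1 (1440); weekly
    bars need a week anchor and are left out of this sketch.
    """
    buckets = {}
    for ts, o, h, l, c, v in bars:
        day = ts.replace(hour=0, minute=0, second=0, microsecond=0)
        offset = (ts.hour * 60 + ts.minute) // minutes * minutes
        key = day + timedelta(minutes=offset)
        if key not in buckets:
            buckets[key] = [o, h, l, c, v]  # first bar in the bucket sets the open
        else:
            agg = buckets[key]
            agg[1] = max(agg[1], h)
            agg[2] = min(agg[2], l)
            agg[3] = c  # last bar in the bucket sets the close
            agg[4] += v
    # dicts keep insertion order, so the output stays time-sorted
    return [(key, *vals) for key, vals in buckets.items()]
```

Because every derived file comes from the same minute input, resampling M1 directly to H1 and resampling the M5 output to H1 should produce identical bars; comparing the two is a cheap reconciliation check.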

Format validation

Parquet exports are loaded after generation to verify schema, date parsing, numeric columns, and row counts. CSV and MetaTrader conversion files are verified separately when rebuilt.
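Reading Parquet needs a third-party library such as pyarrow, so to stay dependency-free this sketch shows the same reload-and-verify idea on a CSV export using Python's standard csv module. The expected column list and the function name validate_csv_export are assumptions, not the product's published schema.

```python
import csv
import math
from datetime import datetime
from typing import List

# Assumed canonical column order; not taken from HistoricalFX's actual schema.
EXPECTED_COLUMNS = ["timestamp", "open", "high", "low", "close", "volume"]


def validate_csv_export(path: str, expected_rows: int) -> List[str]:
    """Reload an exported CSV and report schema, parsing and row-count problems."""
    errors: List[str] = []
    rows = 0
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames != EXPECTED_COLUMNS:
            return [f"unexpected columns: {reader.fieldnames}"]
        for line_no, row in enumerate(reader, start=2):  # line 1 is the header
            rows += 1
            try:
                datetime.fromisoformat(row["timestamp"])
            except (TypeError, ValueError):
                errors.append(f"line {line_no}: bad timestamp {row['timestamp']!r}")
            for col in EXPECTED_COLUMNS[1:]:
                try:
                    value = float(row[col])
                except (TypeError, ValueError):
                    errors.append(f"line {line_no}: non-numeric {col} {row[col]!r}")
                    continue
                if not math.isfinite(value):
                    errors.append(f"line {line_no}: non-finite {col}")
    if rows != expected_rows:
        errors.append(f"row count {rows} != expected {expected_rows}")
    return errors
```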

Reproducible packaging

Dataset builds are scripted so future refreshes can be regenerated, compared, and audited instead of hand-edited file by file.
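One way to make builds comparable is to record a content hash per file and diff those records between releases. A minimal sketch, assuming a manifest that simply maps relative paths to SHA-256 digests; the page does not describe the real manifest's fields, so build_manifest and diff_manifests are hypothetical names.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large Parquet files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: str) -> Dict[str, str]:
    """Map each file's path (relative to root, POSIX style) to its SHA-256."""
    base = Path(root)
    return {
        p.relative_to(base).as_posix(): sha256_of(p)
        for p in sorted(base.rglob("*"))
        if p.is_file()
    }


def diff_manifests(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare two builds: which files appeared, disappeared, or changed content."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }
```

A rebuild that reproduces the previous release exactly yields three empty lists, so any non-empty entry points at a file worth auditing.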

Cleaning pipeline

This is the core product engine. As the business matures, the same pipeline can power subscriptions, commercial licenses, API access, and custom validation reports.

01. Ingest raw minute archives by symbol and month.
02. Normalize timestamps and column names into one canonical OHLCV schema.
03. Remove duplicates and impossible bars.
04. Detect gaps, quiet-session edge cases, weekend bars, and suspicious spikes.
05. Generate derived timeframes from the normalized minute base.
06. Export buyer-ready Parquet packages; rebuild CSV and MetaTrader conversion files only when matching delivery artifacts are ready.
07. Run post-export validation and sample-load checks before release.
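Steps 02 and 03 can be sketched as follows, assuming raw rows arrive as dicts with vendor-specific column names. The alias table and function names are hypothetical, naive timestamps are assumed to already be UTC, and keeping the first bar for a repeated timestamp is one possible duplicate policy, not necessarily the one this pipeline uses.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, List

# Hypothetical alias table: vendor column names -> canonical OHLCV names.
ALIASES = {
    "time": "timestamp", "datetime": "timestamp", "date": "timestamp",
    "o": "open", "h": "high", "l": "low", "c": "close",
    "v": "volume", "vol": "volume",
}
CANONICAL = ("timestamp", "open", "high", "low", "close", "volume")


def normalize_row(raw: Dict[str, str]) -> dict:
    """Rename columns to the canonical schema and parse values."""
    row = {}
    for key, value in raw.items():
        name = key.strip().lower()
        row[ALIASES.get(name, name)] = value
    ts = datetime.fromisoformat(row["timestamp"])
    # Assumption: naive timestamps are already UTC; aware ones are converted to UTC.
    ts = ts.replace(tzinfo=timezone.utc) if ts.tzinfo is None else ts.astimezone(timezone.utc)
    out = {"timestamp": ts}
    for col in CANONICAL[1:]:
        out[col] = float(row[col])
    return out


def normalize_and_dedupe(raw_rows: Iterable[Dict[str, str]]) -> List[dict]:
    """Normalize every row, keep the first bar per timestamp, return time-sorted bars."""
    by_time = {}
    for raw in raw_rows:
        bar = normalize_row(raw)
        by_time.setdefault(bar["timestamp"], bar)  # later duplicates are dropped
    return [by_time[t] for t in sorted(by_time)]
```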

Why this can become more than a file store

Raw data is easy to compare on price. Cleaned, documented, reproducible data can become infrastructure. The same validation layer can support one-time downloads, commercial licenses, API subscriptions, and custom data-quality work for teams.


Limits and roadmap

The current product focuses on audited OHLCV bar releases. The next defensible layers are expanded historical rebuilds, commercial licensing, refreshed data builds, and API access for repeat users.

support@...

Data quality FAQ

Is this just raw free forex data repackaged?
No. The value is the normalization, validation, timeframe generation, packaging, and developer-ready delivery layer. Free sources can be useful, but they usually push the cleaning burden onto the trader or developer.
Why does data cleaning matter for forex backtesting?
Bad timestamps, duplicate bars, missing minutes, feed spikes, and inconsistent timeframes can make a sound strategy look useless or make a broken strategy look profitable in a backtest.
Do you provide tick data?
The current product is OHLCV bar data from M1 through weekly. Tick-level products and data-quality reports are on the roadmap.
Can teams request custom cleaning or validation?
Yes. The next product layer is commercial licensing and custom data validation for teams that need documented, repeatable datasets.