2026-06-09 22:26:00 +03:00
# Article-22063-Alternative-Bars-For-Market-Intent
This repository is an article-derived reference project based on the original MQL5 article. It does not claim to reproduce the full original source code unless files are explicitly attached.
## Overview
Reference repository for the MQL5 article on alternative market-data sampling methods inspired by Chapter 2 ("Financial Data Structures") of *Advances in Financial Machine Learning*. The article presents a dual implementation:
- a Python batch-processing pipeline for multi-year tick histories
- an MQL5 object-oriented library for live tick-by-tick bar construction inside an Expert Advisor
The covered bar families include standard bars and imbalance-based information bars, with emphasis on data cleaning, scalable storage/loading, adaptive threshold calibration, and parity verification between Python and MQL5 outputs.
## Original Article
- **Article ID:** 22063
- **Author:** Patrick Murimi Njoroge
- **Publication date:** 2026.05.08
- **Category:** Machine learning
- **URL:** https://www.mql5.com/en/articles/22063
## Repository Purpose
This repository should be treated as a technical reference/reconstruction of the article’s described architecture for alternative bar construction.
Its purpose is to document and organize:
- standard bar construction: time, tick, volume, dollar
- information bar construction: tick imbalance, volume imbalance, dollar imbalance
- preprocessing of tick streams before bar generation
- scalable Python-side storage/loading using Parquet and Dask
- MQL5 live bar construction with restart-safe state persistence
- Python/MQL5 parity validation concepts
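A minimal sketch of the tick preprocessing this repository documents: drop NaT timestamps, invalid prices, and non-positive spreads, normalize to one timezone, keep the last of duplicate timestamps, then sort. The article performs these steps on a pandas DataFrame with a `DatetimeIndex`; this version uses plain Python so it runs without third-party packages. `Tick`, `clean_ticks`, and the rule that naive timestamps are treated as UTC are assumptions made for illustration.

```python
import math
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Tick:
    time: Optional[datetime]  # None plays the role of NaT
    bid: float
    ask: float


def _valid_price(x: float) -> bool:
    # Rejects NaN, infinities, zero, and negative prices.
    return math.isfinite(x) and x > 0


def _to_utc(ts: datetime) -> datetime:
    # Assumption: naive timestamps are already expressed in UTC.
    if ts.tzinfo is None:
        return ts.replace(tzinfo=timezone.utc)
    return ts.astimezone(timezone.utc)


def clean_ticks(ticks: Iterable[Tick]) -> List[Tick]:
    latest: Dict[datetime, Tick] = {}
    for tick in ticks:
        if tick.time is None:
            continue  # missing timestamp
        if not (_valid_price(tick.bid) and _valid_price(tick.ask)):
            continue  # NaN or non-positive price
        if tick.ask - tick.bid <= 0:
            continue  # zero or negative spread
        tick = replace(tick, time=_to_utc(tick.time))
        latest[tick.time] = tick  # a later duplicate overwrites an earlier one (keep="last")
    return sorted(latest.values(), key=lambda t: t.time)
```

In pandas, the duplicate-handling step corresponds to something like `df[~df.index.duplicated(keep="last")]` followed by `sort_index()`.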
## Key Concepts
- Clock-based sampling can inject heteroscedasticity by forcing equal time windows with unequal information content.
- Activity-based bars close on fixed market activity instead of elapsed time.
- Information bars close on directional imbalance, not only raw activity.
- Tick-rule classification assigns each tick a sign of +1 or -1 from the direction of its price change, carrying the previous sign forward when the price is unchanged.
- Imbalance thresholds are updated using incremental exponentially weighted means.
- Time bars may generate phantom zero-tick bars across inactive periods and require explicit filtering.
- Live MQL5 implementations must preserve state across terminal restarts to avoid threshold resets.
- Parquet partitioning and Dask loading are used to avoid loading full multi-year tick datasets into memory.
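To make the activity-based idea concrete, here is a plain-Python sketch of tick, volume, and dollar bars driven by a single cumulative counter. It assumes trades arrive as `(price, volume)` pairs and that a bar closes as soon as the counter reaches the threshold (`>=`); `activity_bars`, `Bar`, and `METRICS` are illustrative names, not the article's `make_bars()` API. Time bars are omitted because they close on the clock rather than on activity.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# How much "activity" each trade contributes, per bar family.
METRICS: Dict[str, Callable[[float, float], float]] = {
    "tick": lambda price, volume: 1.0,
    "volume": lambda price, volume: volume,
    "dollar": lambda price, volume: price * volume,
}


@dataclass
class Bar:
    open: float
    high: float
    low: float
    close: float
    volume: float
    ticks: int


def activity_bars(trades: Iterable[Tuple[float, float]], kind: str, threshold: float) -> List[Bar]:
    """Close a bar once the cumulative metric reaches `threshold`.

    Trades left over after the last close form an incomplete bar and are not returned.
    """
    metric = METRICS[kind]
    bars: List[Bar] = []
    current: Optional[Bar] = None
    accumulated = 0.0
    for price, volume in trades:
        if current is None:
            current = Bar(price, price, price, price, 0.0, 0)
        current.high = max(current.high, price)
        current.low = min(current.low, price)
        current.close = price
        current.volume += volume
        current.ticks += 1
        accumulated += metric(price, volume)
        if accumulated >= threshold:
            bars.append(current)
            current, accumulated = None, 0.0
    return bars
```

Keeping one accumulator per bar family lets the three families share a single loop; only the per-trade metric changes.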
## Algorithm / Architecture Summary
The article describes the following processing flow:
1. **Tick storage**
   - Raw data is stored in Parquet files partitioned by symbol/year/month.
   - Compression is performed with PyArrow using `zstd`.
2. **Data loading**
   - Python uses Dask `read_parquet()` with date filters.
   - Only the relevant partitions are materialized into pandas.
3. **Tick cleaning**
   - Ensure a `DatetimeIndex`
   - Normalize the timezone
   - Remove invalid prices and non-positive spreads
   - Drop rows with NaN values or NaT timestamps
   - Remove duplicate timestamps with `keep="last"`
   - Sort chronologically
4. **Standard bars**
   - Time bars via resampling
   - Tick bars via fixed tick counts
   - Volume bars via cumulative volume thresholds
   - Dollar bars via cumulative price×volume thresholds
   - Time bars additionally filter out rows with zero `tick_volume`
5. **Information bars**
   - The tick rule computes a directional sign for each tick
   - The signed metric is accumulated as the imbalance
   - A bar closes when the absolute imbalance exceeds an adaptive threshold
   - The threshold depends on EWM estimates of:
     - expected bar length
     - expected absolute imbalance per bar
6. **Unified Python API**
   - `make_bars()` dispatches to the requested bar type
   - Information bars can be auto-calibrated using `target_timeframe`
   - Explicit seeds (`exp_ticks_init`, `exp_imbalance_init`) remain available
7. **MQL5 runtime design**
   - An abstract base class handles common OHLC/volume/spread accumulation
   - Derived classes implement the close semantics for each bar family
   - The example EA processes ticks in `OnTick()`
   - CSV append output is used for live logging
   - State persistence provides restart recovery, which matters most for imbalance bars
8. **Parity testing**
   - Python and MQL5 bars are aligned by `tick_num`
   - The comparison checks OHLC and volume fields for exact or near-exact equality
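The information-bar loop in step 5 can be sketched in plain Python. This is not the article's Numba-compiled `information_bars.py`; it reads the adaptive threshold in the AFML style as E[T]·|E[b·v]|, where E[T] is an EWM of bar length in ticks and E[b·v] is an EWM of signed imbalance per tick, both updated incrementally each time a bar closes. The article's exact estimator may differ. `ImbalanceState`, `imbalance_bar_closes`, the span of 20, the `>=` comparison, and giving the very first tick a sign of 0 are all assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


def tick_rule(prev_price: float, price: float, prev_sign: int) -> int:
    """+1 on an uptick, -1 on a downtick, previous sign carried forward if unchanged."""
    if price > prev_price:
        return 1
    if price < prev_price:
        return -1
    return prev_sign


@dataclass
class ImbalanceState:
    """Everything needed to resume after a restart (hypothetical field names)."""
    exp_ticks: float          # EWM of bar length in ticks (seeded like exp_ticks_init)
    exp_imbalance: float      # EWM of signed imbalance per tick (seeded like exp_imbalance_init)
    span: int = 20            # assumed EWM span
    prev_price: Optional[float] = None
    prev_sign: int = 0        # the first tick inherits sign 0 (an assumption)
    theta: float = 0.0        # running imbalance of the open bar
    n_ticks: int = 0          # ticks in the open bar

    def threshold(self) -> float:
        return self.exp_ticks * abs(self.exp_imbalance)

    def update_ewm(self) -> None:
        # Incremental EWM: new = old + alpha * (observation - old).
        alpha = 2.0 / (self.span + 1)
        self.exp_ticks += alpha * (self.n_ticks - self.exp_ticks)
        self.exp_imbalance += alpha * (self.theta / self.n_ticks - self.exp_imbalance)


def _weight(kind: str, price: float, volume: float) -> float:
    if kind == "tick":
        return 1.0
    if kind == "volume":
        return volume
    if kind == "dollar":
        return price * volume
    raise ValueError(f"unknown imbalance kind: {kind}")


def imbalance_bar_closes(trades: Iterable[Tuple[float, float]],
                         state: ImbalanceState, kind: str = "tick") -> List[int]:
    """Return the indices of the trades that close an imbalance bar."""
    closes: List[int] = []
    for i, (price, volume) in enumerate(trades):
        if state.prev_price is None:
            sign = state.prev_sign
        else:
            sign = tick_rule(state.prev_price, price, state.prev_sign)
        state.prev_price, state.prev_sign = price, sign
        state.theta += sign * _weight(kind, price, volume)
        state.n_ticks += 1
        if abs(state.theta) >= state.threshold():
            state.update_ewm()  # adapt the threshold using the bar that just closed
            state.theta, state.n_ticks = 0.0, 0
            closes.append(i)
    return closes
```

Because every mutable quantity lives on the state object, serializing it (for example with `dataclasses.asdict`) is enough to resume without resetting the thresholds, which is the problem the MQL5 state persistence addresses.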
## Mentioned or Attached Files
### Explicitly attached files
- `AlternativeBars\CBarConstructor.mqh` — abstract base class, `SBar` struct, persistence interface
- `AlternativeBars\CStandardBars.mqh` — standard bar classes: time, tick, volume, dollar
- `AlternativeBars\CImbalanceBars.mqh` — tick rule and imbalance bar implementation with EWM threshold state
- `BarBuilderEA.mq5` — example Expert Advisor for live bar construction and CSV/state handling
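The split between `CBarConstructor.mqh` and its derived classes can be mirrored in Python to show the idea: a base class owns the OHLC/volume/spread accumulation and state serialization, and each subclass only decides whether the current tick closes the bar. The names below (`BarConstructor`, `TickBars`, `dump_state`, `load_state`) are hypothetical and do not reproduce the MQL5 API, and JSON stands in for whatever persistence format the EA actually uses.

```python
import json
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional


class BarConstructor(ABC):
    """Shared accumulation logic; subclasses decide when a bar closes."""

    def __init__(self) -> None:
        self.completed: List[Dict[str, Any]] = []
        self._bar: Optional[Dict[str, Any]] = None  # the bar currently being built

    def on_tick(self, price: float, volume: float, spread: float) -> bool:
        """Feed one tick; return True if it closed a bar."""
        if self._bar is None:
            self._bar = {"open": price, "high": price, "low": price, "close": price,
                         "volume": 0.0, "spread_sum": 0.0, "ticks": 0}
        bar = self._bar
        bar["high"] = max(bar["high"], price)
        bar["low"] = min(bar["low"], price)
        bar["close"] = price
        bar["volume"] += volume
        bar["spread_sum"] += spread
        bar["ticks"] += 1
        if self.should_close(bar):
            self.completed.append(bar)
            self._bar = None
            return True
        return False

    @abstractmethod
    def should_close(self, bar: Dict[str, Any]) -> bool:
        """Close semantics specific to each bar family."""

    def extra_state(self) -> Dict[str, Any]:
        """Subclasses with adaptive thresholds would add their EWM estimates here."""
        return {}

    def restore_extra(self, extra: Dict[str, Any]) -> None:
        pass

    def dump_state(self) -> str:
        # Persist the partially built bar plus any subclass state.
        return json.dumps({"bar": self._bar, "extra": self.extra_state()})

    def load_state(self, blob: str) -> None:
        data = json.loads(blob)
        self._bar = data["bar"]
        self.restore_extra(data["extra"])


class TickBars(BarConstructor):
    def __init__(self, ticks_per_bar: int) -> None:
        super().__init__()
        self.ticks_per_bar = ticks_per_bar

    def should_close(self, bar: Dict[str, Any]) -> bool:
        return bar["ticks"] >= self.ticks_per_bar
```

Calling `dump_state()` before shutdown and `load_state()` on a fresh instance lets a partially built bar survive a restart; for imbalance bars the EWM estimates would travel through `extra_state()`.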
### Files mentioned in the article text
- `afml/data_structures/bars.py` — unified Python `make_bars()` entry point
- `afml/data_structures/information_bars.py` — JIT-compiled information-bar boundary detection
- `afml/data_structures/calibration.py` — automatic calibration for imbalance-bar initialization
## Statistics
- **Bar types discussed:** 10 total in Python
- **Bar types mirrored in MQL5:** 7
- **MQL5 header files described:** 3
- **Example Expert Advisor described:** 1
- **Main pipeline stages highlighted:** storage, loading, cleaning, bar construction, calibration, persistence, parity verification
## Tags
`mql5` `python` `machine-learning` `tick-data` `alternative-bars` `time-bars` `tick-bars` `volume-bars` `dollar-bars` `imbalance-bars` `parquet` `dask` `numba`
## Difficulty
Advanced
## Limitations
- This repository is based on article analysis and attached-file descriptions; the original source tree is not reproduced here in full unless the listed files are actually present in the repository.
- The article describes both Python and MQL5 implementations, but only some file paths are explicitly listed as attached.
- Python package structure, auxiliary utilities, and configuration files may be incomplete or absent.
- Installation, build, and execution steps should not be assumed beyond what the article explicitly states.
- If the repository does not contain the attached files listed above, treat it as a documentation-only reconstruction.
## Reference
- Patrick Murimi Njoroge, [“Beyond the Clock (Part 1): Building Activity and Imbalance Bars in Python and MQL5”](https://www.mql5.com/en/articles/22063) MQL5 article 22063, 2026.05.08