# Policy & Exploration Guide — DualEA System
**ML policy gating, fallback modes, and exploration system**

---
## Table of Contents
1. [Policy System Overview](#policy-system-overview)
2. [Policy Structure](#policy-structure)
3. [Policy Gating Logic](#policy-gating-logic)
4. [Policy Fallback Modes](#policy-fallback-modes)
5. [Policy Scaling](#policy-scaling)
6. [Exploration Mode](#exploration-mode)
7. [Exploration Caps](#exploration-caps)
8. [Troubleshooting](#troubleshooting)
---
## Policy System Overview
### Purpose
The **policy system** allows ML models to influence trading decisions through:
- **Confidence gating**: Block low-confidence predictions
- **Parameter scaling**: Adjust SL/TP/lots based on model output
- **Risk management**: Dynamic position sizing based on predicted outcomes
### Files
**policy.json**: ML-generated trading policy
- Path: `Common/Files/DualEA/policy.json`
- Generated by: `ML/policy_export.py`
- Structure: Per-slice (strategy|symbol|timeframe) probabilities and scaling multipliers

**policy.backup.json**: Last-known-good policy
- Used for rollback on parse/plausibility failures
- Updated on successful policy load

**policy.reload**: Trigger file for hot-reload
- PaperEA_v2: Hot-reload supported via `policy.reload` + HTTP polling
- LiveEA: Load on init only (restart required for updates)
---
## Policy Structure
### JSON Schema
```json
{
  "version": "1.0",
  "generated_at": "2025-01-15T10:00:00Z",
  "model_hash": "abc123...",
  "train_window": {"start": "2024-01-01", "end": "2025-01-15"},
  "metrics": {
    "roc_auc": 0.72,
    "brier_score": 0.18,
    "expected_r": 0.65
  },
  "min_confidence": 0.55,
  "slices": [
    {
      "strategy": "ADXStrategy",
      "symbol": "EURUSD",
      "timeframe": 60,
      "probability": 0.75,
      "sl_mult": 1.0,
      "tp_mult": 1.2,
      "lot_mult": 1.0,
      "trail_mult": 1.0
    },
    ...
  ]
}
```
### Fields
**Global**:
- `version`: Schema version
- `generated_at`: Timestamp of policy generation
- `model_hash`: Trained model identifier for provenance
- `train_window`: Data range used for training
- `metrics`: Model performance metrics
- `min_confidence`: Global minimum confidence threshold

**Per-Slice** (as implemented in LiveEA; PaperEA_v2 uses minimal parsing). These names differ from the schema example above, which uses `probability`, `sl_mult`, `tp_mult`, `lot_mult`, and `trail_mult`:
- `strategy`: Strategy name (e.g., "ADXStrategy")
- `symbol`: Trading symbol (e.g., "EURUSD")
- `timeframe`: Timeframe in minutes (e.g., 60 for H1)
- `p_win`: ML model confidence (0.0-1.0)
- `sl_scale`: Stop loss scaling multiplier (optional)
- `tp_scale`: Take profit scaling multiplier (optional)
- `trail_atr_mult`: Trailing stop ATR multiplier (optional)
- `confidence`: Alternative confidence field (optional)
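
Slices are matched on the exact `strategy|symbol|timeframe` key. Below is a minimal standalone C++ sketch of that lookup. It assumes slices are held in a map keyed by the same `strategy|symbol|timeframe` string the exploration counters use, and that "not found" is signalled by a negative `p_win` (mirroring the `slice.probability < 0` check in the gating code). `PolicySliceSketch`, `PolicyIndex`, and `MakeSliceKey` are hypothetical names, not the EA's actual types.

```cpp
#include <map>
#include <string>

// Hypothetical in-memory form of one slice, using the LiveEA field names.
struct PolicySliceSketch {
    double p_win = -1.0;          // -1 marks "slice not found"
    double sl_scale = 1.0;        // optional multipliers default to neutral
    double tp_scale = 1.0;
    double trail_atr_mult = 0.0;  // 0 = no trailing override
};

// Build the "strategy|symbol|timeframe" key (timeframe in minutes, e.g. 60 for H1).
std::string MakeSliceKey(const std::string &strategy, const std::string &symbol, int tfMinutes) {
    return strategy + "|" + symbol + "|" + std::to_string(tfMinutes);
}

class PolicyIndex {
public:
    void Add(const std::string &strategy, const std::string &symbol, int tf,
             const PolicySliceSketch &slice) {
        slices_[MakeSliceKey(strategy, symbol, tf)] = slice;
    }

    // Exact-match lookup only: no wildcard or nearest-timeframe fallback.
    PolicySliceSketch Find(const std::string &strategy, const std::string &symbol, int tf) const {
        auto it = slices_.find(MakeSliceKey(strategy, symbol, tf));
        return it == slices_.end() ? PolicySliceSketch{} : it->second;
    }

    std::size_t Count() const { return slices_.size(); }

private:
    std::map<std::string, PolicySliceSketch> slices_;
};
```

Because the match is exact, a slice trained on H1 does not cover the same strategy and symbol on H4; that request falls through to the slice-missing fallback.
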
---
## Policy Gating Logic
### Configuration
```cpp
input bool UsePolicyGating = true; // Enable ML policy gating
input bool DefaultPolicyFallback = true; // Enable fallback modes
input bool FallbackDemoOnly = true; // Restrict fallback to demo
input bool FallbackWhenNoPolicy = true; // Fallback if policy not loaded
input bool FallbackWhenSliceMissing = true; // Fallback if slice missing
```
### Decision Flow
```
Is UsePolicyGating=true?
  No  → PASS (no policy gating)
  Yes → Continue

Is policy loaded (slices > 0)?
  No  → Is FallbackWhenNoPolicy=true?
          Yes → Is FallbackDemoOnly=true?
                  Yes → Demo account → FALLBACK (neutral scaling)
                        Live account → BLOCK
                  No  → FALLBACK (neutral scaling, any account)
          No  → BLOCK (reason: "no policy loaded")
  Yes → Continue

Does exact slice exist (strategy|symbol|TF)?
  No  → Is FallbackWhenSliceMissing=true?
          Yes → Is FallbackDemoOnly=true?
                  Yes → Demo account → FALLBACK (neutral scaling)
                        Live account → BLOCK
                  No  → FALLBACK (neutral scaling, any account)
          No  → BLOCK (reason: "policy slice missing")
  Yes → Continue

Is slice.probability >= policy.min_confidence?
  No  → BLOCK (reason: "confidence too low")
  Yes → PASS + Apply scaling
```
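
The decision flow can be written as a pure function, which makes the branch order easy to unit-test. This C++ sketch is an illustration under assumptions: `PolicyGateInputs`, `PolicyDecision`, `FallbackOrBlock`, and `EvaluatePolicyGate` are hypothetical names, the demo/live flag is passed in rather than queried from the account, and the `DefaultPolicyFallback` master switch is not modelled because its exact place in the flow is not specified here.

```cpp
#include <string>

// Possible outcomes of the policy gate.
enum class PolicyDecision { Pass, PassScaled, Fallback, Block };

// Everything the gate needs, gathered up front (hypothetical struct).
struct PolicyGateInputs {
    bool   usePolicyGating;
    bool   fallbackWhenNoPolicy;
    bool   fallbackWhenSliceMissing;
    bool   fallbackDemoOnly;
    bool   isDemoAccount;
    int    sliceCount;        // slices loaded from policy.json
    bool   sliceFound;        // exact strategy|symbol|TF match exists
    double sliceProbability;  // confidence of the matched slice
    double minConfidence;     // policy min_confidence
};

// Shared fallback branch: allowed if this fallback is enabled and either
// FallbackDemoOnly is off or the account is a demo account.
static PolicyDecision FallbackOrBlock(bool fallbackEnabled, const PolicyGateInputs &in,
                                      const char *blockReason, std::string &reason) {
    if (fallbackEnabled && (!in.fallbackDemoOnly || in.isDemoAccount)) {
        reason = "fallback: neutral scaling";
        return PolicyDecision::Fallback;
    }
    reason = blockReason;
    return PolicyDecision::Block;
}

PolicyDecision EvaluatePolicyGate(const PolicyGateInputs &in, std::string &reason) {
    if (!in.usePolicyGating) return PolicyDecision::Pass;   // gating disabled entirely
    if (in.sliceCount == 0)
        return FallbackOrBlock(in.fallbackWhenNoPolicy, in, "no policy loaded", reason);
    if (!in.sliceFound)
        return FallbackOrBlock(in.fallbackWhenSliceMissing, in, "policy slice missing", reason);
    if (in.sliceProbability < in.minConfidence) {
        reason = "confidence too low";
        return PolicyDecision::Block;
    }
    return PolicyDecision::PassScaled;                      // pass and apply slice scaling
}
```

With `FallbackDemoOnly=true`, the same missing-policy input yields `Fallback` on a demo account and `Block` on a live one; setting it to `false` yields `Fallback` on both.
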
### Code Example
```cpp
bool ApplyPolicyGating(SignalData &signal, string &reason) {
   if(!UsePolicyGating) return true;

   // Check if policy loaded
   if(policyData.slices == 0) {
      return HandleFallback("no_policy", signal, reason);
   }

   // Find policy slice (timeframe key is in minutes, e.g. 60 for H1;
   // _Period itself is an ENUM_TIMEFRAMES value, not minutes)
   PolicySlice slice = policyData.Find(
      signal.strategy_name,
      _Symbol,
      PeriodSeconds(_Period) / 60
   );
   if(slice.probability < 0) { // Negative probability = slice not found
      return HandleFallback("slice_missing", signal, reason);
   }

   // Check confidence threshold
   if(slice.probability < policyData.min_confidence) {
      reason = StringFormat("confidence %.2f < %.2f",
                            slice.probability, policyData.min_confidence);
      return false;
   }

   // Apply policy scaling
   ApplyPolicyScaling(signal, slice);
   return true;
}
```
---
## Policy Fallback Modes
### Fallback: No Policy
**Triggers when**:
- `UsePolicyGating=true`
- Policy file not loaded OR `policy.slices == 0`
- `FallbackWhenNoPolicy=true`

**Behavior**:
- Check `FallbackDemoOnly`:
  - If `true` and demo account → ALLOW with neutral scaling
  - If `true` and live account → BLOCK
  - If `false` → ALLOW with neutral scaling (any account)
- Bypasses insights thresholds
- Bypasses exploration caps
- No SL/TP/lot/trail multipliers applied

**Log Example**:
```
FALLBACK: no policy loaded -> neutral scaling used for ADXStrategy on EURUSD/H1 demo=true
```
**Telemetry**:
```csv
policy_fallback,EURUSD,60,ADXStrategy,no_policy,true,true
```
### Fallback: Slice Missing
**Triggers when**:
- `UsePolicyGating=true`
- Policy IS loaded (`policy.slices > 0`)
- Exact strategy|symbol|timeframe slice not found in policy
- `FallbackWhenSliceMissing=true`

**Behavior**:
- Same as "No Policy" fallback
- Allows trade with neutral scaling
- Demo-only restriction if `FallbackDemoOnly=true`

**Log Example**:
```
FALLBACK: policy slice missing -> neutral scaling used for BollAverages on GBPUSD/H1 demo=true
```
### Neutral Scaling
**Definition**: No adjustments applied, use base parameters.
```cpp
double sl_mult    = 1.0;
double tp_mult    = 1.0;
double lot_mult   = 1.0;
double trail_mult = 1.0;
```
Trade proceeds with:
- Original SL/TP from strategy
- Original lot size from risk calculation
- Original trailing stop settings (if enabled)
### Fallback Safety
**Why demo-only by default?**
- Fallback is permissive (allows trades without ML validation)
- Intended for data collection, not production live trading
- In live trading, every traded slice should have explicit policy coverage

**When to disable FallbackDemoOnly?**
- After verifying policy coverage is comprehensive
- When transitioning from demo to live with the same strategy set
- With explicit acceptance of the risk of trading without ML guidance

**Best Practice**:
- Keep `FallbackDemoOnly=true` until the policy is proven
- Monitor `policy_fallback` telemetry events
- Aim to eliminate fallbacks by improving policy coverage
---
## Policy Scaling
### Policy Scaling (LiveEA Only)
> **Note:** Full policy scaling with per-slice multipliers is implemented in LiveEA. PaperEA_v2 has minimal policy parsing (it checks `min_confidence` only).

**Scaling Application (LiveEA):**
```cpp
void ApplyPolicyScaling(SignalData &signal, PolicySlice &slice) {
   // SL scaling: keep the SL on the loss side of entry, scale its distance.
   // The > 0 guard assumes an absent optional multiplier is parsed as <= 0.
   if(slice.sl_scale > 0) {
      double slDistance = MathAbs(signal.entry_price - signal.stop_loss);
      if(signal.direction == 1) { // Buy: SL below entry
         signal.stop_loss = signal.entry_price - (slDistance * slice.sl_scale);
      } else {                    // Sell: SL above entry
         signal.stop_loss = signal.entry_price + (slDistance * slice.sl_scale);
      }
   }

   // TP scaling: keep the TP on the profit side of entry, scale its distance
   if(slice.tp_scale > 0) {
      double tpDistance = MathAbs(signal.take_profit - signal.entry_price);
      if(signal.direction == 1) { // Buy: TP above entry
         signal.take_profit = signal.entry_price + (tpDistance * slice.tp_scale);
      } else {                    // Sell: TP below entry
         signal.take_profit = signal.entry_price - (tpDistance * slice.tp_scale);
      }
   }

   // Trailing scaling (if enabled)
   if(TrailEnabled && slice.trail_atr_mult > 0) {
      // Apply trail ATR multiplier (implementation not shown)
   }
}
```
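
To check the distance arithmetic numerically, the sketch below restates the SL/TP part of `ApplyPolicyScaling` as standalone C++. It assumes `direction == 1` means buy and anything else means sell, as in the MQL5 code, and it omits rounding prices to the symbol's digits; `ScaleLevels` and `ScaledLevels` are hypothetical names.

```cpp
#include <cmath>

struct ScaledLevels {
    double stopLoss;
    double takeProfit;
};

// Scale the SL and TP distances from entry; direction == 1 is a buy, anything else a sell.
ScaledLevels ScaleLevels(int direction, double entry, double stopLoss, double takeProfit,
                         double slScale, double tpScale) {
    const double slDist = std::fabs(entry - stopLoss) * slScale;
    const double tpDist = std::fabs(takeProfit - entry) * tpScale;
    if (direction == 1)
        return {entry - slDist, entry + tpDist};  // buy: SL below, TP above
    return {entry + slDist, entry - tpDist};      // sell: SL above, TP below
}
```

For an illustrative buy at 1.10000 with SL 1.09500 (50 pips) and TP 1.11000 (100 pips), an SL multiplier of 0.8 moves the stop to 1.09600 (40 pips) and a TP multiplier of 1.5 moves the target to 1.11500 (150 pips). Multipliers of 1.0 return the original levels, which is the neutral-scaling case.
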
### Scaling Examples
**Conservative Scaling** (low confidence):
```jsonc
{
  "probability": 0.60,
  "sl_mult": 0.8,   // Tighter stop
  "tp_mult": 1.5,   // Wider target (better RR)
  "lot_mult": 0.5,  // Smaller position
  "trail_mult": 0.9 // Tighter trail
}
```
**Aggressive Scaling** (high confidence):
```jsonc
{
  "probability": 0.85,
  "sl_mult": 1.2,   // Wider stop (more breathing room)
  "tp_mult": 0.8,   // Closer target (take profits faster)
  "lot_mult": 1.5,  // Larger position
  "trail_mult": 1.0 // Normal trail
}
```
**Neutral Scaling** (fallback):
```json
{
  "probability": 0.0,
  "sl_mult": 1.0,
  "tp_mult": 1.0,
  "lot_mult": 1.0,
  "trail_mult": 1.0
}
```
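
The "better RR" comment follows from how the two distance multipliers combine. With base stop distance $d_{SL}$, base target distance $d_{TP}$, `sl_mult` $= s$ and `tp_mult` $= t$, the scaled reward-to-risk ratio and the win rate needed to break even (expected R of zero, ignoring spread and commission) are:

```math
\mathrm{RR}' = \frac{t\,d_{TP}}{s\,d_{SL}} = \frac{t}{s}\,\mathrm{RR},
\qquad
p\,\mathrm{RR}' - (1-p) = 0 \;\Rightarrow\; p_{\mathrm{be}} = \frac{1}{1+\mathrm{RR}'}
```

Starting from a 1:1 base setup, the conservative example gives RR' = 1.5/0.8 = 1.875 and a break-even win rate of 1/2.875 ≈ 0.35; the aggressive example gives RR' = 0.8/1.2 ≈ 0.67 and a break-even win rate of 0.60; neutral scaling leaves RR' = 1 and 0.50. `lot_mult` changes position size, not RR, so it does not enter this calculation.
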
---
## Exploration Mode
### Purpose
**Bootstrap insights** for strategy|symbol|timeframe slices that lack sufficient historical data.

Without exploration:
- New slices blocked by insights gating (no data → can't pass thresholds)
- Chicken-and-egg problem: can't trade → can't collect data → can't build insights

With exploration:
- Limited trades allowed for no-data slices
- Caps prevent excessive exposure
- Data collected → insights built → insights gating takes over
### When Exploration Triggers
**Conditions**:
1. `ExploreOnNoSlice=true`
2. Insights gating is enabled (`UseInsightsGating=true`)
3. No slice exists in insights.json for this strategy|symbol|timeframe
4. Exploration caps not exceeded

**Important**: Exploration bypasses insights gating ONLY when the slice is truly missing. If the slice exists but fails thresholds (e.g., low win rate), the trade is BLOCKED (no bypass).
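
The no-slice-only rule can be sketched as a small C++ function. The three-way `InsightsVerdict`, the function name `InsightsGateAllows`, and passing the cap result in as a boolean are assumptions made for illustration; in the EA, the cap check also increments the counters when it allows a trade.

```cpp
// Outcome of looking up the slice in insights.json (hypothetical enum).
enum class InsightsVerdict { NoSlice, SlicePasses, SliceFails };

// Returns true if the trade may proceed past insights gating.
bool InsightsGateAllows(bool useInsightsGating, bool exploreOnNoSlice,
                        InsightsVerdict verdict, bool capsOk) {
    if (!useInsightsGating) return true;               // no insights gating at all
    switch (verdict) {
        case InsightsVerdict::SlicePasses: return true;
        case InsightsVerdict::SliceFails:  return false;  // existing slice: no exploration bypass
        case InsightsVerdict::NoSlice:     return exploreOnNoSlice && capsOk;
    }
    return false;
}
```
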
### Configuration
```cpp
input bool ExploreOnNoSlice = true; // Enable exploration when no slice exists
input int ExploreMaxPerSlicePerDay = 100; // Daily cap per slice (default 100)
input int ExploreMaxPerSlice = 100; // Weekly cap per slice (default 100)
```
> **Note:** Previous documentation listed defaults of 2/3. Actual code defaults are 100/100, effectively unlimited for most practical purposes. The `UseExploration` input does not exist; exploration is controlled via `ExploreOnNoSlice`.
---
## Exploration Caps
### Cap Types
**Daily Cap**: `ExploreMaxPerSlicePerDay`
- Resets at midnight (00:00 server time)
- Per-slice basis (each strategy|symbol|TF tracked separately)
- Default: 100 trades/day/slice

**Weekly Cap**: `ExploreMaxPerSlice`
- Resets on Monday
- Week bucket = Monday of the week (yyyymmdd format)
- Default: 100 trades/week/slice
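
Both counters are keyed by a `yyyymmdd` integer: the date itself for the daily counter and the Monday on or before it for the weekly counter. A standalone C++ sketch of both, assuming the date has already been converted to server time. The calendar arithmetic is adapted from Howard Hinnant's published `days_from_civil`/`civil_from_days` algorithms; `DayBucket`, `WeekBucket`, and the helper names are hypothetical.

```cpp
// Days since 1970-01-01 for a Gregorian date (Hinnant's days_from_civil).
long long DaysFromCivil(int y, unsigned m, unsigned d) {
    y -= m <= 2;
    const long long era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);             // [0, 399]
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;  // [0, 365]
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;            // [0, 146096]
    return era * 146097 + static_cast<long long>(doe) - 719468;
}

// Inverse of DaysFromCivil (Hinnant's civil_from_days), returned as yyyymmdd.
int YyyymmddFromDays(long long z) {
    z += 719468;
    const long long era = (z >= 0 ? z : z - 146096) / 146097;
    const unsigned doe = static_cast<unsigned>(z - era * 146097);
    const unsigned yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
    const unsigned doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
    const unsigned mp = (5 * doy + 2) / 153;
    const unsigned d = doy - (153 * mp + 2) / 5 + 1;
    const unsigned m = mp < 10 ? mp + 3 : mp - 9;
    const long long y = static_cast<long long>(yoe) + era * 400 + (m <= 2);
    return static_cast<int>(y * 10000 + m * 100 + d);
}

// Daily bucket: the date itself.
int DayBucket(int y, unsigned m, unsigned d) {
    return y * 10000 + static_cast<int>(m * 100 + d);
}

// Weekly bucket: the Monday on or before the date.
int WeekBucket(int y, unsigned m, unsigned d) {
    const long long days = DaysFromCivil(y, m, d);
    // 1970-01-01 was a Thursday; this yields 0 = Monday ... 6 = Sunday.
    const long long sinceMonday = ((days % 7) + 7 + 3) % 7;
    return YyyymmddFromDays(days - sinceMonday);
}
```

Wednesday 2025-01-15 maps to the week bucket 20250113, matching the weekly CSV example; 2025-01-01 maps to 20241230, so a week that spans New Year keeps a single bucket.
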
### Counter Persistence
**Files**:
- `Common/Files/DualEA/explore_counts_day.csv` (daily counters)
- `Common/Files/DualEA/explore_counts.csv` (weekly counters)

**Format**:
```csv
key,date_yyyymmdd,count
ADXStrategy|EURUSD|60,20250115,2
BollAverages|GBPUSD|60,20250115,1
```
For weekly:
```csv
key,week_monday_yyyymmdd,count
ADXStrategy|EURUSD|60,20250113,3
```
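
Reading a count back out of these files amounts to finding the row whose key and date bucket both match. The standalone C++ sketch below assumes one row per key and bucket, a header line, and comma separators (the key's `|` separators never collide with `,`). Whether the EA prunes or rewrites old rows is not specified here, so rows from an older bucket are simply treated as a count of zero, which is what produces the daily and weekly resets. `ExploreCounterRow`, `ParseCounterRow`, and `CurrentCount` are hypothetical names.

```cpp
#include <sstream>
#include <string>

struct ExploreCounterRow {
    std::string key;
    int bucket = 0;  // yyyymmdd (date for daily, Monday for weekly)
    int count = 0;
};

// Parse "key,bucket_yyyymmdd,count". Returns false for malformed rows and the header.
bool ParseCounterRow(const std::string &line, ExploreCounterRow &row) {
    const auto c1 = line.find(',');
    const auto c2 = (c1 == std::string::npos) ? std::string::npos : line.find(',', c1 + 1);
    if (c2 == std::string::npos) return false;
    try {
        row.key = line.substr(0, c1);
        row.bucket = std::stoi(line.substr(c1 + 1, c2 - c1 - 1));
        row.count = std::stoi(line.substr(c2 + 1));
    } catch (...) {
        return false;  // header row or non-numeric fields
    }
    return true;
}

// Count for `key` in the current bucket; rows from older buckets count as 0.
int CurrentCount(const std::string &csv, const std::string &key, int currentBucket) {
    std::istringstream in(csv);
    std::string line;
    ExploreCounterRow row;
    while (std::getline(in, line)) {
        if (ParseCounterRow(line, row) && row.key == key && row.bucket == currentBucket)
            return row.count;
    }
    return 0;
}
```
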
### Cap Checking
```cpp
bool CheckExplorationCaps(string strategy, string symbol, int tf, string &reason) {
   string sliceKey = strategy + "|" + symbol + "|" + IntegerToString(tf);

   // Load counters
   int dayCount  = LoadExploreCountDay(sliceKey);
   int weekCount = LoadExploreCountWeek(sliceKey);

   // Check daily cap (a value <= 0 disables this cap)
   if(ExploreMaxPerSlicePerDay > 0 && dayCount >= ExploreMaxPerSlicePerDay) {
      reason = StringFormat("explore_cap_day (day=%d/%d, week=%d/%d)",
                            dayCount, ExploreMaxPerSlicePerDay,
                            weekCount, ExploreMaxPerSlice);
      telemetry.Event("explore_block_day", sliceKey, dayCount, weekCount);
      return false;
   }

   // Check weekly cap (a value <= 0 disables this cap)
   if(ExploreMaxPerSlice > 0 && weekCount >= ExploreMaxPerSlice) {
      reason = StringFormat("explore_cap_week (day=%d/%d, week=%d/%d)",
                            dayCount, ExploreMaxPerSlicePerDay,
                            weekCount, ExploreMaxPerSlice);
      telemetry.Event("explore_block_week", sliceKey, dayCount, weekCount);
      return false;
   }

   // Increment counters (only when the trade is allowed)
   IncrementExploreCountDay(sliceKey);
   IncrementExploreCountWeek(sliceKey);

   // Allow exploration
   Log(StringFormat("GATE: explore allow %s on %s/%d (day=%d/%d, week=%d/%d)",
                    strategy, symbol, tf,
                    dayCount+1, ExploreMaxPerSlicePerDay,
                    weekCount+1, ExploreMaxPerSlice));
   telemetry.Event("explore_allow", sliceKey, dayCount+1, weekCount+1);
   return true;
}
```
### Resetting Caps
**Manual Reset**:
Delete counter files:
```powershell
Remove-Item "C:\Users\<you>\AppData\Roaming\MetaQuotes\Terminal\Common\Files\DualEA\explore_counts*.csv"
```
**Automatic Reset**:
- Daily: Midnight (00:00 server time)
- Weekly: Monday 00:00
### Interaction with NoConstraintsMode
When `NoConstraintsMode=true`:
- **Insights gating**: BYPASSED
- **Exploration caps**: BYPASSED (trades allowed regardless of caps)
- **Exploration counters**: Still incremented for telemetry
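
Two details are easy to miss: the `> 0 &&` guards in `CheckExplorationCaps` mean a cap of 0 (or less) disables that cap, and under `NoConstraintsMode` a cap that has been hit no longer blocks, while the counters still advance. A minimal C++ sketch combining both, with counters passed by reference instead of loaded from the CSV files; `ExploreCaps` and `ExploreAllowed` are hypothetical names, and how the EA actually wires `NoConstraintsMode` into this check is an assumption.

```cpp
#include <string>

// A cap value <= 0 disables that cap.
struct ExploreCaps {
    int maxPerDay;
    int maxPerWeek;
};

// Counters are incremented whenever the trade is allowed, including
// trades allowed only because NoConstraintsMode bypassed a hit cap.
bool ExploreAllowed(const ExploreCaps &caps, bool noConstraintsMode,
                    int &dayCount, int &weekCount, std::string &reason) {
    const bool dayHit  = caps.maxPerDay  > 0 && dayCount  >= caps.maxPerDay;
    const bool weekHit = caps.maxPerWeek > 0 && weekCount >= caps.maxPerWeek;
    if (!noConstraintsMode && dayHit)  { reason = "explore_cap_day";  return false; }
    if (!noConstraintsMode && weekHit) { reason = "explore_cap_week"; return false; }
    ++dayCount;   // still counted for telemetry
    ++weekCount;
    return true;
}
```
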
---
## Troubleshooting
### Issue: Policy fallback always triggering
**Symptoms**:
```
FALLBACK: no policy loaded -> neutral scaling used for ADXStrategy on EURUSD/H1 demo=true
```
**Diagnosis**:
1. Check if `policy.json` exists:
```powershell
Test-Path "C:\Users\<you>\AppData\Roaming\MetaQuotes\Terminal\Common\Files\DualEA\policy.json"
```
2. Check policy load logs (enable `DebugPolicy=true`):
```
Policy loaded: 0 slices, min_confidence=0.00
```
3. Verify policy.json content:
```powershell
Get-Content -Raw "...\DualEA\policy.json" | ConvertFrom-Json | Select-Object -ExpandProperty slices | Measure-Object
```
**Solutions**:
- Run `ML/policy_export.py` to generate policy
- Copy policy.json to Common Files
- Verify JSON is well-formed (no parse errors)
### Issue: Policy slice missing fallback
**Symptoms**:
```
FALLBACK: policy slice missing -> neutral scaling used for BollAverages on GBPUSD/H1 demo=true
```
**Diagnosis**:
1. Check policy slices:
```powershell
Get-Content -Raw "...\policy.json" | ConvertFrom-Json | Select-Object -ExpandProperty slices |
Where-Object {$_.strategy -eq "BollAverages" -and $_.symbol -eq "GBPUSD" -and $_.timeframe -eq 60}
```
2. No result → slice missing from policy

**Solutions**:
- Collect more training data for this slice
- Re-train model with broader coverage
- Accept fallback for new slices (data collection mode)
### Issue: Exploration caps reached immediately
**Symptoms**:
```
GATE: blocked ADXStrategy on EURUSD/H1 reason=explore_cap_day (day=2/2, week=2/3)
```
**Diagnosis**:
1. Check counters:
```powershell
Import-Csv "...\explore_counts_day.csv" | Where-Object {$_.key -like "*ADXStrategy|EURUSD|60*"}
```
2. Verify the caps are not set too low (the code defaults are 100/100):
```cpp
input int ExploreMaxPerSlicePerDay = 2; // Very conservative
```
**Solutions**:
- Raise caps that are set too low (e.g., 5/10 for PaperEA); LiveEA can stay conservative (e.g., 1/2)
- Reset counters manually (delete CSV files)
- Use `NoConstraintsMode=true` for initial data collection (bypasses all caps)
### Issue: Exploration not triggering
**Symptoms**:
```
GATE: blocked ADXStrategy on EURUSD/H1 reason=below_winrate (WR=0.42 < 0.50)
```
(Slice exists but fails threshold, no exploration bypass)

**Diagnosis**:
Exploration only triggers when the **slice is missing entirely**, not when a slice exists but fails thresholds.

**Solutions**:
- This is correct behavior (no-slice-only bypass)
- To allow trading despite low performance:
  - Lower insights thresholds temporarily
  - Use `NoConstraintsMode=true`
  - Delete the slice from insights.json to force exploration
- Wait for performance to improve
### Issue: NoConstraintsMode not working
**Symptoms**:
Still seeing gate blocks despite `NoConstraintsMode=true`
**Diagnosis**:
1. Verify setting applied:
```cpp
if(NoConstraintsMode) Print("NoConstraintsMode is TRUE");
```
2. Check if using unified system:
```cpp
input bool UseUnifiedSystem = true; // Required for NoConstraintsMode
```
**Solutions**:
- Ensure `UseUnifiedSystem=true`
- Restart EA after changing `NoConstraintsMode`
- Check logs for shadow decisions: `[SHADOW] gate=... result=block (but allowing)`
---
**See Also:**
- [Configuration-Reference.md](Configuration-Reference.md) - Policy and exploration parameters
- [Execution-Pipeline.md](Execution-Pipeline.md) - When policy/exploration evaluated
- [Observability-Guide.md](Observability-Guide.md) - Policy/exploration telemetry events