mql5/Experts/Advisors/DualEA/docs/Policy-Exploration-Guide.md

# Policy & Exploration Guide — DualEA System

**ML policy gating, fallback modes, and exploration system**

---

## Table of Contents

1. [Policy System Overview](#policy-system-overview)
2. [Policy Structure](#policy-structure)
3. [Policy Gating Logic](#policy-gating-logic)
4. [Policy Fallback Modes](#policy-fallback-modes)
5. [Policy Scaling](#policy-scaling)
6. [Exploration Mode](#exploration-mode)
7. [Exploration Caps](#exploration-caps)
8. [Troubleshooting](#troubleshooting)

---

## Policy System Overview

### Purpose

The **policy system** allows ML models to influence trading decisions through:
- **Confidence gating**: Block low-confidence predictions
- **Parameter scaling**: Adjust SL/TP/lots based on model output
- **Risk management**: Dynamic position sizing based on predicted outcomes

### Files

**policy.json**: ML-generated trading policy
- Path: `Common/Files/DualEA/policy.json`
- Generated by: `ML/policy_export.py`
- Structure: Per-slice (strategy|symbol|timeframe) probabilities and scaling multipliers

**policy.backup.json**: Last-known-good policy
- Used for rollback on parse/plausibility failures
- Updated on successful policy load

**policy.reload**: Trigger file for hot-reload
- PaperEA_v2: Hot-reload supported via `policy.reload` + HTTP polling
- LiveEA: Load on init only (restart required for updates)
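The rollback to `policy.backup.json` can be pictured as a plausibility check followed by a choice of source. The sketch below is standalone C++ rather than EA code, and the specific rules (values in [0, 1], positive multipliers, at least one slice) are illustrative assumptions; this guide does not list the checks the EA actually runs. All type and function names are hypothetical.

```cpp
#include <vector>

// Hypothetical subset of per-slice values needed for a sanity check.
struct SliceValues {
    double p_win;
    double sl_scale;
    double tp_scale;
};

// True when v lies in [0, 1]; NaN fails both comparisons and is rejected.
bool InUnitRange(double v) { return v >= 0.0 && v <= 1.0; }

// Assumed plausibility rules, for illustration only.
bool PolicyLooksPlausible(double min_confidence,
                          const std::vector<SliceValues> &slices) {
    if (!InUnitRange(min_confidence) || slices.empty()) return false;
    for (const SliceValues &s : slices) {
        if (!InUnitRange(s.p_win)) return false;
        if (s.sl_scale <= 0.0 || s.tp_scale <= 0.0) return false;
    }
    return true;
}

enum class PolicySource { Fresh, Backup };

// Use the freshly loaded policy only if it parsed and passed the checks;
// otherwise fall back to the last-known-good copy.
PolicySource ChoosePolicySource(bool parsed_ok, bool plausible) {
    return (parsed_ok && plausible) ? PolicySource::Fresh : PolicySource::Backup;
}
```

In this sketch a fresh policy that passes would also be the point at which the backup copy gets refreshed, matching "Updated on successful policy load".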

---

## Policy Structure

### JSON Schema

```json
{
  "version": "1.0",
  "generated_at": "2025-01-15T10:00:00Z",
  "model_hash": "abc123...",
  "train_window": {"start": "2024-01-01", "end": "2025-01-15"},
  "metrics": {
    "roc_auc": 0.72,
    "brier_score": 0.18,
    "expected_r": 0.65
  },
  "min_confidence": 0.55,
  "slices": [
    {
      "strategy": "ADXStrategy",
      "symbol": "EURUSD",
      "timeframe": 60,
      "probability": 0.75,
      "sl_mult": 1.0,
      "tp_mult": 1.2,
      "lot_mult": 1.0,
      "trail_mult": 1.0
    },
    ...
  ]
}
```

### Fields

**Global**:
- `version`: Schema version
- `generated_at`: Timestamp of policy generation
- `model_hash`: Trained model identifier for provenance
- `train_window`: Data range used for training
- `metrics`: Model performance metrics
- `min_confidence`: Global minimum confidence threshold

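**Per-Slice** (as implemented in LiveEA; PaperEA_v2 uses minimal parsing). Note that the JSON example above and the gating snippet use `probability`, `sl_mult`, `tp_mult`, and `trail_mult`, while this list uses `p_win`, `sl_scale`, `tp_scale`, and `trail_atr_mult`; check which names your EA build and `policy.json` actually use: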
- `strategy`: Strategy name (e.g., "ADXStrategy")
- `symbol`: Trading symbol (e.g., "EURUSD")
- `timeframe`: Timeframe in minutes (e.g., 60 for H1)
- `p_win`: ML model confidence (0.0-1.0)
- `sl_scale`: Stop loss scaling multiplier (optional)
- `tp_scale`: Take profit scaling multiplier (optional)
- `trail_atr_mult`: Trailing stop ATR multiplier (optional)
- `confidence`: Alternative confidence field (optional)
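As a rough picture of how one slice might be held in memory, the following standalone C++ sketch mirrors the per-slice field list. The struct layout, the defaults for optional fields, and the `SliceKey` helper are assumptions made for illustration; only the field names and the `strategy|symbol|timeframe` key shape come from this guide.

```cpp
#include <string>

// Hypothetical in-memory form of one policy slice.
struct PolicySlice {
    std::string strategy;
    std::string symbol;
    int timeframe_minutes = 0;    // e.g. 60 for H1
    double p_win = -1.0;          // negative marks "not found", as the gating example tests
    double sl_scale = 1.0;        // optional in the file, so default to neutral
    double tp_scale = 1.0;        // optional in the file, so default to neutral
    double trail_atr_mult = 0.0;  // 0 means "not set" in this sketch
};

// Builds the "strategy|symbol|timeframe" key that identifies a slice.
std::string SliceKey(const PolicySlice &s) {
    return s.strategy + "|" + s.symbol + "|" + std::to_string(s.timeframe_minutes);
}
```

Defaulting the optional multipliers to 1.0 means a slice that omits them behaves like neutral scaling.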

---

## Policy Gating Logic

### Configuration

```cpp
input bool     UsePolicyGating = true;           // Enable ML policy gating
input bool     DefaultPolicyFallback = true;     // Enable fallback modes
input bool     FallbackDemoOnly = true;          // Restrict fallback to demo
input bool     FallbackWhenNoPolicy = true;      // Fallback if policy not loaded
input bool     FallbackWhenSliceMissing = true;  // Fallback if slice missing
```

### Decision Flow

```
Is UsePolicyGating=true?
  No  → PASS (no policy gating)
  Yes → Continue

Is policy loaded (slices > 0)?
  No  → Check FallbackWhenNoPolicy
         Yes → Check FallbackDemoOnly
                Demo account → FALLBACK (neutral scaling)
                Live account → BLOCK
         No  → BLOCK (reason: "no policy loaded")
  Yes → Continue

Does exact slice exist (strategy|symbol|TF)?
  No  → Check FallbackWhenSliceMissing
         Yes → Check FallbackDemoOnly
                Demo account → FALLBACK (neutral scaling)
                Live account → BLOCK
         No  → BLOCK (reason: "policy slice missing")
  Yes → Continue

Is slice.probability >= policy.min_confidence?
  No  → BLOCK (reason: "confidence too low")
  Yes → PASS + Apply scaling
```

### Code Example

```cpp
bool ApplyPolicyGating(SignalData &signal, string &reason) {
    if(!UsePolicyGating) return true;

    // Check if policy loaded
    if(policyData.slices == 0) {
        return HandleFallback("no_policy", signal, reason);
    }

    // Find policy slice
    PolicySlice slice = policyData.Find(
        signal.strategy_name,
        symbol,
        timeframe
    );

    if(slice.probability < 0) {  // Slice not found
        return HandleFallback("slice_missing", signal, reason);
    }

    // Check confidence threshold
    if(slice.probability < policyData.min_confidence) {
        reason = StringFormat("confidence %.2f < %.2f",
                             slice.probability, policyData.min_confidence);
        return false;
    }

    // Apply policy scaling
    ApplyPolicyScaling(signal, slice);

    return true;
}
```

---

## Policy Fallback Modes

### Fallback: No Policy

**Triggers when**:
- `UsePolicyGating=true`
- Policy file not loaded OR `policy.slices == 0`
- `FallbackWhenNoPolicy=true`

**Behavior**:
- Check `FallbackDemoOnly`:
  - If `true` and demo account → ALLOW with neutral scaling
  - If `true` and live account → BLOCK
  - If `false` → ALLOW with neutral scaling (any account)
- Bypasses insights thresholds
- Bypasses exploration caps
- No SL/TP/lot/trail multipliers applied
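The demo-only branching above reduces to a small pure function. This is a standalone C++ sketch, not the EA's `HandleFallback`; the enum and function names are hypothetical, and `case_enabled` stands for whichever of `FallbackWhenNoPolicy` or `FallbackWhenSliceMissing` applies to the trigger.

```cpp
// Outcome of a fallback decision in this sketch.
enum class GateResult { Fallback, Block };

// case_enabled    : FallbackWhenNoPolicy or FallbackWhenSliceMissing
// demo_only       : FallbackDemoOnly
// is_demo_account : whether the EA is running on a demo account
GateResult DecideFallback(bool case_enabled, bool demo_only, bool is_demo_account) {
    if (!case_enabled) return GateResult::Block;                  // fallback switched off for this case
    if (demo_only && !is_demo_account) return GateResult::Block;  // live account under demo-only
    return GateResult::Fallback;                                  // allow with neutral scaling
}
```

With `FallbackDemoOnly=false` the account type no longer matters, which is why that setting deserves explicit risk acceptance.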

**Log Example**:
```
FALLBACK: no policy loaded -> neutral scaling used for ADXStrategy on EURUSD/H1 demo=true
```

**Telemetry**:
```csv
policy_fallback,EURUSD,60,ADXStrategy,no_policy,true,true
```

### Fallback: Slice Missing

**Triggers when**:
- `UsePolicyGating=true`
- Policy IS loaded (`policy.slices > 0`)
- Exact strategy|symbol|timeframe slice not found in policy
- `FallbackWhenSliceMissing=true`

**Behavior**:
- Same as "No Policy" fallback
- Allows trade with neutral scaling
- Demo-only restriction if `FallbackDemoOnly=true`

**Log Example**:
```
FALLBACK: policy slice missing -> neutral scaling used for BollAverages on GBPUSD/H1 demo=true
```

### Neutral Scaling

**Definition**: No adjustments applied, use base parameters.

```cpp
sl_mult = 1.0
tp_mult = 1.0
lot_mult = 1.0
trail_mult = 1.0
```

Trade proceeds with:
- Original SL/TP from strategy
- Original lot size from risk calculation
- Original trailing stop settings (if enabled)

### Fallback Safety

**Why demo-only by default?**
- Fallback is permissive (allows trades without ML validation)
- Intended for data collection, not production live trading
- On live accounts, every traded slice should have explicit policy coverage

**When to disable FallbackDemoOnly?**
- After verifying policy coverage is comprehensive
- When transitioning from demo to live with same strategy set
- With explicit risk acceptance of trading without ML guidance

**Best Practice**:
- Keep `FallbackDemoOnly=true` until the policy is proven
- Monitor `policy_fallback` telemetry events
- Aim to eliminate fallbacks by improving policy coverage

---

## Policy Scaling

### Policy Scaling (LiveEA Only)

> **Note:** Full policy scaling with per-slice multipliers is implemented in LiveEA. PaperEA_v2 has minimal policy parsing (checks `min_confidence` only).

**Scaling Application (LiveEA):**
```cpp
void ApplyPolicyScaling(SignalData &signal, PolicySlice &slice) {
    // SL scaling
    double slDistance = MathAbs(signal.entry_price - signal.stop_loss);
    if(signal.direction == 1) {  // Buy
        signal.stop_loss = signal.entry_price - (slDistance * slice.sl_scale);
    } else {  // Sell
        signal.stop_loss = signal.entry_price + (slDistance * slice.sl_scale);
    }

    // TP scaling
    double tpDistance = MathAbs(signal.take_profit - signal.entry_price);
    if(signal.direction == 1) {  // Buy
        signal.take_profit = signal.entry_price + (tpDistance * slice.tp_scale);
    } else {  // Sell
        signal.take_profit = signal.entry_price - (tpDistance * slice.tp_scale);
    }

    // Trailing scaling (if enabled)
    if(TrailEnabled && slice.trail_atr_mult > 0) {
        // Apply trail ATR multiplier
    }
}
```

### Scaling Examples

**Conservative Scaling** (low confidence):
```jsonc
{
  "probability": 0.60,
  "sl_mult": 0.8,   // Tighter stop
  "tp_mult": 1.5,   // Wider target (better RR)
  "lot_mult": 0.5,  // Smaller position
  "trail_mult": 0.9 // Tighter trail
}
```

**Aggressive Scaling** (high confidence):
```jsonc
{
  "probability": 0.85,
  "sl_mult": 1.2,   // Wider stop (more breathing room)
  "tp_mult": 0.8,   // Closer target (take profits faster)
  "lot_mult": 1.5,  // Larger position
  "trail_mult": 1.0 // Normal trail
}
```

**Neutral Scaling** (fallback):
```json
{
  "probability": 0.0,
  "sl_mult": 1.0,
  "tp_mult": 1.0,
  "lot_mult": 1.0,
  "trail_mult": 1.0
}
```
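To make the arithmetic concrete, here is a standalone C++ sketch of the same distance-based scaling as a pure function. The `Levels` struct and `ScaleLevels` name are hypothetical. For a buy with entry 1.1000, SL 1.0950, and TP 1.1100, the conservative multipliers (`sl_mult` 0.8, `tp_mult` 1.5) move the SL to 1.0960 and the TP to 1.1150.

```cpp
#include <cmath>

// Scaled stop-loss and take-profit prices.
struct Levels {
    double stop_loss;
    double take_profit;
};

// direction: +1 for buy, -1 for sell.
// Distances are measured from the entry price, multiplied, then re-applied
// on the correct side of the entry for the trade direction.
Levels ScaleLevels(double entry, double stop_loss, double take_profit,
                   int direction, double sl_mult, double tp_mult) {
    const double sl_distance = std::fabs(entry - stop_loss) * sl_mult;
    const double tp_distance = std::fabs(take_profit - entry) * tp_mult;
    Levels out;
    out.stop_loss = entry - direction * sl_distance;
    out.take_profit = entry + direction * tp_distance;
    return out;
}
```

Neutral multipliers (1.0, 1.0) return the original levels unchanged, which is the fallback case.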

---

## Exploration Mode

### Purpose

**Bootstrap insights** for strategy|symbol|timeframe slices that lack sufficient historical data.

Without exploration:
- New slices blocked by insights gating (no data → can't pass thresholds)
- Chicken-and-egg problem: can't trade → can't collect data → can't build insights

With exploration:
- Limited trades allowed for no-data slices
- Caps prevent excessive exposure
- Data collected → insights built → insights gating takes over

### When Exploration Triggers

**Conditions**:
1. `ExploreOnNoSlice=true`
2. Insights gating is enabled (`UseInsightsGating=true`)
3. No slice exists in insights.json for this strategy|symbol|timeframe
4. Exploration caps not exceeded

**Important**: The exploration bypass applies ONLY when the slice is truly missing. If the slice exists but fails thresholds (e.g., low win rate), the trade is BLOCKED with no bypass.

### Configuration

```cpp
input bool     ExploreOnNoSlice = true;          // Enable exploration when no slice exists
input int      ExploreMaxPerSlicePerDay = 100;   // Daily cap per slice (default 100)
input int      ExploreMaxPerSlice = 100;         // Weekly cap per slice (default 100)
```

> **Note:** Previous documentation listed defaults of 2/3. Actual code defaults are 100/100, effectively unlimited for most practical purposes. The `UseExploration` input does not exist; exploration is controlled via `ExploreOnNoSlice`.

---

## Exploration Caps

### Cap Types

**Daily Cap**: `ExploreMaxPerSlicePerDay`
- Resets at midnight (00:00 server time)
- Per-slice basis (each strategy|symbol|TF tracked separately)
- Default: 100 trades/day/slice

**Weekly Cap**: `ExploreMaxPerSlice`
- Resets on Monday
- Week bucket = Monday of the week (yyyymmdd format)
- Default: 100 trades/week/slice
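The week bucket can be derived from any date by stepping back to that week's Monday. The standalone C++ sketch below uses the widely known days-from-civil date arithmetic; the function names are hypothetical and the EA may compute the bucket differently. The input is assumed to already be a server-time date packed as yyyymmdd.

```cpp
// Days since 1970-01-01 for a proleptic Gregorian date.
long DaysFromCivil(int y, int m, int d) {
    y -= (m <= 2);
    const long era = (y >= 0 ? y : y - 399) / 400;
    const long yoe = y - era * 400;                                    // year of era
    const long doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;  // day of March-based year
    const long doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;            // day of era
    return era * 146097 + doe - 719468;
}

// Inverse of DaysFromCivil, packed as yyyymmdd.
int YmdFromDays(long z) {
    z += 719468;
    const long era = (z >= 0 ? z : z - 146096) / 146097;
    const long doe = z - era * 146097;
    const long yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
    const long doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
    const long mp = (5 * doy + 2) / 153;
    const long d = doy - (153 * mp + 2) / 5 + 1;
    const long m = mp < 10 ? mp + 3 : mp - 9;
    const long y = yoe + era * 400 + (m <= 2 ? 1 : 0);
    return static_cast<int>(y * 10000 + m * 100 + d);
}

// Maps a yyyymmdd date to the yyyymmdd of the Monday that starts its week.
int WeekBucket(int yyyymmdd) {
    const long days = DaysFromCivil(yyyymmdd / 10000, (yyyymmdd / 100) % 100, yyyymmdd % 100);
    const long weekday = ((days % 7) + 11) % 7;   // 0 = Sunday; 1970-01-01 was a Thursday
    const long since_monday = (weekday + 6) % 7;  // Monday -> 0, Sunday -> 6
    return YmdFromDays(days - since_monday);
}
```

Because the bucket is a plain date, a new week simply produces a new key and the previous week's count stops matching.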

### Counter Persistence

**Files**:
- `Common/Files/DualEA/explore_counts_day.csv`
- `Common/Files/DualEA/explore_counts.csv`

**Format**:
```csv
key,date_yyyymmdd,count
ADXStrategy|EURUSD|60,20250115,2
BollAverages|GBPUSD|60,20250115,1
```

For weekly:
```csv
key,week_monday_yyyymmdd,count
ADXStrategy|EURUSD|60,20250113,3
```
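Reading these files back is a matter of splitting each row on its two commas, since the key itself only contains `|` separators. A minimal standalone C++ sketch, with a hypothetical `CounterRow` type and parser name; the EA's own loader is not shown in this guide and may differ.

```cpp
#include <stdexcept>
#include <string>

// One row of a counter file: slice key, date or week bucket, and count.
struct CounterRow {
    std::string key;
    int bucket = 0;  // yyyymmdd (day file) or Monday yyyymmdd (week file)
    int count = 0;
};

// Parses "key,bucket,count". Returns false for the header row or any
// malformed line, leaving `row` untouched in that case.
bool ParseCounterRow(const std::string &line, CounterRow &row) {
    const std::size_t first = line.find(',');
    if (first == std::string::npos) return false;
    const std::size_t second = line.find(',', first + 1);
    if (second == std::string::npos) return false;
    try {
        CounterRow parsed;
        parsed.key = line.substr(0, first);
        parsed.bucket = std::stoi(line.substr(first + 1, second - first - 1));
        parsed.count = std::stoi(line.substr(second + 1));
        row = parsed;
    } catch (const std::exception &) {
        return false;  // non-numeric fields, such as the header names
    }
    return true;
}
```

A loader built on this would keep only rows whose bucket matches the current day or week, so stale rows never count toward a cap.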

### Cap Checking

```cpp
bool CheckExplorationCaps(string strategy, string symbol, int tf, string &reason) {
    string sliceKey = strategy + "|" + symbol + "|" + IntegerToString(tf);

    // Load counters
    int dayCount = LoadExploreCountDay(sliceKey);
    int weekCount = LoadExploreCountWeek(sliceKey);

    // Check daily cap
    if(ExploreMaxPerSlicePerDay > 0 && dayCount >= ExploreMaxPerSlicePerDay) {
        reason = StringFormat("explore_cap_day (day=%d/%d, week=%d/%d)",
                             dayCount, ExploreMaxPerSlicePerDay,
                             weekCount, ExploreMaxPerSlice);
        telemetry.Event("explore_block_day", sliceKey, dayCount, weekCount);
        return false;
    }

    // Check weekly cap
    if(ExploreMaxPerSlice > 0 && weekCount >= ExploreMaxPerSlice) {
        reason = StringFormat("explore_cap_week (day=%d/%d, week=%d/%d)",
                             dayCount, ExploreMaxPerSlicePerDay,
                             weekCount, ExploreMaxPerSlice);
        telemetry.Event("explore_block_week", sliceKey, dayCount, weekCount);
        return false;
    }

    // Increment counters
    IncrementExploreCountDay(sliceKey);
    IncrementExploreCountWeek(sliceKey);

    // Allow exploration
    Log(StringFormat("GATE: explore allow %s on %s/%d (day=%d/%d, week=%d/%d)",
        strategy, symbol, tf,
        dayCount+1, ExploreMaxPerSlicePerDay,
        weekCount+1, ExploreMaxPerSlice));

    telemetry.Event("explore_allow", sliceKey, dayCount+1, weekCount+1);

    return true;
}
```

### Resetting Caps

**Manual Reset**:
Delete counter files:
```powershell
Remove-Item "C:\Users\<you>\AppData\Roaming\MetaQuotes\Terminal\Common\Files\DualEA\explore_counts*.csv"
```

**Automatic Reset**:
- Daily: Midnight (00:00 server time)
- Weekly: Monday 00:00

### Interaction with NoConstraintsMode

When `NoConstraintsMode=true`:
- **Insights gating**: BYPASSED
- **Exploration caps**: BYPASSED (trades allowed regardless of caps)
- **Exploration counters**: Still incremented for telemetry
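The combination of "caps bypassed" and "counters still incremented" can be sketched as follows. This is standalone C++ with hypothetical names, operating on in-memory counts rather than the CSV files; it follows the cap semantics of the cap-checking example (a cap of 0 or less disables that cap) and is an illustration rather than the EA's actual code path.

```cpp
// Decides whether an exploration trade may proceed and updates the counts.
// A blocked trade leaves the counters unchanged; an allowed trade increments
// both, including when the caps were bypassed by NoConstraintsMode.
bool AllowExploration(bool no_constraints_mode,
                      int max_per_day, int max_per_week,
                      int &day_count, int &week_count) {
    const bool day_hit = max_per_day > 0 && day_count >= max_per_day;
    const bool week_hit = max_per_week > 0 && week_count >= max_per_week;
    if ((day_hit || week_hit) && !no_constraints_mode) {
        return false;  // cap reached and no bypass
    }
    ++day_count;   // counted for telemetry even under bypass
    ++week_count;
    return true;
}
```

One consequence worth noting: counts accumulated under `NoConstraintsMode=true` can already exceed the caps when the mode is switched off, so exploration may be blocked until the next reset.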

---

## Troubleshooting

### Issue: Policy fallback always triggering

**Symptoms**:
```
FALLBACK: no policy loaded -> neutral scaling used for ADXStrategy on EURUSD/H1 demo=true
```

**Diagnosis**:
1. Check if `policy.json` exists:
   ```powershell
   Test-Path "C:\Users\<you>\AppData\Roaming\MetaQuotes\Terminal\Common\Files\DualEA\policy.json"
   ```

2. Check policy load logs (enable `DebugPolicy=true`):
   ```
   Policy loaded: 0 slices, min_confidence=0.00
   ```

3. Verify policy.json content:
   ```powershell
   Get-Content "...\DualEA\policy.json" | ConvertFrom-Json | Select-Object -ExpandProperty slices | Measure-Object
   ```

**Solutions**:
- Run `ML/policy_export.py` to generate policy
- Copy policy.json to Common Files
- Verify JSON is well-formed (no parse errors)

### Issue: Policy slice missing fallback

**Symptoms**:
```
FALLBACK: policy slice missing -> neutral scaling used for BollAverages on GBPUSD/H1 demo=true
```

**Diagnosis**:
1. Check policy slices:
   ```powershell
   Get-Content "...\policy.json" | ConvertFrom-Json | Select-Object -ExpandProperty slices |
       Where-Object {$_.strategy -eq "BollAverages" -and $_.symbol -eq "GBPUSD" -and $_.timeframe -eq 60}
   ```

2. No result → slice missing from policy

**Solutions**:
- Collect more training data for this slice
- Re-train model with broader coverage
- Accept fallback for new slices (data collection mode)

### Issue: Exploration caps reached immediately

**Symptoms**:
```
GATE: blocked ADXStrategy on EURUSD/H1 reason=explore_cap_day (day=2/2, week=2/3)
```

**Diagnosis**:
1. Check counters:
   ```powershell
   Import-Csv "...\explore_counts_day.csv" | Where-Object {$_.key -like "*ADXStrategy|EURUSD|60*"}
   ```

2. Verify caps are not too restrictive (the code defaults are 100/100, so this block usually means the caps were lowered):
   ```cpp
   ExploreMaxPerSlicePerDay = 2  // Very conservative
   ```

**Solutions**:
- Increase caps (e.g., 5/10 for PaperEA, 1/2 for LiveEA)
- Reset counters manually (delete CSV files)
- Use `NoConstraintsMode=true` for initial data collection (bypasses all caps)

### Issue: Exploration not triggering

**Symptoms**:
```
GATE: blocked ADXStrategy on EURUSD/H1 reason=below_winrate (WR=0.42 < 0.50)
```
(Slice exists but fails threshold, no exploration bypass)

**Diagnosis**:
Exploration only triggers when **slice missing entirely**, not when slice exists but fails thresholds.

**Solutions**:
- This is correct behavior (no-slice-only bypass)
- To allow trading despite low performance:
  - Lower insights thresholds temporarily
  - Use `NoConstraintsMode=true`
  - Delete slice from insights.json to force exploration
  - Wait for performance to improve

### Issue: NoConstraintsMode not working

**Symptoms**:
Still seeing gate blocks despite `NoConstraintsMode=true`

**Diagnosis**:
1. Verify setting applied:
   ```cpp
   if(NoConstraintsMode) Print("NoConstraintsMode is TRUE");
   ```

2. Check if using unified system:
   ```cpp
   input bool UseUnifiedSystem = true;  // Required for NoConstraintsMode
   ```

**Solutions**:
- Ensure `UseUnifiedSystem=true`
- Restart EA after changing `NoConstraintsMode`
- Check logs for shadow decisions: `[SHADOW] gate=... result=block (but allowing)`

---

**See Also:**
- [Configuration-Reference.md](Configuration-Reference.md) - Policy and exploration parameters
- [Execution-Pipeline.md](Execution-Pipeline.md) - When policy/exploration evaluated
- [Observability-Guide.md](Observability-Guide.md) - Policy/exploration telemetry events