mql5/Experts/Advisors/DualEA/docs/Policy-Exploration-Guide.md
Princeec13 0b557e494f
2026-02-05 01:22:42 -05:00

Policy & Exploration Guide — DualEA System

ML policy gating, fallback modes, and exploration system


Table of Contents

  1. Policy System Overview
  2. Policy Structure
  3. Policy Gating Logic
  4. Policy Fallback Modes
  5. Policy Scaling
  6. Exploration Mode
  7. Exploration Caps
  8. Troubleshooting

Policy System Overview

Purpose

The policy system allows ML models to influence trading decisions through:

  • Confidence gating: Block low-confidence predictions
  • Parameter scaling: Adjust SL/TP/lots based on model output
  • Risk management: Dynamic position sizing based on predicted outcomes

Files

policy.json: ML-generated trading policy

  • Path: Common/Files/DualEA/policy.json
  • Generated by: ML/policy_export.py
  • Structure: Per-slice (strategy|symbol|timeframe) probabilities and scaling multipliers

policy.backup.json: Last-known-good policy

  • Used for rollback on parse/plausibility failures
  • Updated on successful policy load
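
The rollback behaviour can be pictured with a short Python sketch. It is an illustration only, not the EA's loader: the function names and the plausibility rule (at least one slice, `min_confidence` within 0..1) are assumptions made for the example.

```python
import json
import shutil
from pathlib import Path


def is_plausible(policy: dict) -> bool:
    """Assumed plausibility rule: some slices and a threshold within [0, 1]."""
    slices = policy.get("slices")
    min_conf = policy.get("min_confidence")
    return (
        isinstance(slices, list)
        and len(slices) > 0
        and isinstance(min_conf, (int, float))
        and 0.0 <= min_conf <= 1.0
    )


def load_policy_with_rollback(policy_path: Path, backup_path: Path) -> dict:
    """Load policy.json; on parse or plausibility failure, use the backup."""
    try:
        policy = json.loads(policy_path.read_text(encoding="utf-8"))
        if not is_plausible(policy):
            raise ValueError("policy failed plausibility check")
    except (OSError, ValueError):
        # json.JSONDecodeError is a subclass of ValueError, so parse errors
        # land here too. A missing backup raises FileNotFoundError.
        return json.loads(backup_path.read_text(encoding="utf-8"))
    # Successful load: refresh the last-known-good copy.
    shutil.copyfile(policy_path, backup_path)
    return policy
```

In this sketch the backup is only overwritten after the new file has parsed and passed the check, so a bad export cannot replace the last-known-good policy.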

policy.reload: Trigger file for hot-reload

  • PaperEA_v2: Hot-reload supported via policy.reload + HTTP polling
  • LiveEA: Load on init only (restart required for updates)

Policy Structure

JSON Schema

{
  "version": "1.0",
  "generated_at": "2025-01-15T10:00:00Z",
  "model_hash": "abc123...",
  "train_window": {"start": "2024-01-01", "end": "2025-01-15"},
  "metrics": {
    "roc_auc": 0.72,
    "brier_score": 0.18,
    "expected_r": 0.65
  },
  "min_confidence": 0.55,
  "slices": [
    {
      "strategy": "ADXStrategy",
      "symbol": "EURUSD",
      "timeframe": 60,
      "probability": 0.75,
      "sl_mult": 1.0,
      "tp_mult": 1.2,
      "lot_mult": 1.0,
      "trail_mult": 1.0
    },
    ...
  ]
}

Fields

Global:

  • version: Schema version
  • generated_at: Timestamp of policy generation
  • model_hash: Trained model identifier for provenance
  • train_window: Data range used for training
  • metrics: Model performance metrics
  • min_confidence: Global minimum confidence threshold

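Per-Slice (as implemented in LiveEA; PaperEA_v2 uses minimal parsing). Note: the names below (p_win, sl_scale, tp_scale, trail_atr_mult) differ from those in the JSON example above (probability, sl_mult, tp_mult, lot_mult, trail_mult); check the output of ML/policy_export.py to confirm which set your build reads.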

  • strategy: Strategy name (e.g., "ADXStrategy")
  • symbol: Trading symbol (e.g., "EURUSD")
  • timeframe: Timeframe in minutes (e.g., 60 for H1)
  • p_win: ML model confidence (0.0-1.0)
  • sl_scale: Stop loss scaling multiplier (optional)
  • tp_scale: Take profit scaling multiplier (optional)
  • trail_atr_mult: Trailing stop ATR multiplier (optional)
  • confidence: Alternative confidence field (optional)
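
To inspect a policy offline, the slices can be indexed by their strategy|symbol|timeframe key. The following is a minimal sketch using the field names from the JSON example (`probability`, `sl_mult`, ...); the helper names `slice_key` and `index_slices` are hypothetical and not part of the EA or the exporter.

```python
import json

EXAMPLE_POLICY = """
{
  "version": "1.0",
  "min_confidence": 0.55,
  "slices": [
    {"strategy": "ADXStrategy", "symbol": "EURUSD", "timeframe": 60,
     "probability": 0.75, "sl_mult": 1.0, "tp_mult": 1.2,
     "lot_mult": 1.0, "trail_mult": 1.0}
  ]
}
"""


def slice_key(strategy: str, symbol: str, timeframe: int) -> str:
    """Build the strategy|symbol|timeframe key used throughout this guide."""
    return f"{strategy}|{symbol}|{timeframe}"


def index_slices(policy: dict) -> dict:
    """Map each slice key to its slice dictionary."""
    return {
        slice_key(s["strategy"], s["symbol"], s["timeframe"]): s
        for s in policy.get("slices", [])
    }


policy = json.loads(EXAMPLE_POLICY)
by_key = index_slices(policy)
print(sorted(by_key))                                  # ['ADXStrategy|EURUSD|60']
print(by_key["ADXStrategy|EURUSD|60"]["probability"])  # 0.75
```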

Policy Gating Logic

Configuration

input bool     UsePolicyGating = true;           // Enable ML policy gating
input bool     DefaultPolicyFallback = true;     // Enable fallback modes
input bool     FallbackDemoOnly = true;          // Restrict fallback to demo
input bool     FallbackWhenNoPolicy = true;      // Fallback if policy not loaded
input bool     FallbackWhenSliceMissing = true;  // Fallback if slice missing

Decision Flow

Is UsePolicyGating=true?
  No  → PASS (no policy gating)
  Yes → Continue

Is policy loaded (slices > 0)?
  No  → Check FallbackWhenNoPolicy
         Yes → Check FallbackDemoOnly
                Demo account → FALLBACK (neutral scaling)
                Live account → BLOCK
         No  → BLOCK (reason: "no policy loaded")
  Yes → Continue

Does exact slice exist (strategy|symbol|TF)?
  No  → Check FallbackWhenSliceMissing
         Yes → Check FallbackDemoOnly
                Demo account → FALLBACK (neutral scaling)
                Live account → BLOCK
         No  → BLOCK (reason: "policy slice missing")
  Yes → Continue

Is slice.probability >= policy.min_confidence?
  No  → BLOCK (reason: "confidence too low")
  Yes → PASS + Apply scaling

Code Example

bool ApplyPolicyGating(SignalData &signal, string &reason) {
    if(!UsePolicyGating) return true;
    
    // Check if policy loaded
    if(policyData.slices == 0) {
        return HandleFallback("no_policy", signal, reason);
    }
    
    // Find policy slice
    PolicySlice slice = policyData.Find(
        signal.strategy_name, 
        symbol, 
        timeframe
    );
    
    if(slice.probability < 0) {  // Slice not found
        return HandleFallback("slice_missing", signal, reason);
    }
    
    // Check confidence threshold
    if(slice.probability < policyData.min_confidence) {
        reason = StringFormat("confidence %.2f < %.2f",
                             slice.probability, policyData.min_confidence);
        return false;
    }
    
    // Apply policy scaling
    ApplyPolicyScaling(signal, slice);
    
    return true;
}
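
The decision flow can also be exercised offline. The Python model below is a sketch that follows the flow chart step by step; `GateConfig`, `gate` and the returned tuples are hypothetical names for illustration. `DefaultPolicyFallback` is left out because the flow does not show where it applies.

```python
from dataclasses import dataclass


@dataclass
class GateConfig:
    use_policy_gating: bool = True
    fallback_demo_only: bool = True
    fallback_when_no_policy: bool = True
    fallback_when_slice_missing: bool = True


def _fallback(enabled: bool, cfg: GateConfig, is_demo: bool, reason: str):
    """Shared fallback branch: FALLBACK means 'allow with neutral scaling'."""
    if not enabled:
        return ("BLOCK", reason)
    if cfg.fallback_demo_only and not is_demo:
        return ("BLOCK", reason)
    return ("FALLBACK", reason)


def gate(cfg: GateConfig, slices: dict, min_confidence: float,
         key: str, is_demo: bool):
    """Return (decision, reason) for one strategy|symbol|timeframe key."""
    if not cfg.use_policy_gating:
        return ("PASS", "gating disabled")
    if not slices:  # policy not loaded or zero slices
        return _fallback(cfg.fallback_when_no_policy, cfg, is_demo,
                         "no policy loaded")
    found = slices.get(key)
    if found is None:
        return _fallback(cfg.fallback_when_slice_missing, cfg, is_demo,
                         "policy slice missing")
    if found["probability"] < min_confidence:
        return ("BLOCK", "confidence too low")
    return ("PASS", "policy ok")


slices = {"ADXStrategy|EURUSD|60": {"probability": 0.75}}
print(gate(GateConfig(), slices, 0.55, "ADXStrategy|EURUSD|60", is_demo=False))
print(gate(GateConfig(), slices, 0.55, "BollAverages|GBPUSD|60", is_demo=False))
```

With the default settings, the first call passes and the second is blocked, since the slice is missing and the account is not a demo account.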

Policy Fallback Modes

Fallback: No Policy

Triggers when:

  • UsePolicyGating=true
  • Policy file not loaded OR policy.slices == 0
  • FallbackWhenNoPolicy=true

Behavior:

  • Check FallbackDemoOnly:
    • If true and demo account → ALLOW with neutral scaling
    • If true and live account → BLOCK
    • If false → ALLOW with neutral scaling (any account)
  • Bypasses insights thresholds
  • Bypasses exploration caps
  • No SL/TP/lot/trail multipliers applied

Log Example:

FALLBACK: no policy loaded -> neutral scaling used for ADXStrategy on EURUSD/H1 demo=true

Telemetry:

policy_fallback,EURUSD,60,ADXStrategy,no_policy,true,true

Fallback: Slice Missing

Triggers when:

  • UsePolicyGating=true
  • Policy IS loaded (policy.slices > 0)
  • Exact strategy|symbol|timeframe slice not found in policy
  • FallbackWhenSliceMissing=true

Behavior:

  • Same as "No Policy" fallback
  • Allows trade with neutral scaling
  • Demo-only restriction if FallbackDemoOnly=true

Log Example:

FALLBACK: policy slice missing -> neutral scaling used for BollAverages on GBPUSD/H1 demo=true

Neutral Scaling

Definition: No adjustments applied, use base parameters.

sl_mult = 1.0
tp_mult = 1.0
lot_mult = 1.0
trail_mult = 1.0

Trade proceeds with:

  • Original SL/TP from strategy
  • Original lot size from risk calculation
  • Original trailing stop settings (if enabled)

Fallback Safety

Why demo-only by default?

  • Fallback is permissive (allows trades without ML validation)
  • Intended for data collection, not production live trading
  • In live, you want explicit policy coverage for all slices

When to disable FallbackDemoOnly?

  • After verifying policy coverage is comprehensive
  • When transitioning from demo to live with same strategy set
  • With explicit risk acceptance of trading without ML guidance

Best Practice:

  • Keep FallbackDemoOnly=true until the policy is proven
  • Monitor policy_fallback telemetry events
  • Aim to eliminate fallbacks by improving policy coverage
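
One way to track coverage is to compare the slices in policy.json against the strategy/symbol/timeframe combinations you expect to trade. The sketch below assumes the slice fields from the JSON example; `missing_slices` and the expected list are hypothetical.

```python
from typing import List, Tuple


def missing_slices(policy: dict,
                   expected: List[Tuple[str, str, int]]) -> List[str]:
    """Return expected strategy|symbol|timeframe keys absent from the policy."""
    present = {
        f'{s["strategy"]}|{s["symbol"]}|{s["timeframe"]}'
        for s in policy.get("slices", [])
    }
    keys = [f"{strategy}|{symbol}|{tf}" for strategy, symbol, tf in expected]
    return [k for k in keys if k not in present]


policy = {
    "min_confidence": 0.55,
    "slices": [
        {"strategy": "ADXStrategy", "symbol": "EURUSD", "timeframe": 60,
         "probability": 0.75},
    ],
}
expected = [("ADXStrategy", "EURUSD", 60), ("BollAverages", "GBPUSD", 60)]
print(missing_slices(policy, expected))  # ['BollAverages|GBPUSD|60']
```

Every key this reports would trigger the "slice missing" fallback (or a block on live accounts) under the default settings.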

Policy Scaling

Policy Scaling (LiveEA Only)

Note: Full policy scaling with per-slice multipliers is implemented in LiveEA. PaperEA_v2 has minimal policy parsing (checks min_confidence only).

Scaling Application (LiveEA):

void ApplyPolicyScaling(SignalData &signal, PolicySlice &slice) {
    // SL scaling
    double slDistance = MathAbs(signal.entry_price - signal.stop_loss);
    if(signal.direction == 1) {  // Buy
        signal.stop_loss = signal.entry_price - (slDistance * slice.sl_scale);
    } else {  // Sell
        signal.stop_loss = signal.entry_price + (slDistance * slice.sl_scale);
    }
    
    // TP scaling
    double tpDistance = MathAbs(signal.take_profit - signal.entry_price);
    if(signal.direction == 1) {  // Buy
        signal.take_profit = signal.entry_price + (tpDistance * slice.tp_scale);
    } else {  // Sell
        signal.take_profit = signal.entry_price - (tpDistance * slice.tp_scale);
    }
    
    // Trailing scaling (if enabled)
    if(TrailEnabled && slice.trail_atr_mult > 0) {
        // Apply trail ATR multiplier
    }
}
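
A worked example of the same distance-scaling arithmetic, written as a standalone Python sketch. The function name `scale_levels` and the prices are illustrative; the direction convention (1 = buy, anything else = sell) follows the code above.

```python
def scale_levels(entry: float, stop_loss: float, take_profit: float,
                 direction: int, sl_scale: float, tp_scale: float):
    """Scale SL and TP distances from entry, keeping each on its own side."""
    sl_distance = abs(entry - stop_loss)
    tp_distance = abs(take_profit - entry)
    sign = 1 if direction == 1 else -1  # buy: SL below entry, TP above
    new_sl = entry - sign * sl_distance * sl_scale
    new_tp = entry + sign * tp_distance * tp_scale
    return new_sl, new_tp


# Buy at 1.1000 with a 50-pip stop and a 100-pip target,
# scaled by sl_scale=0.8 and tp_scale=1.5.
sl, tp = scale_levels(1.1000, 1.0950, 1.1100, 1, 0.8, 1.5)
print(f"{sl:.4f} {tp:.4f}")  # 1.0960 1.1150
```

The stop moves from 50 to 40 pips and the target from 100 to 150 pips, so the reward-to-risk ratio goes from 2.0 to 3.75.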

Scaling Examples

Conservative Scaling (low confidence):

{
  "probability": 0.60,
  "sl_mult": 0.8,   // Tighter stop
  "tp_mult": 1.5,   // Wider target (better RR)
  "lot_mult": 0.5,  // Smaller position
  "trail_mult": 0.9 // Tighter trail
}

Aggressive Scaling (high confidence):

{
  "probability": 0.85,
  "sl_mult": 1.2,   // Wider stop (more breathing room)
  "tp_mult": 0.8,   // Closer target (take profits faster)
  "lot_mult": 1.5,  // Larger position
  "trail_mult": 1.0 // Normal trail
}

Neutral Scaling (fallback):

{
  "probability": 0.0,
  "sl_mult": 1.0,
  "tp_mult": 1.0,
  "lot_mult": 1.0,
  "trail_mult": 1.0
}

Exploration Mode

Purpose

Bootstrap insights for strategy|symbol|timeframe slices that lack sufficient historical data.

Without exploration:

  • New slices blocked by insights gating (no data → can't pass thresholds)
  • Chicken-and-egg problem: can't trade → can't collect data → can't build insights

With exploration:

  • Limited trades allowed for no-data slices
  • Caps prevent excessive exposure
  • Data collected → insights built → insights gating takes over

When Exploration Triggers

Conditions:

  1. ExploreOnNoSlice=true
  2. Insights gating is enabled (UseInsightsGating=true)
  3. No slice exists in insights.json for this strategy|symbol|timeframe
  4. Exploration caps not exceeded

Important: Exploration bypasses insights gating ONLY when the slice is truly missing. If the slice exists but fails thresholds (e.g., low win rate), the trade is BLOCKED (no bypass).
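
The trigger conditions can be summarised as a small decision function. This Python sketch is a simplified model, not the EA's code: the function name and return labels are hypothetical, and treating disabled insights gating as a plain pass is an assumption.

```python
def exploration_decision(explore_on_no_slice: bool,
                         use_insights_gating: bool,
                         slice_in_insights: bool,
                         passes_thresholds: bool,
                         caps_ok: bool) -> str:
    """Return PASS, EXPLORE or BLOCK for one strategy|symbol|timeframe slice."""
    if not use_insights_gating:
        return "PASS"  # assumption: nothing to bypass when the gate is off
    if slice_in_insights:
        # Existing slices are judged on their thresholds; no bypass.
        return "PASS" if passes_thresholds else "BLOCK"
    if explore_on_no_slice and caps_ok:
        return "EXPLORE"
    return "BLOCK"  # no data and no exploration budget


# Missing slice with exploration enabled and caps available.
print(exploration_decision(True, True, False, False, True))  # EXPLORE
# Slice exists but fails thresholds: blocked, exploration does not apply.
print(exploration_decision(True, True, True, False, True))   # BLOCK
```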

Configuration

input bool     ExploreOnNoSlice = true;          // Enable exploration when no slice exists
input int      ExploreMaxPerSlicePerDay = 100;   // Daily cap per slice (default 100)
input int      ExploreMaxPerSlice = 100;         // Weekly cap per slice (default 100)

Note: Previous documentation listed defaults of 2/3. Actual code defaults are 100/100, effectively unlimited for most practical purposes. The UseExploration input does not exist; exploration is controlled via ExploreOnNoSlice.


Exploration Caps

Cap Types

Daily Cap: ExploreMaxPerSlicePerDay

  • Resets at midnight (00:00 server time)
  • Per-slice basis (each strategy|symbol|TF tracked separately)
  • Default: 100 trades/day/slice

Weekly Cap: ExploreMaxPerSlice

  • Resets on Monday
  • Week bucket = Monday of the week (yyyymmdd format)
  • Default: 100 trades/week/slice
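
The two bucket keys can be derived from a date as sketched below in Python. The helper names are hypothetical, and the sketch takes a plain calendar date, whereas the EA works from server time.

```python
from datetime import date, timedelta


def day_bucket(d: date) -> str:
    """Daily bucket: the date itself in yyyymmdd format."""
    return d.strftime("%Y%m%d")


def week_bucket(d: date) -> str:
    """Weekly bucket: the Monday of d's week in yyyymmdd format."""
    monday = d - timedelta(days=d.weekday())  # weekday(): Monday == 0
    return monday.strftime("%Y%m%d")


print(day_bucket(date(2025, 1, 15)))   # 20250115
print(week_bucket(date(2025, 1, 15)))  # 20250113
```

Any date from Monday 2025-01-13 through Sunday 2025-01-19 maps to the same weekly bucket, and the bucket changes on Monday 2025-01-20.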

Counter Persistence

Files:

  • Common/Files/DualEA/explore_counts_day.csv
  • Common/Files/DualEA/explore_counts.csv

Format:

key,date_yyyymmdd,count
ADXStrategy|EURUSD|60,20250115,2
BollAverages|GBPUSD|60,20250115,1

For weekly:

key,week_monday_yyyymmdd,count
ADXStrategy|EURUSD|60,20250113,3
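
To read these counter files outside the terminal, a few lines of Python are enough. This sketch assumes the column layout shown above; `load_counts` and `count_for` are hypothetical helpers. Treating a row from an older bucket as a count of zero is one plausible reading of the automatic reset, not confirmed behaviour.

```python
import csv
import io

DAY_CSV = """key,date_yyyymmdd,count
ADXStrategy|EURUSD|60,20250115,2
BollAverages|GBPUSD|60,20250115,1
"""


def load_counts(text: str, bucket_column: str) -> dict:
    """Parse a counter CSV into {(slice key, bucket): count}."""
    counts = {}
    for row in csv.DictReader(io.StringIO(text)):
        counts[(row["key"], row[bucket_column])] = int(row["count"])
    return counts


def count_for(counts: dict, key: str, bucket: str) -> int:
    """Count for the current bucket; rows from other buckets read as 0."""
    return counts.get((key, bucket), 0)


counts = load_counts(DAY_CSV, "date_yyyymmdd")
print(count_for(counts, "ADXStrategy|EURUSD|60", "20250115"))  # 2
print(count_for(counts, "ADXStrategy|EURUSD|60", "20250116"))  # 0
```

For the weekly file, pass `"week_monday_yyyymmdd"` as the bucket column.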

Cap Checking

bool CheckExplorationCaps(string strategy, string symbol, int tf, string &reason) {
    string sliceKey = strategy + "|" + symbol + "|" + IntegerToString(tf);
    
    // Load counters
    int dayCount = LoadExploreCountDay(sliceKey);
    int weekCount = LoadExploreCountWeek(sliceKey);
    
    // Check daily cap
    if(ExploreMaxPerSlicePerDay > 0 && dayCount >= ExploreMaxPerSlicePerDay) {
        reason = StringFormat("explore_cap_day (day=%d/%d, week=%d/%d)",
                             dayCount, ExploreMaxPerSlicePerDay,
                             weekCount, ExploreMaxPerSlice);
        telemetry.Event("explore_block_day", sliceKey, dayCount, weekCount);
        return false;
    }
    
    // Check weekly cap
    if(ExploreMaxPerSlice > 0 && weekCount >= ExploreMaxPerSlice) {
        reason = StringFormat("explore_cap_week (day=%d/%d, week=%d/%d)",
                             dayCount, ExploreMaxPerSlicePerDay,
                             weekCount, ExploreMaxPerSlice);
        telemetry.Event("explore_block_week", sliceKey, dayCount, weekCount);
        return false;
    }
    
    // Increment counters
    IncrementExploreCountDay(sliceKey);
    IncrementExploreCountWeek(sliceKey);
    
    // Allow exploration
    Log(StringFormat("GATE: explore allow %s on %s/%d (day=%d/%d, week=%d/%d)",
        strategy, symbol, tf,
        dayCount+1, ExploreMaxPerSlicePerDay,
        weekCount+1, ExploreMaxPerSlice));
    
    telemetry.Event("explore_allow", sliceKey, dayCount+1, weekCount+1);
    
    return true;
}

Resetting Caps

Manual Reset: Delete counter files:

Remove-Item "C:\Users\<you>\AppData\Roaming\MetaQuotes\Terminal\Common\Files\DualEA\explore_counts*.csv"

Automatic Reset:

  • Daily: Midnight (00:00 server time)
  • Weekly: Monday 00:00

Interaction with NoConstraintsMode

When NoConstraintsMode=true:

  • Insights gating: BYPASSED
  • Exploration caps: BYPASSED (trades allowed regardless of caps)
  • Exploration counters: Still incremented for telemetry

Troubleshooting

Issue: Policy fallback always triggering

Symptoms:

FALLBACK: no policy loaded -> neutral scaling for ADXStrategy on EURUSD/H1 demo=true

Diagnosis:

  1. Check if policy.json exists:

    Test-Path "C:\Users\<you>\AppData\Roaming\MetaQuotes\Terminal\Common\Files\DualEA\policy.json"
    
  2. Check policy load logs (enable DebugPolicy=true):

    Policy loaded: 0 slices, min_confidence=0.00
    
  3. Verify policy.json content:

    Get-Content "...\DualEA\policy.json" -Raw | ConvertFrom-Json | Select-Object -ExpandProperty slices | Measure-Object
    

Solutions:

  • Run ML/policy_export.py to generate policy
  • Copy policy.json to Common Files
  • Verify JSON is well-formed (no parse errors)

Issue: Policy slice missing fallback

Symptoms:

FALLBACK: policy slice missing -> neutral scaling for BollAverages on GBPUSD/H1 demo=true

Diagnosis:

  1. Check policy slices:

    Get-Content "...\policy.json" -Raw | ConvertFrom-Json | Select-Object -ExpandProperty slices | 
        Where-Object {$_.strategy -eq "BollAverages" -and $_.symbol -eq "GBPUSD" -and $_.timeframe -eq 60}
    
  2. No result → slice missing from policy

Solutions:

  • Collect more training data for this slice
  • Re-train model with broader coverage
  • Accept fallback for new slices (data collection mode)

Issue: Exploration caps reached immediately

Symptoms:

GATE: blocked ADXStrategy on EURUSD/H1 reason=explore_cap_day (day=2/2, week=2/3)

Diagnosis:

  1. Check counters:

    Import-Csv "...\explore_counts_day.csv" | Where-Object {$_.key -like "*ADXStrategy|EURUSD|60*"}
    
  2. Verify the caps are not too restrictive (the code defaults are 100/100):

    ExploreMaxPerSlicePerDay = 2  // Very conservative
    

Solutions:

  • Adjust caps to suit the EA (e.g., 5/10 for PaperEA, 1/2 for LiveEA)
  • Reset counters manually (delete CSV files)
  • Use NoConstraintsMode=true for initial data collection (bypasses all caps)

Issue: Exploration not triggering

Symptoms:

GATE: blocked ADXStrategy on EURUSD/H1 reason=below_winrate (WR=0.42 < 0.50)

(Slice exists but fails threshold, no exploration bypass)

Diagnosis: Exploration only triggers when slice missing entirely, not when slice exists but fails thresholds.

Solutions:

  • This is correct behavior (no-slice-only bypass)
  • To allow trading despite low performance:
    • Lower insights thresholds temporarily
    • Use NoConstraintsMode=true
    • Delete slice from insights.json to force exploration
    • Wait for performance to improve

Issue: NoConstraintsMode not working

Symptoms: Still seeing gate blocks despite NoConstraintsMode=true

Diagnosis:

  1. Verify setting applied:

    if(NoConstraintsMode) Print("NoConstraintsMode is TRUE");
    
  2. Check if using unified system:

    input bool UseUnifiedSystem = true;  // Required for NoConstraintsMode
    

Solutions:

  • Ensure UseUnifiedSystem=true
  • Restart EA after changing NoConstraintsMode
  • Check logs for shadow decisions: [SHADOW] gate=... result=block (but allowing)

See Also: