Princeec13/mql5

Fork 1

Princeec13 986cf2cfe8 new files added

2025-10-03 01:38:19 -04:00

15 KiB

Raw Permalink Blame History

Observability Guide — DualEA System

Telemetry, monitoring, and debugging reference

Telemetry System
Event Types Reference
SystemMonitor
EventBus
Logging Levels
Debugging Techniques
Performance Monitoring

Telemetry System

Overview

DualEA uses TelemetryStandard wrapper for unified event logging across both PaperEA and LiveEA.

Output Formats:

MT5 Journal (real-time)
CSV files (Common/Files/DualEA/telemetry_YYYYMMDD.csv)
JSON logs (Common/Files/DualEA/logs/trades_YYYYMMDD.json)

Configuration

input bool     TelemetryEnable = true;           // Enable telemetry
input bool     TelemetryVerbose = false;         // Verbose logging
input bool     TelemetryStandard = true;         // Use TelemetryStandard wrapper
input int      TelemetryFlushSeconds = 300;      // Flush every 5 minutes

Event Structure

CSV Format:

timestamp,event_type,symbol,timeframe,strategy,gate,result,reason,latency_ms,metadata
2025-01-15T10:30:00Z,gate_decision,EURUSD,60,ADXStrategy,signal_rinse,pass,,5,confidence=0.85
2025-01-15T10:30:01Z,gate_decision,EURUSD,60,ADXStrategy,insights,block,below_winrate,2,winrate=0.42

JSON Format (UnifiedTradeLogger):

{
  "timestamp": "2025-01-15T10:30:00Z",
  "event_type": "trade_execution",
  "symbol": "EURUSD",
  "timeframe": 60,
  "strategy": "ADXStrategy",
  "direction": 1,
  "entry": 1.0850,
  "sl": 1.0800,
  "tp": 1.0950,
  "lots": 0.10,
  "gates": ["signal_rinse:pass", "market_soap:pass", ...],
  "policy": {"confidence": 0.75, "sl_mult": 1.0},
  "profit": 15.50,
  "r_multiple": 1.2
}

Event Types Reference

Core Events

gate_decision

When: Each gate pass/block decision
Fields: gate, result (pass/block), reason, latency_ms
Example: gate_decision,EURUSD,60,ADXStrategy,market_soap,block,high_correlation,8,corr=0.82

trade_execution

When: Order placed (or paper position created)
Fields: direction, entry, lots, sl, tp, retcode (LiveEA only)
Example: trade_execution,EURUSD,60,ADXStrategy,BUY,1.0850,0.10,1.0800,1.0950,retcode=10009

trade_closed

When: Position closed (SL/TP hit or manual close)
Fields: entry, exit, profit, r_multiple, duration_sec, reason
Example: trade_closed,EURUSD,60,ADXStrategy,1.0850,1.0950,15.50,1.2,3600,tp_hit

policy_load

When: policy.json loaded or reloaded
Fields: slices, min_confidence, file_size, load_time_ms
Example: policy_load,,,,,45,0.55,12345,25

insights_rebuild

When: insights.json regenerated
Fields: slices, source_trades, rebuild_time_sec
Example: insights_rebuild,,,,,123,4567,45

threshold_adjust

When: GateLearningSystem adjusts gate thresholds
Fields: gate, old_threshold, new_threshold, success_rate
Example: threshold_adjust,,,,,signal_rinse,0.60,0.58,0.72

Risk Events

risk4_spread

When: Spread check (Gate 7 or early phase)
Fields: spread_pips, max_allowed, result
Example: risk4_spread,EURUSD,60,,,,block,spread_too_wide,spread=4.2 max=3.0

risk4_margin

When: Margin level check
Fields: margin_level, min_required, result
Example: risk4_margin,EURUSD,60,,,,block,margin_low,level=180% min=200%

risk4_drawdown

When: Drawdown check
Fields: current_dd_pct, max_allowed, result
Example: risk4_drawdown,,,,,12.5,10.0,block

risk4_daily_loss

When: Daily loss circuit breaker
Fields: daily_pnl_pct, max_loss_pct, result
Example: risk4_daily_loss,,,,,−5.2,−5.0,block

risk4_consecutive

When: Consecutive loss check
Fields: consecutive_losses, max_allowed, result
Example: risk4_consecutive,,,,,5,5,block

Position Manager Events

pm_corr_sizing

When: Correlation-adjusted sizing applied
Fields: avg_corr, base_lots, adjusted_lots, dampening
Example: pm_corr_sizing,GBPUSD,60,,0.68,0.10,0.066,0.66

pm_corr_floor

When: Correlation floor applied
Fields: floor_type (mult/lots), floor_value, final_lots
Example: pm_corr_floor,GBPUSD,60,,mult,0.25,0.025

pm_corr_scale_in

When: Scale-in with correlation adjustment
Fields: Same as pm_corr_sizing

pm_adaptive_sizing

When: Adaptive sizing based on performance
Fields: perf_factor, base_lots, adjusted_lots

Selector Events (Phase 5)

p5_refresh

When: Strategy selector refreshed
Fields: strategies_count, refresh_time_ms

p5_rescore

When: Strategies rescored
Fields: strategy, old_score, new_score, reason

p5_auto_tune

When: Selector thresholds auto-tuned
Fields: threshold_type, old_value, new_value

p5_selector_cfg

When: Selector configuration changed
Fields: param, old_value, new_value

Exploration Events

explore_allow

When: Exploration trade allowed
Fields: day_count, day_max, week_count, week_max
Example: explore_allow,EURUSD,60,ADXStrategy,1,2,2,3

explore_block_day

When: Daily exploration cap reached
Fields: Same as explore_allow

explore_block_week

When: Weekly exploration cap reached
Fields: Same as explore_allow

Policy Fallback Events

policy_fallback

When: Fallback mode triggered
Fields: fallback_type (no_policy/slice_miss), demo, allowed
Example: policy_fallback,EURUSD,60,ADXStrategy,no_policy,true,true

SystemMonitor

Purpose

Real-time health monitoring and performance tracking for the entire system.

Initialization

input bool     MonitorEnable = true;
input int      MonitorUpdateSeconds = 60;
input double   MonitorHealthThreshold = 70.0;
input bool     MonitorAlertOnDegradation = true;

CSystemMonitor* monitor = CSystemMonitor::GetInstance();

Metrics Tracked

Gate Success Rates:

double GetGateSuccessRate(int gateNum);  // 0.0-1.0

Tracks pass/fail ratio per gate over rolling window
Target: configurable per gate (default 0.7-0.8)

Processing Times:

double GetAvgGateLatency(int gateNum);   // milliseconds
double GetAvgPipelineLatency();          // milliseconds

System Health Score (0-100):

double GetSystemHealthScore();

Calculated from:

Gate success rates vs targets
Processing latencies vs thresholds
Memory usage
Error rates
Event queue depth

Degradation Detection:

Triggers alert if health score drops below MonitorHealthThreshold
Logs degradation reason (which metric failed)

Usage Example

void OnTimer() {
    if(MonitorEnable) {
        double health = monitor.GetSystemHealthScore();
        
        if(health < MonitorHealthThreshold) {
            Print(StringFormat("[ALERT] System health degraded: %.1f < %.1f",
                health, MonitorHealthThreshold));
            telemetry.Event("system_alert", "health_degraded", health);
        }
        
        // Log metrics
        for(int i = 1; i <= 8; i++) {
            double successRate = monitor.GetGateSuccessRate(i);
            double latency = monitor.GetAvgGateLatency(i);
            Print(StringFormat("[MONITOR] Gate %d: SR=%.2f Latency=%.1fms",
                i, successRate, latency));
        }
    }
}

EventBus

Purpose

Cross-component communication with priority-based event logging.

Configuration

input bool     EventBusEnable = true;
input int      EventBusPriority = 2;         // 0=ALL, 1=LOW, 2=MED, 3=HIGH, 4=CRITICAL
input int      EventBusMaxQueueSize = 1000;

Priority Levels

0 (ALL): Debug/verbose events
1 (LOW): Informational events (gate passes, normal operations)
2 (MEDIUM): Warnings (gate blocks, threshold adjustments)
3 (HIGH): Errors (file I/O failures, policy load failures)
4 (CRITICAL): System failures (circuit breakers, broker errors)

Event Emission

CEventBus* eventBus = CEventBus::GetInstance();

// Emit event
eventBus.Emit(
    EVENT_PRIORITY_MEDIUM,              // Priority
    "gate_decision",                     // Event type
    symbol,                              // Symbol
    timeframe,                           // Timeframe
    "ADXStrategy",                       // Strategy
    "insights",                          // Gate/component
    "block",                             // Result
    "below_winrate",                     // Reason
    telemetry                            // Telemetry sink
);

Event Queue

Events queued if EventBusEnable=true
Filtered by EventBusPriority (only events at or above priority level are logged)
Flushed periodically or on demand
Max queue size: EventBusMaxQueueSize (oldest events dropped if exceeded)

Logging Levels

Log Level Configuration

input int      LogLevel = 2;             // 0=None, 1=Error, 2=Warning, 3=Info, 4=Debug
input bool     LogToFile = true;
input bool     LogToJournal = true;

Level Definitions

0 - None: No logging (not recommended)

1 - Error: Only critical errors

File I/O failures
Broker retcode errors
Parse errors (policy.json, insights.json)

2 - Warning (Default): Errors + warnings

Gate blocks
Policy fallbacks
Circuit breaker trips
Threshold adjustments

3 - Info: Errors + warnings + informational

Trade executions
Gate passes
Policy loads
Insights rebuilds

4 - Debug: All events including verbose details

Every gate decision with full context
Signal generation details
File I/O operations
Learning system updates

Log Output

MT5 Journal:

2025.01.15 10:30:00.123 DualEA (EURUSD,H1) [GATE] signal_rinse: pass - confidence=0.85
2025.01.15 10:30:00.156 DualEA (EURUSD,H1) [GATE] insights: block - below_winrate (WR=0.42 < 0.50)

File Logs (if LogToFile=true):

Path: Common/Files/DualEA/logs/ea_YYYYMMDD.log
Rotation: Daily, keeps last 30 days
Format: Same as Journal with timestamp prefix

Debugging Techniques

Debug Flags

Enable specific debug output:

input bool     DebugTrailing = false;    // Trailing stop operations
input bool     DebugGates = false;       // Gate decisions
input bool     DebugStrategies = false;  // Strategy signals
input bool     DebugPolicy = false;      // Policy loading/application
input bool     DebugInsights = false;    // Insights loading
input bool     DebugKB = false;          // KB writes

Shadow Mode (NoConstraintsMode)

Test gate changes without blocking trades:

input bool     NoConstraintsMode = true;     // Enable shadow mode
input bool     TelemetryVerbose = true;      // Capture all shadow decisions

Shadow Logging:

[SHADOW] gate=insights result=block reason=below_winrate (but allowing due to NoConstraintsMode)

All gate decisions logged via LogGatingShadow() but trades proceed regardless.

Telemetry Analysis

Query gate block reasons:

Import-Csv telemetry_20250115.csv | 
    Where-Object {$_.event_type -eq 'gate_decision' -and $_.result -eq 'block'} |
    Group-Object reason | 
    Select-Object Count, Name | 
    Sort-Object Count -Descending

Analyze latencies:

Import-Csv telemetry_20250115.csv | 
    Where-Object {$_.event_type -eq 'gate_decision'} |
    Measure-Object latency_ms -Average -Maximum

Common Issues

High gate block rate:

Check SystemMonitor.GetGateSuccessRate(gateNum)
Review telemetry for most common reason
Adjust threshold or disable gate temporarily
Use shadow mode to observe impact

Policy fallback always triggering:

Check if policy.json exists: Common/Files/DualEA/policy.json
Verify policyData.slices > 0
Check slice coverage for your symbols/timeframes/strategies
Review DebugPolicy=true logs

Exploration caps reached quickly:

Check counters: explore_counts.csv, explore_counts_day.csv
Increase caps or reset counters (delete files)
Verify NoConstraintsMode=false (otherwise caps bypassed)

KB writes failing:

Enable DebugKB=true
Check Common Files path exists: FolderCreate("DualEA", FILE_COMMON)
Verify file permissions
Check disk space

Performance Monitoring

Latency Targets

Per-Component:

Signal generation: <5ms
Single gate: <10ms
8-stage gates total: <50ms
Policy/insights lookup: <2ms (cached)
KB write: <10ms
Telemetry emit: <1ms
Total pipeline: <100ms

Monitoring:

double pipelineStart = GetTickCount();
// ... pipeline execution ...
double pipelineEnd = GetTickCount();
double latency = pipelineEnd - pipelineStart;

if(latency > 100) {
    Print(StringFormat("[PERF] Pipeline slow: %.1fms", latency));
    telemetry.Event("performance_alert", "pipeline_slow", latency);
}

Memory Usage

Tracking:

long memUsed = TerminalInfoInteger(TERMINAL_MEMORY_USED);  // KB
int handles = IndicatorsTotal();

if(memUsed > 50000) {  // 50MB
    Print(StringFormat("[PERF] High memory: %d KB", memUsed));
}

Expected Footprint:

Single EA instance: ~1.5MB
With all strategies loaded: ~2MB
High telemetry volume: ~5MB

File I/O Monitoring

Batch Writes:

KB writes: Buffer every 10 trades
Features export: Buffer every 50 trades
Telemetry flush: Every 5 minutes or 1000 events

Monitoring:

int writeCount = 0;
double totalWriteTime = 0;

void WriteToKB(...) {
    double start = GetTickCount();
    // ... write operation ...
    double end = GetTickCount();
    
    writeCount++;
    totalWriteTime += (end - start);
    
    if(writeCount % 100 == 0) {
        double avgTime = totalWriteTime / writeCount;
        Print(StringFormat("[PERF] KB writes: %d total, %.2fms avg", 
            writeCount, avgTime));
    }
}

Telemetry Best Practices

Use appropriate log levels: Debug only during development, Info/Warning for production
Enable TelemetryStandard: Provides consistent formatting across EAs
Flush regularly: Don't let event queue grow unbounded
Rotate logs: Keep file sizes manageable (<100MB)
Monitor health score: Set up alerts for degradation
Use shadow mode: Test gate changes safely
Archive old telemetry: Compress and move to cold storage after 30 days

See Also:

Configuration-Reference.md - Telemetry input parameters
Execution-Pipeline.md - When events are emitted
Policy-Exploration-Guide.md - Policy/exploration events

15 KiB Raw Permalink Blame History

Observability Guide — DualEA System

Table of Contents

Telemetry System

Overview

Configuration

Event Structure

Event Types Reference

Core Events

Risk Events

Position Manager Events

Selector Events (Phase 5)

Exploration Events

Policy Fallback Events

SystemMonitor

Purpose

Initialization

Metrics Tracked

Usage Example

EventBus

Purpose

Configuration

Priority Levels

Event Emission

Event Queue

Logging Levels

Log Level Configuration

Level Definitions

Log Output

Debugging Techniques

Debug Flags

Shadow Mode (NoConstraintsMode)

Telemetry Analysis

Common Issues

Performance Monitoring

Latency Targets

Memory Usage

File I/O Monitoring

Telemetry Best Practices

15 KiB

Raw Permalink Blame History