# Observability Guide — DualEA System **Telemetry, monitoring, and debugging reference** --- ## Table of Contents 1. [Telemetry System](#telemetry-system) 2. [Event Types Reference](#event-types-reference) 3. [SystemMonitor](#systemmonitor) 4. [EventBus](#eventbus) 5. [Logging Levels](#logging-levels) 6. [Debugging Techniques](#debugging-techniques) 7. [Performance Monitoring](#performance-monitoring) --- ## Telemetry System ### Overview DualEA uses **TelemetryStandard** wrapper for unified event logging across both PaperEA and LiveEA. **Output Formats**: - MT5 Journal (real-time) - CSV files (Common/Files/DualEA/telemetry_YYYYMMDD.csv) - JSON logs (Common/Files/DualEA/logs/trades_YYYYMMDD.json) ### Configuration ```cpp input bool TelemetryEnable = true; // Enable telemetry input bool TelemetryVerbose = false; // Verbose logging input bool TelemetryStandard = true; // Use TelemetryStandard wrapper input int TelemetryFlushSeconds = 300; // Flush every 5 minutes ``` ### Event Structure **CSV Format**: ```csv timestamp,event_type,symbol,timeframe,strategy,gate,result,reason,latency_ms,metadata 2025-01-15T10:30:00Z,gate_decision,EURUSD,60,ADXStrategy,signal_rinse,pass,,5,confidence=0.85 2025-01-15T10:30:01Z,gate_decision,EURUSD,60,ADXStrategy,insights,block,below_winrate,2,winrate=0.42 ``` **JSON Format** (UnifiedTradeLogger): ```json { "timestamp": "2025-01-15T10:30:00Z", "event_type": "trade_execution", "symbol": "EURUSD", "timeframe": 60, "strategy": "ADXStrategy", "direction": 1, "entry": 1.0850, "sl": 1.0800, "tp": 1.0950, "lots": 0.10, "gates": ["signal_rinse:pass", "market_soap:pass", ...], "policy": {"confidence": 0.75, "sl_mult": 1.0}, "profit": 15.50, "r_multiple": 1.2 } ``` --- ## Event Types Reference ### Core Events **gate_decision** - **When**: Each gate pass/block decision - **Fields**: `gate`, `result` (pass/block), `reason`, `latency_ms` - **Example**: `gate_decision,EURUSD,60,ADXStrategy,market_soap,block,high_correlation,8,corr=0.82` **trade_execution** - **When**: Order placed (or paper position created) - **Fields**: `direction`, `entry`, `lots`, `sl`, `tp`, `retcode` (LiveEA only) - **Example**: `trade_execution,EURUSD,60,ADXStrategy,BUY,1.0850,0.10,1.0800,1.0950,retcode=10009` **trade_closed** - **When**: Position closed (SL/TP hit or manual close) - **Fields**: `entry`, `exit`, `profit`, `r_multiple`, `duration_sec`, `reason` - **Example**: `trade_closed,EURUSD,60,ADXStrategy,1.0850,1.0950,15.50,1.2,3600,tp_hit` **policy_load** - **When**: policy.json loaded or reloaded - **Fields**: `slices`, `min_confidence`, `file_size`, `load_time_ms` - **Example**: `policy_load,,,,,45,0.55,12345,25` **insights_rebuild** - **When**: insights.json regenerated - **Fields**: `slices`, `source_trades`, `rebuild_time_sec` - **Example**: `insights_rebuild,,,,,123,4567,45` **threshold_adjust** - **When**: GateLearningSystem adjusts gate thresholds - **Fields**: `gate`, `old_threshold`, `new_threshold`, `success_rate` - **Example**: `threshold_adjust,,,,,signal_rinse,0.60,0.58,0.72` ### Risk Events **risk4_spread** - **When**: Spread check (Gate 7 or early phase) - **Fields**: `spread_pips`, `max_allowed`, `result` - **Example**: `risk4_spread,EURUSD,60,,,,block,spread_too_wide,spread=4.2 max=3.0` **risk4_margin** - **When**: Margin level check - **Fields**: `margin_level`, `min_required`, `result` - **Example**: `risk4_margin,EURUSD,60,,,,block,margin_low,level=180% min=200%` **risk4_drawdown** - **When**: Drawdown check - **Fields**: `current_dd_pct`, `max_allowed`, `result` - **Example**: `risk4_drawdown,,,,,12.5,10.0,block` **risk4_daily_loss** - **When**: Daily loss circuit breaker - **Fields**: `daily_pnl_pct`, `max_loss_pct`, `result` - **Example**: `risk4_daily_loss,,,,,−5.2,−5.0,block` **risk4_consecutive** - **When**: Consecutive loss check - **Fields**: `consecutive_losses`, `max_allowed`, `result` - **Example**: `risk4_consecutive,,,,,5,5,block` ### Position Manager Events **pm_corr_sizing** - **When**: Correlation-adjusted sizing applied - **Fields**: `avg_corr`, `base_lots`, `adjusted_lots`, `dampening` - **Example**: `pm_corr_sizing,GBPUSD,60,,0.68,0.10,0.066,0.66` **pm_corr_floor** - **When**: Correlation floor applied - **Fields**: `floor_type` (mult/lots), `floor_value`, `final_lots` - **Example**: `pm_corr_floor,GBPUSD,60,,mult,0.25,0.025` **pm_corr_scale_in** - **When**: Scale-in with correlation adjustment - **Fields**: Same as `pm_corr_sizing` **pm_adaptive_sizing** - **When**: Adaptive sizing based on performance - **Fields**: `perf_factor`, `base_lots`, `adjusted_lots` ### Selector Events (Phase 5) **p5_refresh** - **When**: Strategy selector refreshed - **Fields**: `strategies_count`, `refresh_time_ms` **p5_rescore** - **When**: Strategies rescored - **Fields**: `strategy`, `old_score`, `new_score`, `reason` **p5_auto_tune** - **When**: Selector thresholds auto-tuned - **Fields**: `threshold_type`, `old_value`, `new_value` **p5_selector_cfg** - **When**: Selector configuration changed - **Fields**: `param`, `old_value`, `new_value` ### Exploration Events **explore_allow** - **When**: Exploration trade allowed - **Fields**: `day_count`, `day_max`, `week_count`, `week_max` - **Example**: `explore_allow,EURUSD,60,ADXStrategy,1,2,2,3` **explore_block_day** - **When**: Daily exploration cap reached - **Fields**: Same as `explore_allow` **explore_block_week** - **When**: Weekly exploration cap reached - **Fields**: Same as `explore_allow` ### Policy Fallback Events **policy_fallback** - **When**: Fallback mode triggered - **Fields**: `fallback_type` (no_policy/slice_miss), `demo`, `allowed` - **Example**: `policy_fallback,EURUSD,60,ADXStrategy,no_policy,true,true` --- ## SystemMonitor ### Purpose Real-time health monitoring and performance tracking for the entire system. ### Initialization ```cpp input bool MonitorEnable = true; input int MonitorUpdateSeconds = 60; input double MonitorHealthThreshold = 70.0; input bool MonitorAlertOnDegradation = true; CSystemMonitor* monitor = CSystemMonitor::GetInstance(); ``` ### Metrics Tracked **Gate Success Rates**: ```cpp double GetGateSuccessRate(int gateNum); // 0.0-1.0 ``` - Tracks pass/fail ratio per gate over rolling window - Target: configurable per gate (default 0.7-0.8) **Processing Times**: ```cpp double GetAvgGateLatency(int gateNum); // milliseconds double GetAvgPipelineLatency(); // milliseconds ``` **System Health Score** (0-100): ```cpp double GetSystemHealthScore(); ``` Calculated from: - Gate success rates vs targets - Processing latencies vs thresholds - Memory usage - Error rates - Event queue depth **Degradation Detection**: - Triggers alert if health score drops below `MonitorHealthThreshold` - Logs degradation reason (which metric failed) ### Usage Example ```cpp void OnTimer() { if(MonitorEnable) { double health = monitor.GetSystemHealthScore(); if(health < MonitorHealthThreshold) { Print(StringFormat("[ALERT] System health degraded: %.1f < %.1f", health, MonitorHealthThreshold)); telemetry.Event("system_alert", "health_degraded", health); } // Log metrics for(int i = 1; i <= 8; i++) { double successRate = monitor.GetGateSuccessRate(i); double latency = monitor.GetAvgGateLatency(i); Print(StringFormat("[MONITOR] Gate %d: SR=%.2f Latency=%.1fms", i, successRate, latency)); } } } ``` --- ## EventBus ### Purpose Cross-component communication with priority-based event logging. ### Configuration ```cpp input bool EventBusEnable = true; input int EventBusPriority = 2; // 0=ALL, 1=LOW, 2=MED, 3=HIGH, 4=CRITICAL input int EventBusMaxQueueSize = 1000; ``` ### Priority Levels - **0 (ALL)**: Debug/verbose events - **1 (LOW)**: Informational events (gate passes, normal operations) - **2 (MEDIUM)**: Warnings (gate blocks, threshold adjustments) - **3 (HIGH)**: Errors (file I/O failures, policy load failures) - **4 (CRITICAL)**: System failures (circuit breakers, broker errors) ### Event Emission ```cpp CEventBus* eventBus = CEventBus::GetInstance(); // Emit event eventBus.Emit( EVENT_PRIORITY_MEDIUM, // Priority "gate_decision", // Event type symbol, // Symbol timeframe, // Timeframe "ADXStrategy", // Strategy "insights", // Gate/component "block", // Result "below_winrate", // Reason telemetry // Telemetry sink ); ``` ### Event Queue - Events queued if `EventBusEnable=true` - Filtered by `EventBusPriority` (only events at or above priority level are logged) - Flushed periodically or on demand - Max queue size: `EventBusMaxQueueSize` (oldest events dropped if exceeded) --- ## Logging Levels ### Log Level Configuration ```cpp input int LogLevel = 2; // 0=None, 1=Error, 2=Warning, 3=Info, 4=Debug input bool LogToFile = true; input bool LogToJournal = true; ``` ### Level Definitions **0 - None**: No logging (not recommended) **1 - Error**: Only critical errors - File I/O failures - Broker retcode errors - Parse errors (policy.json, insights.json) **2 - Warning** (Default): Errors + warnings - Gate blocks - Policy fallbacks - Circuit breaker trips - Threshold adjustments **3 - Info**: Errors + warnings + informational - Trade executions - Gate passes - Policy loads - Insights rebuilds **4 - Debug**: All events including verbose details - Every gate decision with full context - Signal generation details - File I/O operations - Learning system updates ### Log Output **MT5 Journal**: ``` 2025.01.15 10:30:00.123 DualEA (EURUSD,H1) [GATE] signal_rinse: pass - confidence=0.85 2025.01.15 10:30:00.156 DualEA (EURUSD,H1) [GATE] insights: block - below_winrate (WR=0.42 < 0.50) ``` **File Logs** (if `LogToFile=true`): - Path: `Common/Files/DualEA/logs/ea_YYYYMMDD.log` - Rotation: Daily, keeps last 30 days - Format: Same as Journal with timestamp prefix --- ## Debugging Techniques ### Debug Flags Enable specific debug output: ```cpp input bool DebugTrailing = false; // Trailing stop operations input bool DebugGates = false; // Gate decisions input bool DebugStrategies = false; // Strategy signals input bool DebugPolicy = false; // Policy loading/application input bool DebugInsights = false; // Insights loading input bool DebugKB = false; // KB writes ``` ### Shadow Mode (NoConstraintsMode) Test gate changes without blocking trades: ```cpp input bool NoConstraintsMode = true; // Enable shadow mode input bool TelemetryVerbose = true; // Capture all shadow decisions ``` **Shadow Logging**: ``` [SHADOW] gate=insights result=block reason=below_winrate (but allowing due to NoConstraintsMode) ``` All gate decisions logged via `LogGatingShadow()` but trades proceed regardless. ### Telemetry Analysis **Query gate block reasons**: ```powershell Import-Csv telemetry_20250115.csv | Where-Object {$_.event_type -eq 'gate_decision' -and $_.result -eq 'block'} | Group-Object reason | Select-Object Count, Name | Sort-Object Count -Descending ``` **Analyze latencies**: ```powershell Import-Csv telemetry_20250115.csv | Where-Object {$_.event_type -eq 'gate_decision'} | Measure-Object latency_ms -Average -Maximum ``` ### Common Issues **High gate block rate**: 1. Check `SystemMonitor.GetGateSuccessRate(gateNum)` 2. Review telemetry for most common `reason` 3. Adjust threshold or disable gate temporarily 4. Use shadow mode to observe impact **Policy fallback always triggering**: 1. Check if policy.json exists: `Common/Files/DualEA/policy.json` 2. Verify `policyData.slices > 0` 3. Check slice coverage for your symbols/timeframes/strategies 4. Review `DebugPolicy=true` logs **Exploration caps reached quickly**: 1. Check counters: `explore_counts.csv`, `explore_counts_day.csv` 2. Increase caps or reset counters (delete files) 3. Verify `NoConstraintsMode=false` (otherwise caps bypassed) **KB writes failing**: 1. Enable `DebugKB=true` 2. Check Common Files path exists: `FolderCreate("DualEA", FILE_COMMON)` 3. Verify file permissions 4. Check disk space --- ## Performance Monitoring ### Latency Targets **Per-Component**: - Signal generation: <5ms - Single gate: <10ms - 8-stage gates total: <50ms - Policy/insights lookup: <2ms (cached) - KB write: <10ms - Telemetry emit: <1ms - **Total pipeline**: <100ms **Monitoring**: ```cpp double pipelineStart = GetTickCount(); // ... pipeline execution ... double pipelineEnd = GetTickCount(); double latency = pipelineEnd - pipelineStart; if(latency > 100) { Print(StringFormat("[PERF] Pipeline slow: %.1fms", latency)); telemetry.Event("performance_alert", "pipeline_slow", latency); } ``` ### Memory Usage **Tracking**: ```cpp long memUsed = TerminalInfoInteger(TERMINAL_MEMORY_USED); // KB int handles = IndicatorsTotal(); if(memUsed > 50000) { // 50MB Print(StringFormat("[PERF] High memory: %d KB", memUsed)); } ``` **Expected Footprint**: - Single EA instance: ~1.5MB - With all strategies loaded: ~2MB - High telemetry volume: ~5MB ### File I/O Monitoring **Batch Writes**: - KB writes: Buffer every 10 trades - Features export: Buffer every 50 trades - Telemetry flush: Every 5 minutes or 1000 events **Monitoring**: ```cpp int writeCount = 0; double totalWriteTime = 0; void WriteToKB(...) { double start = GetTickCount(); // ... write operation ... double end = GetTickCount(); writeCount++; totalWriteTime += (end - start); if(writeCount % 100 == 0) { double avgTime = totalWriteTime / writeCount; Print(StringFormat("[PERF] KB writes: %d total, %.2fms avg", writeCount, avgTime)); } } ``` --- ## Telemetry Best Practices 1. **Use appropriate log levels**: Debug only during development, Info/Warning for production 2. **Enable TelemetryStandard**: Provides consistent formatting across EAs 3. **Flush regularly**: Don't let event queue grow unbounded 4. **Rotate logs**: Keep file sizes manageable (<100MB) 5. **Monitor health score**: Set up alerts for degradation 6. **Use shadow mode**: Test gate changes safely 7. **Archive old telemetry**: Compress and move to cold storage after 30 days --- **See Also:** - [Configuration-Reference.md](Configuration-Reference.md) - Telemetry input parameters - [Execution-Pipeline.md](Execution-Pipeline.md) - When events are emitted - [Policy-Exploration-Guide.md](Policy-Exploration-Guide.md) - Policy/exploration events