# LSTM Trading Bot - Agent Development Guide
## Project Overview
This document provides guidelines for automated agents (Cursor, Copilot, Claude, etc.) and human contributors working on the LSTM-based multi-timeframe trading bot project.
## 1. Required Checks
### Code Quality
- **Linting**: At minimum, run `python -m py_compile` on all Python files to catch syntax errors; use flake8 or Pylint for style and static checks
- **Type Checking**: Ensure proper type hints throughout
- **Import Validation**: Verify all imports work correctly
- **Configuration Validation**: Test YAML configuration loading
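The syntax check can be scripted so that it covers every file in one pass. The following is a minimal sketch, not part of the project: `find_syntax_errors` is a hypothetical helper name, and it relies only on the standard-library `py_compile` module. Like `python -m py_compile`, it writes bytecode caches next to the sources.

```python
import py_compile
from pathlib import Path


def find_syntax_errors(root: Path) -> list[str]:
    """Byte-compile every .py file under root and collect error messages.

    Hypothetical helper for illustration; it only detects syntax errors,
    not style or type problems.
    """
    errors: list[str] = []
    for path in sorted(root.rglob("*.py")):
        try:
            # doraise=True raises PyCompileError instead of printing to stderr.
            py_compile.compile(str(path), doraise=True)
        except py_compile.PyCompileError as exc:
            errors.append(f"{path}: {exc.msg}")
    return errors
```

Calling `find_syntax_errors(Path("src"))` returns an empty list when every file compiles.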
### Testing
- **Unit Tests**: Run tests for individual modules
- **Integration Tests**: Test data pipeline and model training
- **Backtesting Validation**: Verify strategy performance
- **Import Tests**: Ensure all modules can be imported
### Dependency Management
- **Requirements Check**: Verify `requirements.txt` includes all dependencies
- **Version Compatibility**: Ensure PyTorch, pandas, etc. versions are compatible
- **Optional Dependencies**: Test functionality without optional packages
## 2. Project Structure
```
├── config/ # Configuration files
│ └── config.yaml # Main configuration
├── data/ # Data storage (gitignored)
│ ├── raw/ # Raw OHLCV data
│ └── cache/ # Processed features
├── notebooks/ # Jupyter notebooks
│ ├── 01_data_eda.ipynb # Data exploration
│ ├── 02_train_optuna.ipynb # Model training
│ ├── 03_backtest.ipynb # Backtesting
│ └── 04_live_trading.ipynb # Live trading
├── src/ # Source code
│ ├── data/ # Data loading
│ │ └── loaders.py
│ ├── features/ # Feature engineering
│ │ ├── build_dataset.py
│ │ └── indicators.py
│ ├── models/ # LSTM architectures
│ │ └── lstm_fusion.py
│ ├── training/ # Training framework
│ │ ├── train.py
│ │ └── metrics.py
│ ├── backtest/ # Backtesting engine
│ │ ├── engine.py
│ │ └── strategy.py
│ ├── live/ # Live trading
│ │ ├── broker_base.py
│ │ ├── broker_alpaca.py
│ │ ├── streamer.py
│ │ └── executor.py
│ └── utils/ # Utilities
│ ├── config.py
│ ├── logging.py
│ ├── seeds.py
│ └── times.py
├── tests/ # Test files
├── docs/ # Documentation
│ └── agents/ # Agent-specific docs
└── requirements.txt # Dependencies
```
## 3. Coding Conventions
### Python Standards
- **Language**: Python 3.10+ with type hints
- **Style**: Follow PEP 8 with 4-space indentation
- **Imports**: Standard library first, then third-party, then local
- **Documentation**: Use Google-style docstrings
- **Error Handling**: Comprehensive `try`/`except` with logging
### Project-Specific Conventions
- **Configuration**: All parameters in `config/config.yaml`
- **Logging**: Use structured logging with `src.utils.logging`
- **Error Handling**: Graceful degradation with informative messages
- **Testing**: Unit tests for all public functions
### Naming
- **Modules**: `snake_case.py`
- **Classes**: `PascalCase`
- **Functions**: `snake_case`
- **Constants**: `UPPER_SNAKE_CASE`
## 4. Development Workflow
### Agent Development Process
1. **Read Memory**: Check `docs/agents/ledger.json` for existing work
2. **Understand Intent**: Clarify requirements before implementation
3. **Plan Changes**: Create minimal, testable implementation plan
4. **Implement**: Write clean, documented code
5. **Test**: Validate functionality and edge cases
6. **Document**: Update relevant documentation
7. **Log**: Add entry to `docs/agents/ledger.json`
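The read and log steps might look like the sketch below. The ledger schema is not specified in this guide, so the code assumes, purely for illustration, that `ledger.json` holds a JSON array of objects; the field names `timestamp`, `agent`, and `summary` and the helper `append_ledger_entry` are hypothetical.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_ledger_entry(ledger_path: Path, agent: str, summary: str) -> list[dict]:
    """Append one entry to a JSON ledger and return all entries.

    Assumption: the ledger is a JSON array of objects. Adjust the
    fields to whatever schema the real ledger uses.
    """
    entries: list[dict] = []
    if ledger_path.exists():
        entries = json.loads(ledger_path.read_text(encoding="utf-8"))
    entries.append(
        {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "agent": agent,
            "summary": summary,
        }
    )
    ledger_path.parent.mkdir(parents=True, exist_ok=True)
    ledger_path.write_text(json.dumps(entries, indent=2), encoding="utf-8")
    return entries
```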
### Human Contribution Process
1. **Issue Creation**: Create GitHub issue for proposed changes
2. **Branch Creation**: Create feature branch from `main`
3. **Implementation**: Follow coding conventions
4. **Testing**: Run all relevant tests
5. **PR Creation**: Submit pull request with description
6. **Review**: Address review comments
7. **Merge**: Merge after approval
## 5. Module-Specific Guidelines
### Data Pipeline (`src/data/`, `src/features/`)
- **Data Sources**: Support CSV, Alpaca, Binance, OANDA
- **Feature Engineering**: Technical indicators, lagged features, calendar features
- **Data Quality**: Handle missing data, outliers, and anomalies
- **Performance**: Efficient processing for large datasets
### Model Architecture (`src/models/`)
- **LSTM Variants**: Single LSTM, multi-LSTM fusion
- **Attention Mechanisms**: Multi-head attention implementation
- **Regularization**: Dropout, layer normalization
- **Output Modes**: Regression and classification
### Training Framework (`src/training/`)
- **Optimization**: Optuna hyperparameter tuning
- **Validation**: Walk-forward cross-validation
- **Metrics**: Trading-specific performance measures
- **Checkpointing**: Model saving and loading
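Walk-forward validation keeps every test window strictly after its training window, which avoids look-ahead bias. The sketch below shows one common rolling variant; the function name and parameters are illustrative assumptions, not the project's actual API.

```python
from collections.abc import Iterator


def walk_forward_splits(
    n_samples: int, train_size: int, test_size: int
) -> Iterator[tuple[range, range]]:
    """Yield (train, test) index ranges that roll forward through time.

    Each test window starts right after its training window, so no
    future observation is used for fitting.
    """
    start = 0
    while start + train_size + test_size <= n_samples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # slide forward by one test window
```

With 10 samples, a training window of 4, and a test window of 2, this yields three folds whose test windows cover indices 4-5, 6-7, and 8-9.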
### Backtesting (`src/backtest/`)
- **Framework**: Backtrader integration
- **Risk Management**: Position sizing, stop losses
- **Performance**: Comprehensive metrics and reporting
- **Realism**: Slippage, commission, latency modeling
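To make the realism point concrete, here is a simple cost model, assuming slippage quoted in basis points and commission charged as a fraction of notional. Both helpers are hypothetical and independent of Backtrader, which has its own commission and slippage settings; latency is not modeled here.

```python
def fill_price(mid_price: float, side: str, slippage_bps: float) -> float:
    """Return an execution price worsened by slippage (in basis points).

    Buys fill above the mid price and sells fill below it.
    """
    if side not in ("buy", "sell"):
        raise ValueError(f"unknown side: {side!r}")
    direction = 1.0 if side == "buy" else -1.0
    return mid_price * (1.0 + direction * slippage_bps / 10_000.0)


def trade_cost(quantity: float, price: float, commission_rate: float) -> float:
    """Commission charged as a fraction of the traded notional."""
    return abs(quantity) * price * commission_rate
```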
### Live Trading (`src/live/`)
- **Broker Abstraction**: Support multiple exchanges
- **Streaming**: Real-time market data
- **Execution**: Order management and risk controls
- **Monitoring**: Performance tracking and alerting
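A broker abstraction typically means one interface with a concrete class per venue. The sketch below is an assumption about how that could look; the names `BrokerBase`, `Order`, and `PaperBroker` and their methods are illustrative and may differ from what `broker_base.py` actually defines.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    symbol: str
    quantity: float  # positive = buy, negative = sell


class BrokerBase(ABC):
    """Hypothetical interface every broker adapter would implement."""

    @abstractmethod
    def submit_order(self, order: Order) -> str:
        """Send an order and return a broker-side order id."""

    @abstractmethod
    def get_position(self, symbol: str) -> float:
        """Return the signed position for a symbol."""


class PaperBroker(BrokerBase):
    """In-memory broker that fills every order immediately."""

    def __init__(self) -> None:
        self._positions: dict[str, float] = {}
        self._next_id = 0

    def submit_order(self, order: Order) -> str:
        current = self._positions.get(order.symbol, 0.0)
        self._positions[order.symbol] = current + order.quantity
        self._next_id += 1
        return f"paper-{self._next_id}"

    def get_position(self, symbol: str) -> float:
        return self._positions.get(symbol, 0.0)
```

Strategy and execution code can then depend only on the base class, so switching from paper trading to a real venue does not touch the calling code.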
## 6. Configuration Management
### Main Configuration (`config/config.yaml`)
- **Data Settings**: Symbols, timeframes, sources
- **Model Parameters**: Architecture, hyperparameters
- **Training Settings**: Batch size, epochs, optimization
- **Risk Parameters**: Position sizing, circuit breakers
- **Live Trading**: Broker settings, execution parameters
### Environment Variables
```bash
# Broker API credentials
API_KEY_ALPACA=your_key
API_SECRET_ALPACA=your_secret
API_KEY_BINANCE=your_key
API_KEY_OANDA=your_key
# Optional: Alert webhooks
SLACK_WEBHOOK=your_webhook_url
DISCORD_WEBHOOK=your_webhook_url
```
## 7. Testing Strategy
### Unit Tests
- **Data Loaders**: Test data loading from all sources
- **Feature Engineering**: Validate indicator calculations
- **Model Components**: Test LSTM layers and fusion strategies
- **Utilities**: Test configuration and logging
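An indicator test usually compares the output against values computed by hand. The example below is a sketch: `simple_moving_average` is a stand-in defined here for illustration, not the function from `indicators.py`, and the tests are written so that Pytest would discover them by name.

```python
def simple_moving_average(values: list[float], window: int) -> list[float]:
    """Return the mean of each full window of consecutive values."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i - window : i]) / window
        for i in range(window, len(values) + 1)
    ]


def test_sma_matches_hand_computed_values() -> None:
    result = simple_moving_average([1.0, 2.0, 3.0, 4.0], window=2)
    assert result == [1.5, 2.5, 3.5]


def test_sma_rejects_oversized_window() -> None:
    try:
        simple_moving_average([1.0, 2.0], window=3)
    except ValueError:
        return
    raise AssertionError("expected ValueError")
```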
### Integration Tests
- **End-to-End Pipeline**: Data loading → features → training → prediction
- **Backtesting**: Strategy execution with realistic conditions
- **Live Trading**: Paper trading execution and risk management
### Performance Tests
- **Training Speed**: Model training time on different hardware
- **Memory Usage**: Memory consumption during processing
- **Scalability**: Performance with larger datasets
## 8. Deployment Guidelines
### Google Colab
1. Upload project files to Colab environment
2. Install dependencies: `!pip install -r requirements.txt`
3. Run notebooks in sequence for complete workflow
4. Use GPU runtime for faster training
### Linux Server
1. **System Setup**: Ubuntu/Debian with Python 3.10+
2. **Dependencies**: Install via `pip install -r requirements.txt`
3. **Configuration**: Set environment variables for API keys
4. **Service Setup**: Use systemd for 24/7 operation
5. **Monitoring**: Configure logging and alerting
### Docker Deployment
```dockerfile
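# Slim base image matching the project's Python 3.10+ requirement
FROM python:3.10-slim
WORKDIR /app
# Copy requirements first so the dependency layer is cached across code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Start the bot in paper-trading mode
CMD ["python", "main.py", "live", "run", "--mode", "paper"]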
```
## 9. Risk Management
### Position Sizing
- **Fixed Percentage**: Fixed % of portfolio per trade
- **Kelly Criterion**: Optimal position sizing based on win rate and win/loss payoff ratio
- **Volatility Adjusted**: Position size based on market volatility
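The three sizing rules can be sketched as follows. The function names and parameters are illustrative assumptions. The Kelly formula used is the standard binary-outcome form f* = p - (1 - p) / b, where p is the win rate and b is the ratio of average win to average loss; in practice many traders use only a fraction of the Kelly result.

```python
def fixed_fraction_size(equity: float, fraction: float) -> float:
    """Allocate a fixed fraction of equity to each trade."""
    return equity * fraction


def kelly_fraction(win_rate: float, payoff_ratio: float) -> float:
    """Kelly fraction f* = p - (1 - p) / b, floored at zero.

    A non-positive result means the edge is negative, so no capital
    should be committed.
    """
    if payoff_ratio <= 0:
        raise ValueError("payoff_ratio must be positive")
    return max(0.0, win_rate - (1.0 - win_rate) / payoff_ratio)


def volatility_adjusted_size(
    equity: float, risk_fraction: float, volatility: float
) -> float:
    """Notional sized so that notional * volatility == equity * risk_fraction."""
    if volatility <= 0:
        raise ValueError("volatility must be positive")
    return equity * risk_fraction / volatility
```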
### Circuit Breakers
- **Drawdown Limits**: Stop trading if the portfolio's drawdown from its peak exceeds a configured threshold
- **Daily Loss Limits**: Stop trading if daily loss exceeds threshold
- **Position Limits**: Maximum number of concurrent positions
- **Time-based**: Stop trading during high-risk periods
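A drawdown limit can be implemented as a small latch that tracks peak equity. The class below is a hypothetical sketch, not the project's implementation; once tripped, it stays tripped until it is reset manually, which is one reasonable design choice among several.

```python
from dataclasses import dataclass


@dataclass
class DrawdownBreaker:
    """Halts trading once drawdown from peak equity reaches a limit."""

    max_drawdown: float  # e.g. 0.10 halts after a 10% drop from the peak
    peak_equity: float = 0.0
    tripped: bool = False

    def update(self, equity: float) -> bool:
        """Record the latest equity; return True while trading is allowed."""
        self.peak_equity = max(self.peak_equity, equity)
        if self.peak_equity > 0:
            drawdown = 1.0 - equity / self.peak_equity
            if drawdown >= self.max_drawdown:
                self.tripped = True  # latch: stays tripped after recovery
        return not self.tripped
```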
### Monitoring
- **Performance Tracking**: Real-time P&L and risk metrics
- **Alert System**: Notifications for risk events
- **Health Checks**: System status and connectivity monitoring
## 10. Performance Optimization
### Training Optimization
- **GPU Acceleration**: Use CUDA for faster training
- **Batch Processing**: Optimize batch sizes for hardware
- **Mixed Precision**: Use float16 mixed precision to reduce memory use and speed up training on supported GPUs
- **Model Parallelism**: Distribute model across multiple GPUs
### Inference Optimization
- **Model Quantization**: Reduce model size for faster inference
- **Batch Inference**: Process multiple predictions together
- **Caching**: Cache frequent calculations and features
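For pure functions of hashable inputs, the standard library's `functools.lru_cache` is often enough for caching. The sketch below is illustrative only; `rolling_mean` is a made-up feature function, and prices are passed as a tuple because cached arguments must be hashable.

```python
from functools import lru_cache


@lru_cache(maxsize=1024)
def rolling_mean(prices: tuple[float, ...], window: int) -> float:
    """Mean of the last `window` prices, cached per (prices, window) pair."""
    if window <= 0 or window > len(prices):
        raise ValueError("window must be between 1 and len(prices)")
    return sum(prices[-window:]) / window
```

`rolling_mean.cache_info()` reports hits and misses, which helps confirm that the cache is actually being used.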
### Data Optimization
- **Efficient Storage**: Use Parquet for compressed data storage
- **Lazy Loading**: Load data only when needed
- **Memory Mapping**: Use memory-mapped files for large datasets
## 11. Troubleshooting
### Common Issues
- **Import Errors**: Check Python path and dependencies
- **Configuration Errors**: Validate YAML syntax and required fields
- **Data Issues**: Check data format and missing values
- **Model Errors**: Verify tensor shapes and data types
### Debugging Tools
- **Logging**: Comprehensive logging at all levels
- **Profiling**: Performance profiling for bottlenecks
- **Visualization**: Plot data and model outputs for inspection
- **Interactive Debugging**: Use `pdb` (or IPython's `%debug` magic) for step-through debugging
## 12. Future Enhancements
### Model Improvements
- **Transformer Architectures**: Add attention-based models
- **Ensemble Methods**: Combine multiple model predictions
- **Reinforcement Learning**: Train using RL for better adaptation
### Feature Enhancements
- **Alternative Data**: News, sentiment, macroeconomic indicators
- **Advanced Indicators**: More sophisticated technical analysis
- **Regime Detection**: Machine learning-based market regime classification
### System Improvements
- **Multi-Asset Trading**: Handle multiple asset classes
- **Portfolio Optimization**: Modern portfolio theory integration
- **Risk Parity**: Equal risk contribution across positions
## 13. Contributing Guidelines
### Code Contributions
1. Follow existing code style and conventions
2. Add tests for new functionality
3. Update documentation for API changes
4. Use meaningful commit messages
### Issue Management
- Use clear, descriptive issue titles
- Provide detailed descriptions and reproduction steps
- Label issues appropriately (bug, enhancement, documentation)
- Reference related issues and PRs
### Review Process
- All PRs require at least one review
- Address review comments promptly
- Ensure CI checks pass before merge
- Update changelog for significant changes
## 14. Maintenance
### Regular Tasks
- **Model Retraining**: Update models with new data
- **Performance Review**: Analyze live vs backtested performance
- **Risk Review**: Adjust risk parameters based on performance
- **Dependency Updates**: Keep packages current and secure
### Monitoring
- **System Health**: Monitor server and application health
- **Performance Metrics**: Track key performance indicators
- **Error Rates**: Monitor and address error patterns
- **Resource Usage**: Track CPU, memory, and disk usage
## 15. Support and Resources
### Documentation
- **README.md**: Project overview and setup instructions
- **API Documentation**: Generated from docstrings
- **Colab Notebooks**: Interactive tutorials and examples
- **Configuration Guide**: Detailed parameter explanations
### Community
- **GitHub Issues**: Bug reports and feature requests
- **Discussions**: General questions and discussions
- **Wiki**: Additional documentation and guides
### Development Tools
- **IDE Support**: Cursor, VS Code with Python extensions
- **Linting**: Pylint, flake8 for code quality
- **Formatting**: Black for consistent code formatting
- **Testing**: Pytest for unit and integration tests
---
This guide ensures consistent development practices and high-quality code across all contributors and automated agents working on the LSTM trading bot project.