LSTM Trading Bot - Complete Implementation Guide
🚀 Overview
This guide provides step-by-step instructions to take the scaffolded LSTM trading bot from a project template to a fully operational trading system. Follow these steps in order for the best results.
📋 Table of Contents
- Immediate Setup
- Data Collection
- Feature Engineering
- Model Training
- Backtesting
- Live Trading Setup
- Risk Management Configuration
- Monitoring and Maintenance
- Troubleshooting
- Performance Optimization
- Future Enhancements
- Deployment Checklist
- Resources and Support
1. Immediate Setup
1.1 Environment Setup
# 1. Create virtual environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# 2. Install dependencies
pip install -r requirements.txt
# 3. Verify installation
python -c "import torch, pandas, backtrader; print('All dependencies installed successfully')"
1.2 Configuration Setup
# 1. Copy environment template
cp .env.example .env
# 2. Edit .env with your API credentials
nano .env # or your preferred editor
# Required for live trading:
# API_KEY_ALPACA=your_key_here
# API_SECRET_ALPACA=your_secret_here
1.3 Project Structure Verification
# Verify all files are present
ls -la
# Should show: config/, src/, notebooks/, docs/, requirements.txt, etc.
# Test configuration loading
python -c "from src.utils.config import get_config; config = get_config(); print('Config loaded successfully')"
2. Data Collection
2.1 Choose Your Data Sources
For Development/Research:
- Use CSV files with historical data
- Download from Yahoo Finance, Alpha Vantage, or similar
For Live Trading:
- Alpaca (US stocks/crypto)
- Binance (crypto)
- OANDA (forex)
2.2 Collect Historical Data
Option A: CSV Files (Recommended for initial development)
# Create data/raw directory
mkdir -p data/raw
# Download sample data (replace with your data source)
# Example: EURUSD 1h data from 2020-2024
# Place CSV files in data/raw/ with format: SYMBOL_TIMEFRAME.csv
# Columns: timestamp,open,high,low,close,volume
# Verify data format
python -c "
import pandas as pd
df = pd.read_csv('data/raw/EURUSD_1h.csv')
print(f'Data shape: {df.shape}')
print(f'Columns: {list(df.columns)}')
print(f'Date range: {df.timestamp.min()} to {df.timestamp.max()}')
"
Option B: Alpaca API (Live data)
# Set API credentials in .env first
# Note: Alpaca covers US stocks and crypto, not forex; EURUSD needs a forex source (see 2.1)
# Then load data via CLI
python main.py data load \
--symbols EURUSD BTCUSD \
--timeframes 15m 30m 1h 2h \
--start 2020-01-01 \
--end 2024-12-31 \
--source alpaca
2.3 Data Quality Checks
# Run data quality analysis
import pandas as pd
from src.data.loaders import load_ohlcv_data
# Load and inspect data
data = load_ohlcv_data(
symbols=['EURUSD'],
timeframes=['1h'],
start_date='2020-01-01',
end_date='2024-12-31',
source='csv'
)
df = data['EURUSD']['1h']
print(f"Missing values: {df.isnull().sum().sum()}")
print(f"Duplicate timestamps: {df.index.duplicated().sum()}")
print(f"Largest close-to-close move: {df['close'].diff().abs().max()}")
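The checks above can be bundled into one reusable report. The sketch below is a hypothetical helper (`ohlcv_quality_report` is not part of `src.data.loaders`). It assumes a `DataFrame` with a `DatetimeIndex` and the lowercase `open/high/low/close/volume` columns described in 2.2. It also adds checks the snippet above does not cover: whether the index is sorted, bars whose high/low fail to bracket open/close, negative volume, and the largest gap between timestamps.

```python
import pandas as pd


def ohlcv_quality_report(df: pd.DataFrame) -> dict:
    """Summarise common OHLCV data problems (hypothetical helper).

    Assumes a DatetimeIndex and lowercase open/high/low/close/volume columns.
    """
    body_high = df[["open", "close"]].max(axis=1)
    body_low = df[["open", "close"]].min(axis=1)
    gaps = df.index.to_series().diff().dropna()
    return {
        "missing_values": int(df.isnull().sum().sum()),
        "duplicate_timestamps": int(df.index.duplicated().sum()),
        "index_sorted": bool(df.index.is_monotonic_increasing),
        # A bar's high/low must bracket its open and close
        "ohlc_violations": int(((df["high"] < body_high) | (df["low"] > body_low)).sum()),
        "negative_volume": int((df["volume"] < 0).sum()),
        # Largest spacing between consecutive bars; missing bars show up here
        "max_time_gap": gaps.max() if not gaps.empty else pd.Timedelta(0),
    }


# Usage with the data loaded above:
# print(ohlcv_quality_report(df))
```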
3. Feature Engineering
3.1 Build Training Dataset
# Build features from loaded data
python main.py features build \
--config config/config.yaml \
--data-path data/loaded_data.pkl \
--output data/features.pkl \
--sequence-length 60
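`--sequence-length 60` implies that each training sample is a 60-step window of features, which is why `X` in 3.2 is three-dimensional. Below is a minimal NumPy sketch of that windowing. It assumes a 2-D `(timesteps, features)` array and targets already aligned so that `targets[t]` is the label for the window ending at bar `t`. The `features build` command may align targets differently, and `make_sequences` is a hypothetical name.

```python
import numpy as np


def make_sequences(features: np.ndarray, targets: np.ndarray, seq_len: int = 60):
    """Turn a (T, F) feature matrix into overlapping windows of shape (N, seq_len, F).

    N = T - seq_len + 1. Each window is paired with the target at its last bar,
    an alignment assumed for this sketch.
    """
    if features.ndim != 2:
        raise ValueError("features must be 2-D: (timesteps, features)")
    if len(targets) != features.shape[0]:
        raise ValueError("targets must have one entry per timestep")
    n_windows = features.shape[0] - seq_len + 1
    if n_windows <= 0:
        raise ValueError("need at least seq_len timesteps")
    X = np.stack([features[i:i + seq_len] for i in range(n_windows)])
    y = targets[seq_len - 1:]
    return X, y
```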
3.2 Feature Analysis
# Analyze built features
import pickle
import matplotlib.pyplot as plt
with open('data/features.pkl', 'rb') as f:
    X, y = pickle.load(f)
print(f"Feature matrix shape: {X.shape}")  # (samples, sequence_length, features)
print(f"Number of targets: {len(y)}")
print(f"Number of features: {X.shape[2]}")
# Check feature distributions (first timestep of each sequence, up to 10 features)
plt.figure(figsize=(15, 10))
for i in range(min(10, X.shape[2])):
    plt.subplot(2, 5, i + 1)
    plt.hist(X[:, 0, i], bins=50)
    plt.title(f'Feature {i}')
plt.tight_layout()
plt.show()
4. Model Training
4.1 Hyperparameter Optimization
# Run Optuna optimization (this takes time!)
python main.py training train \
--config config/config.yaml \
--data-path data/features.pkl \
--output models/best_model.pth \
--optuna 50
Expected Output:
Trial 49 finished with value: -0.2345 and parameters: {...}
Best trial: 42
Best value: -0.1234
Best parameters: {'hidden_size': 128, 'num_layers': 2, 'dropout': 0.2, ...}
Model training completed. Saved to models/best_model.pth
4.2 Manual Training (if needed)
# For custom training without Optuna
from src.training.train import train_model
import pickle
with open('data/features.pkl', 'rb') as f:
    X, y = pickle.load(f)
model = train_model(
X=X,
y=y,
config_path='config/config.yaml',
save_path='models/manual_model.pth'
)
4.3 Model Validation
# Load and test model
from src.models.lstm_fusion import load_model
import torch
model = load_model('models/best_model.pth')
model.eval()
# Test prediction
with torch.no_grad():
    sample_input = torch.randn(1, 60, model.input_size)
    prediction = model(sample_input)
print(f"Sample prediction shape: {prediction.shape}")
5. Backtesting
5.1 Run Comprehensive Backtest
# Run backtest with the trained model
python main.py backtest run \
--config config/config.yaml \
--data-path data/test_data.pkl \
--model-path models/best_model.pth \
--report reports/backtest_results.html
5.2 Analyze Results
Key Metrics to Evaluate:
- Sharpe Ratio: > 1.0 is good, > 2.0 is excellent
- Max Drawdown: < 15% is acceptable, < 10% is good
- Win Rate: > 50% is decent, > 60% is good
- Profit Factor: > 1.5 is good, > 2.0 is excellent
# Load and analyze backtest results
import pickle
with open('reports/backtest_results.pkl', 'rb') as f:
    results = pickle.load(f)
print("Backtest Summary:")
print(f"Total Return: {results['portfolio']['total_return']:.2%}")
print(f"Sharpe Ratio: {results['sharpe_ratio']:.3f}")
print(f"Max Drawdown: {results['max_drawdown']:.2%}")
print(f"Total Trades: {results['num_trades']}")
# Guard against division by zero when no trades were executed
win_rate = results['winning_trades'] / results['num_trades'] if results['num_trades'] else 0.0
print(f"Win Rate: {win_rate:.2%}")
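If the report does not contain all four metrics listed above, they can be recomputed from raw series. The stdlib-only sketch below makes several assumptions: per-period portfolio returns as fractions (daily, hence `periods_per_year=252`), a zero risk-free rate, and a list of per-trade P&L values. The function names are illustrative and not part of `src.training.metrics`.

```python
import math
from statistics import mean, stdev


def sharpe_and_drawdown(returns, periods_per_year=252):
    """Annualised Sharpe ratio (risk-free rate assumed 0) and maximum drawdown."""
    if len(returns) < 2:
        raise ValueError("need at least two returns")
    sd = stdev(returns)
    sharpe = mean(returns) / sd * math.sqrt(periods_per_year) if sd > 0 else math.nan
    equity = peak = 1.0
    max_dd = 0.0
    for r in returns:
        equity *= 1 + r  # compound the equity curve
        peak = max(peak, equity)
        max_dd = max(max_dd, 1 - equity / peak)
    return sharpe, max_dd


def trade_stats(pnls):
    """Win rate and profit factor (gross profit / gross loss) from per-trade P&L."""
    wins = [p for p in pnls if p > 0]
    gross_loss = -sum(p for p in pnls if p < 0)
    win_rate = len(wins) / len(pnls) if pnls else 0.0
    profit_factor = sum(wins) / gross_loss if gross_loss > 0 else math.inf
    return win_rate, profit_factor
```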
5.3 Risk-Adjusted Performance
# Calculate risk metrics
from src.training.metrics import TradingMetrics
metrics = TradingMetrics()
trade_pnls = [trade['pnl'] for trade in results['trades']]  # per-trade P&L, not periodic returns
risk_metrics = metrics.calculate_trade_statistics(trade_pnls)
print(f"Calmar Ratio: {risk_metrics.get('calmar_ratio', 0):.3f}")
print(f"Sortino Ratio: {risk_metrics.get('sortino_ratio', 0):.3f}")
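Sortino and Calmar ratios are usually defined on periodic returns rather than per-trade P&L. For reference, here is a stdlib sketch of the textbook definitions. It assumes daily returns as fractions and a 0% target return. `TradingMetrics` may use different conventions, and both function names are hypothetical.

```python
import math


def sortino_ratio(returns, periods_per_year=252, target=0.0):
    """Mean excess return divided by downside deviation, annualised."""
    excess = [r - target for r in returns]
    downside_dev = math.sqrt(sum(min(e, 0.0) ** 2 for e in excess) / len(excess))
    if downside_dev == 0:
        return math.inf  # no returns below the target
    return (sum(excess) / len(excess)) / downside_dev * math.sqrt(periods_per_year)


def calmar_ratio(returns, periods_per_year=252):
    """Compound annual growth rate divided by maximum drawdown."""
    equity = peak = 1.0
    max_dd = 0.0
    for r in returns:
        equity *= 1 + r
        peak = max(peak, equity)
        max_dd = max(max_dd, 1 - equity / peak)
    years = len(returns) / periods_per_year
    cagr = equity ** (1 / years) - 1
    return cagr / max_dd if max_dd > 0 else math.inf
```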
6. Live Trading Setup
6.1 Paper Trading (Recommended First)
# Start paper trading for validation
python main.py live run \
--config config/config.yaml \
--model-path models/best_model.pth \
--mode paper \
--api-key $API_KEY_ALPACA \
--api-secret $API_SECRET_ALPACA
Monitor in another terminal:
# Check logs
tail -f logs/trading_bot.log
# Check portfolio value
# The bot will log performance every execution cycle
6.2 Risk Management Configuration
Before going live, adjust risk parameters:
# config/config.yaml - Risk Section
backtest:
  risk:
    max_position_size: 0.02  # 2% per trade
    max_positions: 3  # Conservative for live
    stop_loss: 0.015  # 1.5% stop loss
    take_profit: 0.03  # 3% take profit
    circuit_breaker_drawdown: 0.10  # 10% circuit breaker
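A typo in these values changes live risk directly, so it can help to sanity-check them at startup. Below is a hypothetical validator over the parsed `backtest.risk` mapping, which could be obtained with PyYAML's `yaml.safe_load`. The bounds it enforces are assumptions for illustration, not rules the project defines.

```python
def validate_risk_config(risk: dict) -> list:
    """Return human-readable problems found in a backtest.risk mapping (empty if none)."""
    problems = []
    # These settings are expressed as fractions of the portfolio or of price
    for key in ("max_position_size", "stop_loss", "take_profit", "circuit_breaker_drawdown"):
        value = risk.get(key)
        if not isinstance(value, (int, float)):
            problems.append(f"{key} is missing or not a number")
        elif not 0 < value < 1:
            problems.append(f"{key} should be a fraction in (0, 1), got {value}")
    max_positions = risk.get("max_positions")
    if not isinstance(max_positions, int) or max_positions < 1:
        problems.append("max_positions should be a positive integer")
    stop, take = risk.get("stop_loss"), risk.get("take_profit")
    # Assumed policy: require a reward-to-risk ratio above 1
    if isinstance(stop, (int, float)) and isinstance(take, (int, float)) and take <= stop:
        problems.append("take_profit should exceed stop_loss")
    return problems


# Usage (requires PyYAML):
# import yaml
# with open("config/config.yaml") as f:
#     print(validate_risk_config(yaml.safe_load(f)["backtest"]["risk"]))
```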
6.3 Live Trading (Production)
# Only after successful paper trading!
python main.py live run \
--config config/config.yaml \
--model-path models/best_model.pth \
--mode live \
--api-key $API_KEY_ALPACA \
--api-secret $API_SECRET_ALPACA
7. Risk Management Configuration
7.1 Position Sizing Strategies
Conservative (Recommended for beginners):
# Fixed percentage sizing
position_size = portfolio_value * 0.02 # 2% per trade
Advanced (Kelly Criterion):
# Kelly fraction for win rate W and payoff ratio R = avg_win / avg_loss
win_rate = 0.55  # From backtesting
avg_win = 0.02   # 2% average win
avg_loss = 0.01  # 1% average loss
kelly_pct = (win_rate * avg_win - (1 - win_rate) * avg_loss) / avg_win  # ≈ 0.325 with these inputs
position_size = portfolio_value * max(0.0, min(kelly_pct, 0.05))  # Cap at 5%, never negative
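For reference, the standard Kelly fraction in its common trading form uses win rate $p$, average win $\bar W$, average loss $\bar L$, and payoff ratio $R = \bar W / \bar L$. The formula and the arithmetic for the example inputs are:

```latex
f^* = p - \frac{1 - p}{R} = \frac{p\,\bar{W} - (1 - p)\,\bar{L}}{\bar{W}},
\qquad R = \frac{\bar{W}}{\bar{L}}

% Example: p = 0.55, \bar{W} = 0.02, \bar{L} = 0.01 \Rightarrow R = 2
f^* = 0.55 - \frac{0.45}{2} = 0.325
```

This form assumes a losing trade costs the full amount staked, so $f^*$ is best read as a fraction of capital at risk. Full Kelly is aggressive. At 32.5% it is far above the 5% cap, so the cap determines the position size here.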
7.2 Circuit Breakers
Daily Loss Limit:
- Stop trading if daily loss > 3%
- Reset at midnight UTC
Drawdown Protection:
- Stop trading if portfolio drops > 15% from peak
- Manual reset required
Position Limits:
- Maximum 5 concurrent positions
- Maximum 2% of portfolio per position
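The three rules above can be combined into one gatekeeper consulted before every new order. The minimal sketch below assumes equity snapshots arrive with timezone-aware timestamps. The class and method names are hypothetical. The thresholds mirror the text: a 3% daily loss limit reset at UTC midnight, a 15% drawdown halt latched until manual reset, 5 positions, and 2% per position.

```python
from dataclasses import dataclass
from datetime import date, datetime, timezone
from typing import Optional


@dataclass
class CircuitBreaker:
    """Hypothetical pre-trade gate implementing the circuit-breaker rules above."""
    daily_loss_limit: float = 0.03
    max_drawdown: float = 0.15
    max_positions: int = 5
    max_position_pct: float = 0.02
    peak_equity: float = 0.0
    day_start_equity: float = 0.0
    current_day: Optional[date] = None
    drawdown_halt: bool = False

    def update(self, equity: float, now: datetime) -> None:
        """Record an equity snapshot; `now` must be timezone-aware."""
        today = now.astimezone(timezone.utc).date()
        if today != self.current_day:  # first snapshot of a new UTC day
            self.current_day = today
            self.day_start_equity = equity
        self.peak_equity = max(self.peak_equity, equity)
        if equity < self.peak_equity * (1 - self.max_drawdown):
            self.drawdown_halt = True  # stays set until manual_reset()

    def can_open(self, equity: float, open_positions: int, order_value: float) -> bool:
        """True if a new position of `order_value` is allowed right now."""
        daily_limit_hit = equity < self.day_start_equity * (1 - self.daily_loss_limit)
        if self.drawdown_halt or daily_limit_hit:
            return False
        if open_positions >= self.max_positions:
            return False
        return order_value <= equity * self.max_position_pct

    def manual_reset(self) -> None:
        """Operator action: clear the drawdown halt and restart peak tracking."""
        self.drawdown_halt = False
        self.peak_equity = 0.0
```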
8. Monitoring and Maintenance
8.1 Real-time Monitoring
# Monitor logs
tail -f logs/trading_bot.log | grep -E "(Trade|Signal|Risk|Error)"
# Monitor performance
python -c "
with open('logs/trading_bot.log', 'r') as f:
    for line in f:
        if 'portfolio_value' in line:
            print(line.strip())
"
8.2 Daily Health Checks
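# Check system health
import glob
import os
import time

def health_check(max_data_age_hours=24):  # 24h freshness threshold is an assumed default
    # Check that the trained model file exists
    assert os.path.exists('models/best_model.pth'), "Model file missing"
    # Check that cached data exists and was refreshed recently
    data_files = glob.glob('data/cache/*.parquet')
    assert len(data_files) > 0, "No cached data found"
    age_hours = (time.time() - max(os.path.getmtime(p) for p in data_files)) / 3600
    assert age_hours <= max_data_age_hours, f"Newest cached data is {age_hours:.1f}h old"
    # Check API connectivity via the broker wrapper from section 9.1
    api_ok = False
    try:
        from src.live.broker_alpaca import AlpacaBroker
        api_ok = AlpacaBroker().is_connected()
    except Exception as exc:
        print(f"API connectivity check failed: {exc}")
    assert api_ok, "API connectivity issue"
    print("All health checks passed!")

health_check()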
8.3 Model Retraining Schedule
Weekly Retraining:
# Every Sunday at 2 AM
# Add to crontab:
# 0 2 * * 0 /path/to/venv/bin/python /path/to/project/scripts/retrain_model.py
Performance Monitoring:
- Track live Sharpe ratio vs backtested
- Monitor maximum drawdown
- Alert on significant performance degradation
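One way to automate the comparison in the first bullet is to annualise the live Sharpe ratio over recent periods and flag it when it falls well below the backtested figure. The sketch assumes daily returns and a zero risk-free rate. Its alert policy ("live Sharpe below half of backtest") and minimum sample size are placeholder choices to tune, and the function name is hypothetical.

```python
import math
from statistics import mean, stdev


def sharpe_degradation_alert(live_returns, backtest_sharpe, periods_per_year=252,
                             ratio_threshold=0.5, min_periods=20):
    """Return (live_sharpe, alert); live_sharpe is None when there is too little data."""
    if len(live_returns) < min_periods:
        return None, False  # too few observations for a meaningful estimate
    sd = stdev(live_returns)
    if sd == 0:
        return None, False
    live_sharpe = mean(live_returns) / sd * math.sqrt(periods_per_year)
    return live_sharpe, live_sharpe < ratio_threshold * backtest_sharpe
```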
9. Troubleshooting
9.1 Common Issues
Issue: Model not training
# Check data shapes
import pickle
with open('data/features.pkl', 'rb') as f:
    X, y = pickle.load(f)
print(f"X shape: {X.shape}")  # Should be (samples, sequence_length, features)
print(f"y length: {len(y)}")  # Should match samples
Issue: Poor backtest performance
# Check for overfitting
# Compare train/validation metrics
# Reduce model complexity if needed
# Increase regularization (dropout, L2)
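Comparing train and validation metrics is only meaningful if validation data comes strictly after training data. Shuffled splits leak future information through overlapping 60-bar windows. The sketch below builds expanding-window walk-forward splits over sample indices. It has an optional `gap` parameter, an assumption of this sketch that would typically be set to the sequence length. The gap drops training samples at the boundary whose windows would overlap the validation period. The function name is hypothetical.

```python
def walk_forward_splits(n_samples, n_folds=4, initial_train_frac=0.5, gap=0):
    """Yield (train_indices, val_indices) ranges with validation always after training."""
    first_val = int(n_samples * initial_train_frac)
    fold_size = (n_samples - first_val) // n_folds
    if fold_size == 0:
        raise ValueError("not enough samples for the requested number of folds")
    for k in range(n_folds):
        val_start = first_val + k * fold_size
        # The last fold absorbs any remainder so folds cover every sample from first_val on
        val_end = n_samples if k == n_folds - 1 else val_start + fold_size
        yield range(0, max(0, val_start - gap)), range(val_start, val_end)


# Usage: for train_idx, val_idx in walk_forward_splits(len(X), gap=60): ...
```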
Issue: Live trading connection errors
# Check API credentials
python -c "
import os
print('API Key:', os.getenv('API_KEY_ALPACA')[:4] + '...' if os.getenv('API_KEY_ALPACA') else 'Not set')
print('API Secret:', 'Set' if os.getenv('API_SECRET_ALPACA') else 'Not set')  # never print secret characters
"
# Test broker connection
from src.live.broker_alpaca import AlpacaBroker
broker = AlpacaBroker()
print("Connected:", broker.is_connected())
9.2 Debugging Tools
Feature Debugging:
# Check feature distributions (X loaded from data/features.pkl as above)
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
features_df = pd.DataFrame(X[:, 0, :])  # First timestep
plt.figure(figsize=(15, 10))
sns.boxplot(data=features_df)
plt.xticks(rotation=45)
plt.show()
Model Debugging:
# Check model predictions (model loaded as in 4.3)
import torch
model.eval()
with torch.no_grad():
    sample = torch.randn(1, 60, model.input_size)
    pred = model(sample)
print(f"Prediction: {pred.item():.4f}")  # .item() assumes a single-element output
10. Performance Optimization
10.1 Training Optimization
# Use GPU if available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)
# Optimize batch size
# Start with 32, increase until memory limit
batch_size = 64 # Adjust based on your hardware
10.2 Inference Optimization
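# Dynamic quantization for faster CPU inference (int8 kernels run on CPU, so keep the model on CPU)
import torch
import torch.quantization
# Quantize the weights of the LSTM and Linear layers to int8
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.LSTM, torch.nn.Linear}, dtype=torch.qint8
)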
# Save quantized model
torch.save(quantized_model.state_dict(), 'models/quantized_model.pth')
10.3 Memory Optimization
# Use mixed precision training (CUDA only; newer PyTorch releases prefer torch.amp.GradScaler('cuda') and torch.amp.autocast('cuda'))
from torch.cuda.amp import GradScaler, autocast
scaler = GradScaler()
# In training loop:
optimizer.zero_grad()
with autocast():
    outputs = model(inputs)
    loss = criterion(outputs, targets)
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
11. Future Enhancements
11.1 Advanced Features
Ensemble Models:
# Combine multiple models
models = [
load_model('models/model_v1.pth'),
load_model('models/model_v2.pth'),
load_model('models/model_v3.pth')
]
# Ensemble prediction (input_tensor: a batch shaped like the training sequences)
for m in models:
    m.eval()
with torch.no_grad():
    predictions = [m(input_tensor) for m in models]
    ensemble_pred = torch.mean(torch.stack(predictions), dim=0)
Reinforcement Learning:
# Future enhancement: Train with RL
# Use portfolio returns as rewards
# Implement actor-critic or PPO algorithms
11.2 Multi-Asset Trading
# Extend to multiple assets
symbols = ['EURUSD', 'GBPUSD', 'USDJPY', 'BTCUSD', 'ETHUSD']
# Train separate models per asset or shared model
# Implement portfolio optimization
11.3 Advanced Risk Management
Dynamic Position Sizing:
# Adjust position size based on market volatility
volatility = calculate_market_volatility()
position_size = base_size * (1 / (1 + volatility * 10))
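`calculate_market_volatility()` is not defined in this guide. One common stand-in is the standard deviation of recent simple returns. The sketch below pairs that (the 20-bar window is an assumption) with the damping formula shown above, where `sensitivity=10` reproduces the `volatility * 10` term. Both function names are hypothetical.

```python
from statistics import stdev


def realized_volatility(closes, window=20):
    """Sample std dev of the last `window` simple returns (not annualised)."""
    returns = [closes[i] / closes[i - 1] - 1 for i in range(1, len(closes))]
    recent = returns[-window:]
    if len(recent) < 2:
        raise ValueError("need at least three closing prices")
    return stdev(recent)


def volatility_scaled_size(base_size, volatility, sensitivity=10.0):
    """Shrink the base position as volatility rises: base / (1 + sensitivity * vol)."""
    return base_size / (1 + volatility * sensitivity)
```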
Correlation-Based Risk:
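# Avoid correlated positions
import numpy as np
correlation_matrix = np.asarray(calculate_correlations(positions))
# Ignore the diagonal: every asset is perfectly correlated with itself
off_diagonal = correlation_matrix[~np.eye(len(correlation_matrix), dtype=bool)]
if off_diagonal.size and off_diagonal.max() > 0.7:  # High correlation between two positions
    reduce_position_sizes()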
12. Deployment Checklist
Before Going Live:
- Backtesting shows consistent profitability (>6 months)
- Sharpe ratio > 1.0 across multiple market conditions
- Maximum drawdown < 20% in backtests
- Paper trading successful for >1 month
- Risk management parameters tested and validated
- Monitoring and alerting systems in place
- Emergency stop procedures documented
- API rate limits and costs understood
Production Deployment:
- Server setup with 24/7 uptime
- Database backups configured
- Log aggregation and monitoring
- Automated model retraining pipeline
- Performance tracking dashboard
- Incident response procedures
13. Resources and Support
Documentation:
- README.md - Project overview and setup
- AGENTS.md - Development guidelines
- API Documentation - Detailed API reference
Notebooks:
- 01_data_eda.ipynb - Data exploration
- 02_train_optuna.ipynb - Model training
- 03_backtest.ipynb - Strategy evaluation
- 04_live_trading.ipynb - Live trading
Monitoring:
- Check logs/trading_bot.log for system status
- Monitor portfolio value and risk metrics
- Set up alerts for critical events
🎯 Success Metrics
Target Performance:
- Sharpe Ratio: > 1.5
- Max Drawdown: < 15%
- Win Rate: > 55%
- Profit Factor: > 1.8
Risk Management:
- Daily Loss Limit: < 3%
- Position Concentration: < 5% per trade
- Circuit Breaker Response: < 30 seconds
Operational:
- Uptime: > 99.5%
- Live vs. Backtest: live performance > 90% of backtested performance
- Response Time: < 1 second for signals
Happy Trading! 🚀📈
This guide will evolve as you gain experience with the system. Start conservatively, monitor closely, and gradually increase complexity and position sizes as you validate performance.