Update README.md to provide a comprehensive overview of the LSTM Multi-Timeframe Trading Bot, including features, installation instructions, project structure, data pipeline, model architecture, training, backtesting, live trading, risk management, deployment, API reference, contributing guidelines, and support information.

2025-09-30 23:10:23 -04:00

15 KiB


LSTM Trading Bot - Complete Implementation Guide

🚀 Overview

This guide provides step-by-step instructions to take the scaffolded LSTM trading bot from a project template to a fully operational trading system. Follow these steps in order for the best results.

1. Immediate Setup

1.1 Environment Setup

# 1. Create virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# 2. Install dependencies
pip install -r requirements.txt

# 3. Verify installation
python -c "import torch, pandas, backtrader; print('All dependencies installed successfully')"

1.2 Configuration Setup

# 1. Copy environment template
cp .env.example .env

# 2. Edit .env with your API credentials
nano .env  # or your preferred editor

# Required for live trading:
# API_KEY_ALPACA=your_key_here
# API_SECRET_ALPACA=your_secret_here
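Before running anything that needs credentials, it helps to fail fast when `.env` is incomplete. Below is a minimal sketch. It assumes `.env` holds plain `KEY=value` lines. `parse_env_text` and `missing_keys` are hypothetical helpers, not project code. Many projects use the `python-dotenv` package for this instead.

```python
from pathlib import Path

REQUIRED_LIVE_KEYS = ("API_KEY_ALPACA", "API_SECRET_ALPACA")


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines; blank lines and '#' comments are skipped."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes from the value
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str], required=REQUIRED_LIVE_KEYS) -> list[str]:
    """Return required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    env = parse_env_text(env_path.read_text()) if env_path.exists() else {}
    print("Missing keys:", missing_keys(env) or "none")
```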

1.3 Project Structure Verification

# Verify all files are present
ls -la
# Should show: config/, src/, notebooks/, docs/, requirements.txt, etc.

# Test configuration loading
python -c "from src.utils.config import get_config; config = get_config(); print('Config loaded successfully')"

2. Data Collection

2.1 Choose Your Data Sources

For Development/Research:

Use CSV files with historical data
Download from Yahoo Finance, Alpha Vantage, or similar

For Live Trading:

Alpaca (US stocks/crypto)
Binance (crypto)
OANDA (forex)

2.2 Collect Historical Data

Option A: CSV Files (Recommended for initial development)

# Create data/raw directory
mkdir -p data/raw

# Download sample data (replace with your data source)
# Example: EURUSD 1h data from 2020-2024
# Place CSV files in data/raw/ with format: SYMBOL_TIMEFRAME.csv
# Columns: timestamp,open,high,low,close,volume

# Verify data format
python -c "
import pandas as pd
df = pd.read_csv('data/raw/EURUSD_1h.csv')
print(f'Data shape: {df.shape}')
print(f'Columns: {list(df.columns)}')
print(f'Date range: {df.timestamp.min()} to {df.timestamp.max()}')
"

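The file-naming and column conventions above can be checked for every file in `data/raw/` in one pass. Below is a minimal stdlib sketch. It assumes symbols and timeframes never contain underscores, and that the header must list the six columns in that order, ignoring case and surrounding spaces. `parse_name` and `check_raw_csv` are hypothetical helpers, not part of the project.

```python
import csv
from pathlib import Path

EXPECTED_COLUMNS = ["timestamp", "open", "high", "low", "close", "volume"]


def parse_name(path: Path) -> tuple[str, str]:
    """Split 'EURUSD_1h.csv' into ('EURUSD', '1h'); raise if the name doesn't fit."""
    parts = path.stem.split("_")
    if path.suffix.lower() != ".csv" or len(parts) != 2 or not all(parts):
        raise ValueError(f"{path.name}: expected SYMBOL_TIMEFRAME.csv")
    return parts[0], parts[1]


def check_raw_csv(path: Path) -> list[str]:
    """Return a list of problems found in one raw OHLCV file (empty if OK)."""
    problems = []
    try:
        parse_name(path)
    except ValueError as exc:
        problems.append(str(exc))
    with path.open(newline="") as f:
        header = next(csv.reader(f), [])
    header = [col.strip().lower() for col in header]
    if header != EXPECTED_COLUMNS:
        problems.append(f"{path.name}: columns {header} != {EXPECTED_COLUMNS}")
    return problems


if __name__ == "__main__":
    raw_dir = Path("data/raw")
    csv_paths = sorted(raw_dir.glob("*.csv")) if raw_dir.is_dir() else []
    for csv_path in csv_paths:
        for problem in check_raw_csv(csv_path):
            print(problem)
```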
Option B: Alpaca API (Live data)

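# Set API credentials in .env first
# Note: Alpaca covers US stocks and crypto, not forex, so a forex pair such as
# EURUSD needs a forex source (e.g. OANDA); adjust --symbols/--source accordingly
# Then load data via CLI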
python main.py data load \
    --symbols EURUSD BTCUSD \
    --timeframes 15m 30m 1h 2h \
    --start 2020-01-01 \
    --end 2024-12-31 \
    --source alpaca

2.3 Data Quality Checks

# Run data quality analysis
import pandas as pd
from src.data.loaders import load_ohlcv_data

# Load and inspect data
data = load_ohlcv_data(
    symbols=['EURUSD'],
    timeframes=['1h'],
    start_date='2020-01-01',
    end_date='2024-12-31',
    source='csv'
)

df = data['EURUSD']['1h']
print(f"Missing values: {df.isnull().sum().sum()}")
print(f"Duplicate timestamps: {df.index.duplicated().sum()}")
print(f"Largest close-to-close move: {df['close'].diff().abs().max()}")
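Duplicate timestamps are only half the story. Missing bars (exchange outages, download gaps) also distort rolling features. Below is a minimal stdlib sketch that reports jumps longer than the expected bar interval. It assumes timestamps are ISO-8601 strings sorted in ascending order. `find_gaps` is a hypothetical helper. For forex data, weekend closures will also show up as gaps, so filter them out or whitelist them.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, bar: timedelta):
    """Return (previous, next, missing_bars) for each jump longer than one bar."""
    parsed = [datetime.fromisoformat(ts) for ts in timestamps]
    gaps = []
    for prev, curr in zip(parsed, parsed[1:]):
        step = curr - prev
        if step > bar:
            # Number of bars that should have appeared between prev and curr
            missing = int(step / bar) - 1
            gaps.append((prev, curr, missing))
    return gaps


stamps = ["2024-01-01 00:00", "2024-01-01 01:00", "2024-01-01 04:00", "2024-01-01 05:00"]
for prev, curr, missing in find_gaps(stamps, timedelta(hours=1)):
    print(f"{prev} -> {curr}: {missing} missing bar(s)")
```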

3. Feature Engineering

3.1 Build Training Dataset

# Build features from loaded data
python main.py features build \
    --config config/config.yaml \
    --data-path data/loaded_data.pkl \
    --output data/features.pkl \
    --sequence-length 60
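The `--sequence-length 60` flag suggests that each training sample is a window of 60 consecutive bars. Below is a minimal sketch of how such windows and next-bar targets can be built. It assumes the target is the direction of the next close. It illustrates the sliding-window idea and is not the project's `features build` implementation; `make_sequences` is hypothetical.

```python
def make_sequences(rows, closes, seq_len):
    """Build (X, y) where X[i] is rows[i:i+seq_len] and y[i] is the next bar's direction.

    rows:   per-bar feature vectors (list of lists), aligned with closes
    closes: close prices used only to derive the target
    """
    X, y = [], []
    # Stop early enough that the bar after each window exists
    for start in range(len(rows) - seq_len):
        end = start + seq_len  # index of the first bar after the window
        X.append(rows[start:end])
        # 1 if the next close is above the last close inside the window, else 0
        y.append(1 if closes[end] > closes[end - 1] else 0)
    return X, y


closes = [1.0, 1.1, 1.05, 1.2, 1.25, 1.22]
rows = [[c, c * 2] for c in closes]  # two toy features per bar
X, y = make_sequences(rows, closes, seq_len=3)
print(len(X), len(X[0]), len(X[0][0]))  # prints: 3 3 2 (samples, sequence_length, features)
print(y)  # prints: [1, 1, 0]
```

Each window only contains bars up to `end - 1`, so the bar that defines the target never leaks into the inputs.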

3.2 Feature Analysis

# Analyze built features
import pickle
with open('data/features.pkl', 'rb') as f:
    X, y = pickle.load(f)

print(f"Feature matrix shape: {X.shape}")
print(f"Number of targets: {len(y)}")
print(f"Number of features: {X.shape[2]}")

# Check feature distributions
import matplotlib.pyplot as plt
plt.figure(figsize=(15, 10))
for i in range(min(10, X.shape[2])):
    plt.subplot(2, 5, i+1)
    plt.hist(X[:, 0, i], bins=50)
    plt.title(f'Feature {i}')
plt.tight_layout()
plt.show()

4. Model Training

4.1 Hyperparameter Optimization

# Run Optuna optimization (this takes time!)
python main.py training train \
    --config config/config.yaml \
    --data-path data/features.pkl \
    --output models/best_model.pth \
    --optuna 50

Example output (exact values will differ):

Trial 49 finished with value: -0.2345 and parameters: {...}
Best trial: 42
Best value: -0.1234
Best parameters: {'hidden_size': 128, 'num_layers': 2, 'dropout': 0.2, ...}
Model training completed. Saved to models/best_model.pth

4.2 Manual Training (if needed)

# For custom training without Optuna
from src.training.train import train_model
import pickle

with open('data/features.pkl', 'rb') as f:
    X, y = pickle.load(f)

model = train_model(
    X=X,
    y=y,
    config_path='config/config.yaml',
    save_path='models/manual_model.pth'
)

4.3 Model Validation

# Load and test model
from src.models.lstm_fusion import load_model
import torch

model = load_model('models/best_model.pth')
model.eval()

# Test prediction
with torch.no_grad():
    sample_input = torch.randn(1, 60, model.input_size)
    prediction = model(sample_input)
    print(f"Sample prediction shape: {prediction.shape}")

5. Backtesting

5.1 Run Comprehensive Backtest

# Run backtest with the trained model
python main.py backtest run \
    --config config/config.yaml \
    --data-path data/test_data.pkl \
    --model-path models/best_model.pth \
    --report reports/backtest_results.html

5.2 Analyze Results

Key Metrics to Evaluate:

Sharpe Ratio: > 1.0 is good, > 2.0 is excellent
Max Drawdown: < 15% is acceptable, < 10% is good
Win Rate: > 50% is decent, > 60% is good
Profit Factor: > 1.5 is good, > 2.0 is excellent
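These thresholds only mean something if you know how each metric is computed. Below is a minimal stdlib sketch of common definitions. It assumes per-period decimal returns for Sharpe and drawdown, and per-trade PnL for win rate and profit factor. The zero risk-free rate and 252 periods per year are also assumptions, and the project's `TradingMetrics` may define these metrics differently.

```python
import math
from statistics import mean, stdev


def sharpe_ratio(returns, periods_per_year=252):
    """Annualized Sharpe ratio with a zero risk-free rate."""
    sd = stdev(returns)
    return 0.0 if sd == 0 else mean(returns) / sd * math.sqrt(periods_per_year)


def max_drawdown(returns):
    """Largest peak-to-trough decline of the compounded equity curve (0.2 = 20%)."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1 + r
        peak = max(peak, equity)
        worst = max(worst, 1 - equity / peak)
    return worst


def win_rate(trade_pnls):
    """Fraction of trades with positive PnL."""
    return sum(p > 0 for p in trade_pnls) / len(trade_pnls) if trade_pnls else 0.0


def profit_factor(trade_pnls):
    """Gross profit divided by gross loss; infinite if there are no losing trades."""
    gross_win = sum(p for p in trade_pnls if p > 0)
    gross_loss = -sum(p for p in trade_pnls if p < 0)
    return math.inf if gross_loss == 0 else gross_win / gross_loss
```

Read win rate together with profit factor. A 40% win rate still has positive expectancy when winners are twice the size of losers (0.4 * 2 - 0.6 * 1 = 0.2 units per trade).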

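# Load and analyze backtest results
# (assumes the backtest also saved its raw results as reports/backtest_results.pkl
#  next to the HTML report)
import pickle
with open('reports/backtest_results.pkl', 'rb') as f:
    results = pickle.load(f)

print("Backtest Summary:")
print(f"Total Return: {results['portfolio']['total_return']:.2%}")
print(f"Sharpe Ratio: {results['sharpe_ratio']:.3f}")
print(f"Max Drawdown: {results['max_drawdown']:.2%}")
print(f"Total Trades: {results['num_trades']}")
# Guard against division by zero when the backtest produced no trades
win_rate = results['winning_trades'] / results['num_trades'] if results['num_trades'] else 0.0
print(f"Win Rate: {win_rate:.2%}")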

5.3 Risk-Adjusted Performance

# Calculate risk metrics (uses `results` loaded in 5.2)
from src.training.metrics import TradingMetrics

metrics = TradingMetrics()
trade_pnls = [trade['pnl'] for trade in results['trades']]  # Per-trade PnL, not a per-period return series

risk_metrics = metrics.calculate_trade_statistics(trade_pnls)
# .get(key, 0) prints 0 when a key is absent, so 0.000 may mean "not computed" rather than zero
print(f"Calmar Ratio: {risk_metrics.get('calmar_ratio', 0):.3f}")
print(f"Sortino Ratio: {risk_metrics.get('sortino_ratio', 0):.3f}")
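Sortino and Calmar ratios are defined on a return series, not on raw per-trade PnL. For reference, below is a minimal stdlib sketch of the usual definitions. It assumes per-period decimal returns, a zero target return, and 252 periods per year, any of which may differ from what `TradingMetrics` does. Sortino divides the mean return by the downside deviation only. Calmar divides compound annual growth by maximum drawdown.

```python
import math


def sortino_ratio(returns, periods_per_year=252, target=0.0):
    """Annualized mean excess return over downside deviation (below `target`)."""
    excess = [r - target for r in returns]
    downside_dev = math.sqrt(sum(min(0.0, e) ** 2 for e in excess) / len(excess))
    mean_excess = sum(excess) / len(excess)
    if downside_dev == 0:
        # No period fell below the target
        return math.inf if mean_excess > 0 else 0.0
    return mean_excess / downside_dev * math.sqrt(periods_per_year)


def calmar_ratio(returns, periods_per_year=252):
    """Compound annual growth rate divided by maximum drawdown."""
    equity, peak, max_dd = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1 + r
        peak = max(peak, equity)
        max_dd = max(max_dd, 1 - equity / peak)
    cagr = equity ** (periods_per_year / len(returns)) - 1
    return math.inf if max_dd == 0 else cagr / max_dd
```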

6. Live Trading Setup

6.1 Paper Trading (Recommended First)

# Start paper trading for validation
python main.py live run \
    --config config/config.yaml \
    --model-path models/best_model.pth \
    --mode paper \
    --api-key $API_KEY_ALPACA \
    --api-secret $API_SECRET_ALPACA

Monitor in another terminal:

# Check logs
tail -f logs/trading_bot.log

# Check portfolio value
# The bot will log performance every execution cycle

6.2 Risk Management Configuration

Before going live, adjust risk parameters:

# config/config.yaml - Risk Section
backtest:
  risk:
    max_position_size: 0.02  # 2% per trade
    max_positions: 3         # Conservative for live
    stop_loss: 0.015         # 1.5% stop loss
    take_profit: 0.03        # 3% take profit
    circuit_breaker_drawdown: 0.10  # 10% circuit breaker
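A quick consistency check can catch typos, such as `2` instead of `0.02`, before they reach a live account. Below is a minimal sketch that takes the `risk` mapping as a plain dict; you could load it with PyYAML's `yaml.safe_load(...)["backtest"]["risk"]`. The rules are illustrative assumptions, not project requirements. One example is requiring take-profit to exceed stop-loss.

```python
def validate_risk(risk: dict) -> list[str]:
    """Return human-readable problems with a risk config (empty list if it looks sane)."""
    problems = []
    fraction_keys = ["max_position_size", "stop_loss", "take_profit", "circuit_breaker_drawdown"]
    for key in fraction_keys:
        value = risk.get(key)
        # Fractions are written as decimals: 0.02 means 2%
        if not isinstance(value, (int, float)) or not 0 < value < 1:
            problems.append(f"{key} should be a fraction between 0 and 1, got {value!r}")
    positions = risk.get("max_positions")
    if not isinstance(positions, int) or positions < 1:
        problems.append(f"max_positions should be a positive integer, got {positions!r}")
    if not problems and risk["take_profit"] <= risk["stop_loss"]:
        problems.append("take_profit is not larger than stop_loss (reward/risk <= 1)")
    return problems


risk = {
    "max_position_size": 0.02,
    "max_positions": 3,
    "stop_loss": 0.015,
    "take_profit": 0.03,
    "circuit_breaker_drawdown": 0.10,
}
print(validate_risk(risk) or "risk config OK")
```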

6.3 Live Trading (Production)

# Only after successful paper trading!
python main.py live run \
    --config config/config.yaml \
    --model-path models/best_model.pth \
    --mode live \
    --api-key $API_KEY_ALPACA \
    --api-secret $API_SECRET_ALPACA

7. Risk Management Strategies

7.1 Position Sizing Strategies

Conservative (Recommended for beginners):

# Fixed percentage sizing
position_size = portfolio_value * 0.02  # 2% per trade

Advanced (Kelly Criterion):

# Calculate position size with the Kelly criterion: f* = W - (1 - W) / R
win_rate = 0.55  # W, from backtesting
avg_win = 0.02   # 2% average win
avg_loss = 0.01  # 1% average loss

payoff_ratio = avg_win / avg_loss                     # R = 2.0
kelly_pct = win_rate - (1 - win_rate) / payoff_ratio  # 0.55 - 0.45 / 2 = 0.325
position_size = portfolio_value * max(0.0, min(kelly_pct, 0.05))  # Cap at 5%; no position if edge <= 0

7.2 Circuit Breakers

Daily Loss Limit:

Stop trading if daily loss > 3%
Reset at midnight UTC

Drawdown Protection:

Stop trading if portfolio drops > 15% from peak
Manual reset required

Position Limits:

Maximum 5 concurrent positions
Maximum 2% of portfolio per position
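The three rules above map naturally onto a small state object that is checked before every order. Below is a minimal sketch. It assumes equity is sampled with timezone-aware timestamps and uses the limits listed: 3% daily loss, 15% drawdown from peak, 5 positions, and 2% per position. The `CircuitBreaker` class is hypothetical and not part of the project's code.

```python
from datetime import timezone


class CircuitBreaker:
    def __init__(self, start_equity, daily_loss_limit=0.03, max_drawdown=0.15,
                 max_positions=5, max_position_frac=0.02):
        self.daily_loss_limit = daily_loss_limit
        self.max_drawdown = max_drawdown
        self.max_positions = max_positions
        self.max_position_frac = max_position_frac
        self.peak = start_equity
        self.day_start_equity = start_equity
        self.day = None
        self.daily_halt = False
        self.drawdown_halt = False  # Latched until manual_reset()

    def update(self, equity, now):
        """Record the latest equity; `now` must be a timezone-aware datetime."""
        today = now.astimezone(timezone.utc).date()
        if today != self.day:  # New UTC day: reset the daily loss limit
            self.day, self.day_start_equity, self.daily_halt = today, equity, False
        self.peak = max(self.peak, equity)
        if equity < self.day_start_equity * (1 - self.daily_loss_limit):
            self.daily_halt = True
        if equity < self.peak * (1 - self.max_drawdown):
            self.drawdown_halt = True

    def manual_reset(self, equity):
        """Clear the drawdown halt and restart peak tracking from current equity."""
        self.drawdown_halt = False
        self.peak = equity

    def can_open(self, open_positions, order_value, equity):
        """True if a new position worth `order_value` is allowed right now."""
        if self.daily_halt or self.drawdown_halt:
            return False
        if open_positions >= self.max_positions:
            return False
        return order_value <= equity * self.max_position_frac
```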

8. Monitoring and Maintenance

8.1 Real-time Monitoring

# Monitor logs
tail -f logs/trading_bot.log | grep -E "(Trade|Signal|Risk|Error)"

# Monitor performance
python -c "
with open('logs/trading_bot.log', 'r') as f:
    for line in f:
        if 'portfolio_value' in line:
            print(line.strip())
"

8.2 Daily Health Checks

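# Check system health
import glob
import os
import time

from src.live.broker_alpaca import AlpacaBroker

MAX_DATA_AGE_HOURS = 24  # Assumed freshness threshold; match it to your data schedule


def health_check():
    problems = []

    # Check if model files exist
    if not os.path.exists('models/best_model.pth'):
        problems.append("Model file missing")

    # Check if cached data exists and is recent
    data_files = glob.glob('data/cache/*.parquet')
    if not data_files:
        problems.append("No cached data found")
    else:
        age_hours = (time.time() - max(os.path.getmtime(p) for p in data_files)) / 3600
        if age_hours > MAX_DATA_AGE_HOURS:
            problems.append(f"Cached data is {age_hours:.1f} hours old")

    # Check API connectivity
    try:
        if not AlpacaBroker().is_connected():
            problems.append("Broker reports no connection")
    except Exception as exc:
        problems.append(f"API connectivity issue: {exc}")

    for problem in problems:
        print("FAILED:", problem)
    if not problems:
        print("All health checks passed!")
    return not problems


health_check()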

8.3 Model Retraining Schedule

Weekly Retraining:

# Every Sunday at 2 AM
# Add to crontab:
# 0 2 * * 0 /path/to/venv/bin/python /path/to/project/scripts/retrain_model.py

Performance Monitoring:

Track live Sharpe ratio vs backtested
Monitor maximum drawdown
Alert on significant performance degradation
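The comparison against backtested performance can be automated as a periodic check. Below is a minimal sketch. It assumes a list of recent live per-period returns, plus the backtest's Sharpe ratio and maximum drawdown. The 90% ratio mirrors the operational target in the success metrics. The 1.5x drawdown tolerance, minimum sample size, and `degradation_alerts` name are illustrative assumptions.

```python
import math
from statistics import mean, stdev


def degradation_alerts(live_returns, backtest_sharpe, backtest_max_dd,
                       min_ratio=0.9, dd_tolerance=1.5, periods_per_year=252,
                       min_periods=20):
    """Return alert messages when live results fall short of the backtest."""
    if len(live_returns) < min_periods:
        return []  # Too few observations for a meaningful comparison
    alerts = []
    m, sd = mean(live_returns), stdev(live_returns)
    if sd == 0:
        live_sharpe = math.copysign(math.inf, m) if m else 0.0
    else:
        live_sharpe = m / sd * math.sqrt(periods_per_year)
    if live_sharpe < min_ratio * backtest_sharpe:
        alerts.append(f"Live Sharpe {live_sharpe:.2f} < {min_ratio:.0%} of backtest {backtest_sharpe:.2f}")
    # Drawdown of the compounded live equity curve
    equity, peak, live_dd = 1.0, 1.0, 0.0
    for r in live_returns:
        equity *= 1 + r
        peak = max(peak, equity)
        live_dd = max(live_dd, 1 - equity / peak)
    if live_dd > dd_tolerance * backtest_max_dd:
        alerts.append(f"Live drawdown {live_dd:.1%} exceeds {dd_tolerance}x backtest {backtest_max_dd:.1%}")
    return alerts
```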

9. Troubleshooting

9.1 Common Issues

Issue: Model not training

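# Check data shapes
import pickle
with open('data/features.pkl', 'rb') as f:
    X, y = pickle.load(f)

print(f"X shape: {X.shape}")           # Should be (samples, sequence_length, features)
print(f"Number of targets: {len(y)}")  # Should match samples
assert X.ndim == 3, "X must be 3-D: (samples, sequence_length, features)"
assert len(y) == X.shape[0], "Each sample needs exactly one target"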

Issue: Poor backtest performance

# Check for overfitting
# Compare train/validation metrics
# Reduce model complexity if needed
# Increase regularization (dropout, L2)
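Early stopping is a common, cheap guard against overfitting: stop training once validation loss stops improving, and keep the best checkpoint. Below is a minimal framework-agnostic sketch. The `patience` and `min_delta` defaults are assumptions. For time series, the validation set should come chronologically after the training set rather than from a random split.

```python
class EarlyStopping:
    """Signal a stop after `patience` epochs without a `min_delta` improvement."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record this epoch's validation loss; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss, self.best_epoch, self.bad_epochs = val_loss, epoch, 0
            # In a real loop, save the model here, e.g. torch.save(model.state_dict(), path)
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = EarlyStopping(patience=2)
val_losses = [0.90, 0.80, 0.78, 0.79, 0.81, 0.70]
for epoch, loss in enumerate(val_losses):
    if stopper.step(epoch, loss):
        print(f"Stopping at epoch {epoch}; best epoch {stopper.best_epoch} (loss {stopper.best_loss})")
        break
```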

Issue: Live trading connection errors

# Check API credentials without printing any part of them
# (reads the shell environment; values in .env are only visible here if
#  exported or loaded, e.g. with python-dotenv)
python -c "
import os
for name in ('API_KEY_ALPACA', 'API_SECRET_ALPACA'):
    print(name + ':', 'set' if os.getenv(name) else 'Not set')
"

# Test broker connection
from src.live.broker_alpaca import AlpacaBroker
broker = AlpacaBroker()
print("Connected:", broker.is_connected())

9.2 Debugging Tools

Feature Debugging:

# Check feature distributions (X loaded from data/features.pkl)
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

features_df = pd.DataFrame(X[:, 0, :])  # First timestep
plt.figure(figsize=(15, 10))
sns.boxplot(data=features_df)
plt.xticks(rotation=45)
plt.show()

Model Debugging:

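# Check model predictions (model loaded with load_model as in 4.3)
import torch

model.eval()
with torch.no_grad():
    sample = torch.randn(1, 60, model.input_size)
    pred = model(sample)
    print(f"Prediction: {pred.squeeze().tolist()}")  # Works for single- or multi-output heads
    print(f"All finite: {torch.isfinite(pred).all().item()}")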

10. Performance Optimization

10.1 Training Optimization

# Use GPU if available
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)
# Each input batch must be moved to the same device: inputs = inputs.to(device)

# Optimize batch size
# Start with 32, increase until memory limit
batch_size = 64  # Adjust based on your hardware

10.2 Inference Optimization

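# Model quantization for faster inference (dynamic quantization runs on CPU only)
import torch
import torch.quantization

# Quantize the weights of LSTM and Linear layers to int8
quantized_model = torch.quantization.quantize_dynamic(
    model.cpu(), {torch.nn.LSTM, torch.nn.Linear}, dtype=torch.qint8
)

# Save quantized model (loading this state_dict requires re-creating the model
# and applying the same quantize_dynamic call first)
torch.save(quantized_model.state_dict(), 'models/quantized_model.pth')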

10.3 Memory Optimization

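# Use mixed precision training (CUDA GPUs)
from torch.cuda.amp import GradScaler, autocast

scaler = GradScaler()

# In training loop (model, criterion, optimizer, train_loader, device defined elsewhere):
for inputs, targets in train_loader:
    inputs, targets = inputs.to(device), targets.to(device)
    optimizer.zero_grad()

    with autocast():  # Run the forward pass in reduced precision where safe
        outputs = model(inputs)
        loss = criterion(outputs, targets)

    scaler.scale(loss).backward()  # Scale loss to avoid fp16 gradient underflow
    scaler.step(optimizer)         # Unscale gradients, then step (skipped on inf/NaN)
    scaler.update()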

11. Future Enhancements

11.1 Advanced Features

Ensemble Models:

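# Combine multiple models
import torch
from src.models.lstm_fusion import load_model

models = [
    load_model('models/model_v1.pth'),
    load_model('models/model_v2.pth'),
    load_model('models/model_v3.pth')
]

# Ensemble prediction (input_tensor: a batch shaped (batch, sequence_length, features))
with torch.no_grad():
    predictions = [m.eval()(input_tensor) for m in models]
    ensemble_pred = torch.mean(torch.stack(predictions), dim=0)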

Reinforcement Learning:

# Future enhancement: Train with RL
# Use portfolio returns as rewards
# Implement actor-critic or PPO algorithms

11.2 Multi-Asset Trading

# Extend to multiple assets
symbols = ['EURUSD', 'GBPUSD', 'USDJPY', 'BTCUSD', 'ETHUSD']

# Train separate models per asset or shared model
# Implement portfolio optimization

11.3 Advanced Risk Management

Dynamic Position Sizing:

# Adjust position size based on market volatility
volatility = calculate_market_volatility()
position_size = base_size * (1 / (1 + volatility * 10))
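`calculate_market_volatility()` is left undefined above. Below is a minimal sketch of one common choice, the annualized standard deviation of recent log returns, plugged into the same `1 / (1 + volatility * 10)` scaling. The 20-bar lookback, 252-period annualization, and function signatures are assumptions. The scale of `volatility` matters: at 20% annualized volatility the multiplier is 1 / (1 + 2), or about 0.33.

```python
import math
from statistics import stdev


def calculate_market_volatility(closes, lookback=20, periods_per_year=252):
    """Annualized standard deviation of the last `lookback` log returns."""
    window = closes[-(lookback + 1):]
    log_returns = [math.log(b / a) for a, b in zip(window, window[1:])]
    if len(log_returns) < 2:
        raise ValueError("need at least 3 prices to estimate volatility")
    return stdev(log_returns) * math.sqrt(periods_per_year)


def volatility_scaled_size(base_size, volatility):
    """Shrink the position as volatility rises (same scaling as above)."""
    return base_size * (1 / (1 + volatility * 10))


closes = [100, 101, 100.5, 102, 101.2, 103, 102.4, 104]
vol = calculate_market_volatility(closes)
print(f"volatility={vol:.2%}, size={volatility_scaled_size(1000, vol):.2f}")
```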

Correlation-Based Risk:

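# Avoid correlated positions
import numpy as np

correlation_matrix = calculate_correlations(positions)  # Hypothetical helper returning a pandas DataFrame
corr = correlation_matrix.to_numpy(copy=True)
np.fill_diagonal(corr, np.nan)  # Ignore self-correlation, which is always 1.0
if np.nanmax(corr) > 0.7:  # High correlation between two different positions
    reduce_position_sizes()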
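A correlation matrix always has 1.0 on its diagonal, so the check must compare distinct pairs only. Below is a minimal stdlib sketch of a hypothetical `correlated_pairs` helper. It returns every pair of held symbols whose return correlation exceeds the threshold. It assumes equal-length, time-aligned return series for each symbol.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def correlated_pairs(returns_by_symbol, threshold=0.7):
    """Return (sym_a, sym_b, corr) for each distinct pair above `threshold`."""
    pairs = []
    for a, b in combinations(sorted(returns_by_symbol), 2):
        corr = pearson(returns_by_symbol[a], returns_by_symbol[b])
        if corr > threshold:
            pairs.append((a, b, corr))
    return pairs


returns = {
    "EURUSD": [0.001, -0.002, 0.003, 0.000, -0.001],
    "GBPUSD": [0.002, -0.001, 0.002, 0.001, -0.002],
    "BTCUSD": [-0.01, 0.02, -0.005, 0.015, 0.0],
}
for a, b, corr in correlated_pairs(returns):
    print(f"{a}/{b}: {corr:.2f}")
```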

12. Deployment Checklist

Before Going Live:

Backtesting shows consistent profitability (>6 months)
Sharpe ratio > 1.0 across multiple market conditions
Maximum drawdown < 20% in backtests
Paper trading successful for >1 month
Risk management parameters tested and validated
Monitoring and alerting systems in place
Emergency stop procedures documented
API rate limits and costs understood

Production Deployment:

Server setup with 24/7 uptime
Database backups configured
Log aggregation and monitoring
Automated model retraining pipeline
Performance tracking dashboard
Incident response procedures

13. Resources and Support

Documentation:

README.md - Project overview and setup
AGENTS.md - Development guidelines
API Documentation - Detailed API reference

Notebooks:

01_data_eda.ipynb - Data exploration
02_train_optuna.ipynb - Model training
03_backtest.ipynb - Strategy evaluation
04_live_trading.ipynb - Live trading

Monitoring:

Check logs/trading_bot.log for system status
Monitor portfolio value and risk metrics
Set up alerts for critical events

🎯 Success Metrics

Target Performance:

Sharpe Ratio: > 1.5
Max Drawdown: < 15%
Win Rate: > 55%
Profit Factor: > 1.8

Risk Management:

Daily Loss Limit: < 3%
Position Concentration: < 5% per trade
Circuit Breaker Response: < 30 seconds

Operational:

Uptime: > 99.5%
Live vs. Backtest Performance: > 90% of backtested performance
Response Time: < 1 second for signals

Happy Trading! 🚀📈

This guide will evolve as you gain experience with the system. Start conservatively, monitor closely, and gradually increase complexity and position sizes as you validate performance.
