# LSTM Trading Bot - Complete Implementation Guide

## 🚀 Overview

This guide provides step-by-step instructions to take the scaffolded LSTM trading bot from a project template to a fully operational trading system. Follow these steps in order for the best results.

## 📋 Table of Contents

1. [Immediate Setup](#immediate-setup)
2. [Data Collection](#data-collection)
3. [Feature Engineering](#feature-engineering)
4. [Model Training](#model-training)
5. [Backtesting](#backtesting)
6. [Live Trading Setup](#live-trading-setup)
7. [Risk Management Configuration](#risk-management-configuration)
8. [Monitoring and Maintenance](#monitoring-and-maintenance)
9. [Troubleshooting](#troubleshooting)
10. [Performance Optimization](#performance-optimization)
11. [Future Enhancements](#future-enhancements)

## 1. Immediate Setup

### 1.1 Environment Setup

```bash
# 1. Create virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\\Scripts\\activate

# 2. Install dependencies
pip install -r requirements.txt

# 3. Verify installation
python -c "import torch, pandas, backtrader; print('All dependencies installed successfully')"
```

### 1.2 Configuration Setup

```bash
# 1. Copy environment template
cp .env.example .env

# 2. Edit .env with your API credentials
nano .env  # or your preferred editor

# Required for live trading:
# API_KEY_ALPACA=your_key_here
# API_SECRET_ALPACA=your_secret_here
```

### 1.3 Project Structure Verification

```bash
# Verify all files are present
ls -la
# Should show: config/, src/, notebooks/, docs/, requirements.txt, etc.

# Test configuration loading
python -c "from src.utils.config import get_config; config = get_config(); print('Config loaded successfully')"
```

## 2. Data Collection

### 2.1 Choose Your Data Sources

**For Development/Research:**
- Use CSV files with historical data
- Download from Yahoo Finance, Alpha Vantage, or similar

**For Live Trading:**
- Alpaca (US stocks/crypto)
- Binance (crypto)
- OANDA (forex)

### 2.2 Collect Historical Data

**Option A: CSV Files (Recommended for initial development)**

```bash
# Create data/raw directory
mkdir -p data/raw

# Download sample data (replace with your data source)
# Example: EURUSD 1h data from 2020-2024
# Place CSV files in data/raw/ with format: SYMBOL_TIMEFRAME.csv
# Columns: timestamp,open,high,low,close,volume

# Verify data format
python -c "
import pandas as pd
df = pd.read_csv('data/raw/EURUSD_1h.csv')
print(f'Data shape: {df.shape}')
print(f'Columns: {list(df.columns)}')
print(f'Date range: {df.timestamp.min()} to {df.timestamp.max()}')
"
```

**Option B: Alpaca API (Live data)**

```bash
# Set API credentials in .env first
# Then load data via CLI
python main.py data load \
    --symbols EURUSD BTCUSD \
    --timeframes 15m 30m 1h 2h \
    --start 2020-01-01 \
    --end 2024-12-31 \
    --source alpaca
```

### 2.3 Data Quality Checks

```python
# Run data quality analysis
import pandas as pd
from src.data.loaders import load_ohlcv_data

# Load and inspect data
data = load_ohlcv_data(
    symbols=['EURUSD'],
    timeframes=['1h'],
    start_date='2020-01-01',
    end_date='2024-12-31',
    source='csv'
)

df = data['EURUSD']['1h']
print(f"Missing values: {df.isnull().sum().sum()}")
print(f"Duplicate timestamps: {df.index.duplicated().sum()}")
print(f"Price gaps: {(df['close'] - df['close'].shift(1)).abs().max()}")
```

## 3. Feature Engineering

### 3.1 Build Training Dataset

```bash
# Build features from loaded data
python main.py features build \
    --config config/config.yaml \
    --data-path data/loaded_data.pkl \
    --output data/features.pkl \
    --sequence-length 60
```

### 3.2 Feature Analysis

```python
# Analyze built features
import pickle
with open('data/features.pkl', 'rb') as f:
    X, y = pickle.load(f)

print(f"Feature matrix shape: {X.shape}")
print(f"Target shape: {len(y)}")
print(f"Number of features: {X.shape[2]}")

# Check feature distributions
import matplotlib.pyplot as plt
plt.figure(figsize=(15, 10))
for i in range(min(10, X.shape[2])):
    plt.subplot(2, 5, i+1)
    plt.hist(X[:, 0, i], bins=50)
    plt.title(f'Feature {i}')
plt.tight_layout()
plt.show()
```

## 4. Model Training

### 4.1 Hyperparameter Optimization

```bash
# Run Optuna optimization (this takes time!)
python main.py training train \
    --config config/config.yaml \
    --data-path data/features.pkl \
    --output models/best_model.pth \
    --optuna 50
```

**Expected Output:**
```
Trial 49 finished with value: -0.2345 and parameters: {...}
Best trial: 42
Best value: -0.1234
Best parameters: {'hidden_size': 128, 'num_layers': 2, 'dropout': 0.2, ...}
Model training completed. Saved to models/best_model.pth
```

### 4.2 Manual Training (if needed)

```python
# For custom training without Optuna
from src.training.train import train_model
import pickle

with open('data/features.pkl', 'rb') as f:
    X, y = pickle.load(f)

model = train_model(
    X=X,
    y=y,
    config_path='config/config.yaml',
    save_path='models/manual_model.pth'
)
```

### 4.3 Model Validation

```python
# Load and test model
from src.models.lstm_fusion import load_model
import torch

model = load_model('models/best_model.pth')
model.eval()

# Test prediction
with torch.no_grad():
    sample_input = torch.randn(1, 60, model.input_size)
    prediction = model(sample_input)
    print(f"Sample prediction shape: {prediction.shape}")
```

## 5. Backtesting

### 5.1 Run Comprehensive Backtest

```bash
# Run backtest with the trained model
python main.py backtest run \
    --config config/config.yaml \
    --data-path data/test_data.pkl \
    --model-path models/best_model.pth \
    --report reports/backtest_results.html
```

### 5.2 Analyze Results

**Key Metrics to Evaluate:**
- **Sharpe Ratio**: > 1.0 is good, > 2.0 is excellent
- **Max Drawdown**: < 15% is acceptable, < 10% is good
- **Win Rate**: > 50% is decent, > 60% is good
- **Profit Factor**: > 1.5 is good, > 2.0 is excellent

```python
# Load and analyze backtest results
import pickle
with open('reports/backtest_results.pkl', 'rb') as f:
    results = pickle.load(f)

print("Backtest Summary:")
print(f"Total Return: {results['portfolio']['total_return']:.2%}")
print(f"Sharpe Ratio: {results['sharpe_ratio']:.3f}")
print(f"Max Drawdown: {results['max_drawdown']:.2%}")
print(f"Total Trades: {results['num_trades']}")
print(f"Win Rate: {results['winning_trades']/results['num_trades']:.2%}")
```

### 5.3 Risk-Adjusted Performance

```python
# Calculate risk metrics
from src.training.metrics import TradingMetrics

metrics = TradingMetrics()
portfolio_returns = [trade['pnl'] for trade in results['trades']]

risk_metrics = metrics.calculate_trade_statistics(portfolio_returns)
print(f"Calmar Ratio: {risk_metrics.get('calmar_ratio', 0):.3f}")
print(f"Sortino Ratio: {risk_metrics.get('sortino_ratio', 0):.3f}")
```

## 6. Live Trading Setup

### 6.1 Paper Trading (Recommended First)

```bash
# Start paper trading for validation
python main.py live run \
    --config config/config.yaml \
    --model-path models/best_model.pth \
    --mode paper \
    --api-key $API_KEY_ALPACA \
    --api-secret $API_SECRET_ALPACA
```

**Monitor in another terminal:**
```bash
# Check logs
tail -f logs/trading_bot.log

# Check portfolio value
# The bot will log performance every execution cycle
```

### 6.2 Risk Management Configuration

Before going live, adjust risk parameters:

```yaml
# config/config.yaml - Risk Section
backtest:
  risk:
    max_position_size: 0.02  # 2% per trade
    max_positions: 3         # Conservative for live
    stop_loss: 0.015         # 1.5% stop loss
    take_profit: 0.03        # 3% take profit
    circuit_breaker_drawdown: 0.10  # 10% circuit breaker
```

### 6.3 Live Trading (Production)

```bash
# Only after successful paper trading!
python main.py live run \
    --config config/config.yaml \
    --model-path models/best_model.pth \
    --mode live \
    --api-key $API_KEY_ALPACA \
    --api-secret $API_SECRET_ALPACA
```

## 7. Risk Management Configuration

### 7.1 Position Sizing Strategies

**Conservative (Recommended for beginners):**
```python
# Fixed percentage sizing
position_size = portfolio_value * 0.02  # 2% per trade
```

**Advanced (Kelly Criterion):**
```python
# Calculate optimal position size
win_rate = 0.55  # From backtesting
avg_win = 0.02   # 2% average win
avg_loss = 0.01  # 1% average loss

kelly_pct = (win_rate * avg_win - (1 - win_rate) * avg_loss) / avg_loss
position_size = portfolio_value * min(kelly_pct, 0.05)  # Cap at 5%
```

### 7.2 Circuit Breakers

**Daily Loss Limit:**
- Stop trading if daily loss > 3%
- Reset at midnight UTC

**Drawdown Protection:**
- Stop trading if portfolio drops > 15% from peak
- Manual reset required

**Position Limits:**
- Maximum 5 concurrent positions
- Maximum 2% of portfolio per position

## 8. Monitoring and Maintenance

### 8.1 Real-time Monitoring

```bash
# Monitor logs
tail -f logs/trading_bot.log | grep -E "(Trade|Signal|Risk|Error)"

# Monitor performance
python -c "
import json
with open('logs/trading_bot.log', 'r') as f:
    for line in f:
        if 'portfolio_value' in line:
            print(line.strip())
"
```

### 8.2 Daily Health Checks

```python
# Check system health
import subprocess
import sys

def health_check():
    # Check if model files exist
    assert os.path.exists('models/best_model.pth'), "Model file missing"

    # Check if data is recent
    data_files = glob.glob('data/cache/*.parquet')
    assert len(data_files) > 0, "No cached data found"

    # Check API connectivity
    try:
        # Test API call
        pass
    except:
        print("API connectivity issue")

    print("All health checks passed!")

health_check()
```

### 8.3 Model Retraining Schedule

**Weekly Retraining:**
```bash
# Every Sunday at 2 AM
# Add to crontab:
# 0 2 * * 0 /path/to/venv/bin/python /path/to/project/scripts/retrain_model.py
```

**Performance Monitoring:**
- Track live Sharpe ratio vs backtested
- Monitor maximum drawdown
- Alert on significant performance degradation

## 9. Troubleshooting

### 9.1 Common Issues

**Issue: Model not training**
```python
# Check data shapes
import pickle
with open('data/features.pkl', 'rb') as f:
    X, y = pickle.load(f)

print(f"X shape: {X.shape}")  # Should be (samples, sequence_length, features)
print(f"y shape: {len(y)}")    # Should match samples
```

**Issue: Poor backtest performance**
```python
# Check for overfitting
# Compare train/validation metrics
# Reduce model complexity if needed
# Increase regularization (dropout, L2)
```

**Issue: Live trading connection errors**
```python
# Check API credentials
python -c "
import os
print('API Key:', os.getenv('API_KEY_ALPACA')[:10] + '...' if os.getenv('API_KEY_ALPACA') else 'Not set')
print('API Secret:', os.getenv('API_SECRET_ALPACA')[:10] + '...' if os.getenv('API_SECRET_ALPACA') else 'Not set')
"

# Test broker connection
from src.live.broker_alpaca import AlpacaBroker
broker = AlpacaBroker()
print("Connected:", broker.is_connected())
```

### 9.2 Debugging Tools

**Feature Debugging:**
```python
# Check feature distributions
import seaborn as sns
import matplotlib.pyplot as plt

features_df = pd.DataFrame(X[:, 0, :])  # First timestep
plt.figure(figsize=(15, 10))
sns.boxplot(data=features_df)
plt.xticks(rotation=45)
plt.show()
```

**Model Debugging:**
```python
# Check model predictions
model.eval()
with torch.no_grad():
    sample = torch.randn(1, 60, model.input_size)
    pred = model(sample)
    print(f"Prediction: {pred.item():.4f}")
```

## 10. Performance Optimization

### 10.1 Training Optimization

```python
# Use GPU if available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)

# Optimize batch size
# Start with 32, increase until memory limit
batch_size = 64  # Adjust based on your hardware
```

### 10.2 Inference Optimization

```python
# Model quantization for faster inference
import torch.quantization

# Quantize model
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Save quantized model
torch.save(quantized_model.state_dict(), 'models/quantized_model.pth')
```

### 10.3 Memory Optimization

```python
# Use mixed precision training
from torch.cuda.amp import GradScaler, autocast

scaler = GradScaler()

# In training loop:
with autocast():
    outputs = model(inputs)
    loss = criterion(outputs, targets)

scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```

## 11. Future Enhancements

### 11.1 Advanced Features

**Ensemble Models:**
```python
# Combine multiple models
models = [
    load_model('models/model_v1.pth'),
    load_model('models/model_v2.pth'),
    load_model('models/model_v3.pth')
]

# Ensemble prediction
predictions = [model(input_tensor) for model in models]
ensemble_pred = torch.mean(torch.stack(predictions), dim=0)
```

**Reinforcement Learning:**
```python
# Future enhancement: Train with RL
# Use portfolio returns as rewards
# Implement actor-critic or PPO algorithms
```

### 11.2 Multi-Asset Trading

```python
# Extend to multiple assets
symbols = ['EURUSD', 'GBPUSD', 'USDJPY', 'BTCUSD', 'ETHUSD']

# Train separate models per asset or shared model
# Implement portfolio optimization
```

### 11.3 Advanced Risk Management

**Dynamic Position Sizing:**
```python
# Adjust position size based on market volatility
volatility = calculate_market_volatility()
position_size = base_size * (1 / (1 + volatility * 10))
```

**Correlation-Based Risk:**
```python
# Avoid correlated positions
correlation_matrix = calculate_correlations(positions)
if correlation_matrix.max() > 0.7:  # High correlation
    reduce_position_sizes()
```

## 12. Deployment Checklist

### Before Going Live:

- [ ] Backtesting shows consistent profitability (>6 months)
- [ ] Sharpe ratio > 1.0 across multiple market conditions
- [ ] Maximum drawdown < 20% in backtests
- [ ] Paper trading successful for >1 month
- [ ] Risk management parameters tested and validated
- [ ] Monitoring and alerting systems in place
- [ ] Emergency stop procedures documented
- [ ] API rate limits and costs understood

### Production Deployment:

- [ ] Server setup with 24/7 uptime
- [ ] Database backups configured
- [ ] Log aggregation and monitoring
- [ ] Automated model retraining pipeline
- [ ] Performance tracking dashboard
- [ ] Incident response procedures

## 13. Resources and Support

### Documentation:
- [README.md](README.md) - Project overview and setup
- [AGENTS.md](AGENTS.md) - Development guidelines
- [API Documentation](docs/) - Detailed API reference

### Notebooks:
- [01_data_eda.ipynb](notebooks/01_data_eda.ipynb) - Data exploration
- [02_train_optuna.ipynb](notebooks/02_train_optuna.ipynb) - Model training
- [03_backtest.ipynb](notebooks/03_backtest.ipynb) - Strategy evaluation
- [04_live_trading.ipynb](notebooks/04_live_trading.ipynb) - Live trading

### Monitoring:
- Check `logs/trading_bot.log` for system status
- Monitor portfolio value and risk metrics
- Set up alerts for critical events

---

## 🎯 Success Metrics

**Target Performance:**
- **Sharpe Ratio**: > 1.5
- **Max Drawdown**: < 15%
- **Win Rate**: > 55%
- **Profit Factor**: > 1.8

**Risk Management:**
- **Daily Loss Limit**: < 3%
- **Position Concentration**: < 5% per trade
- **Circuit Breaker Response**: < 30 seconds

**Operational:**
- **Uptime**: > 99.5%
- **Model Accuracy**: > 90% of backtested performance
- **Response Time**: < 1 second for signals

---

**Happy Trading! 🚀📈**

*This guide will evolve as you gain experience with the system. Start conservatively, monitor closely, and gradually increase complexity and position sizes as you validate performance.*