# Production Deployment Guide
Complete guide for running the Movie Scheduler in production environments.
## Table of Contents
1. [Prerequisites](#prerequisites)
2. [Deployment Options](#deployment-options)
3. [Installation Methods](#installation-methods)
4. [Configuration](#configuration)
5. [Security](#security)
6. [Monitoring](#monitoring)
7. [Backup & Recovery](#backup--recovery)
8. [Maintenance](#maintenance)
9. [Troubleshooting](#troubleshooting)
10. [Performance Optimization](#performance-optimization)
11. [Production Checklist](#production-checklist)
12. [Support & Updates](#support--updates)
13. [Additional Resources](#additional-resources)
---
## Prerequisites
### Hardware Requirements
**Minimum:**
- CPU: 4 cores (for whisper.cpp and encoding)
- RAM: 4GB
- Storage: 100GB+ (depends on video library size)
- GPU: Intel/AMD with VAAPI support (optional but recommended)
**Recommended:**
- CPU: 8+ cores
- RAM: 8GB+
- Storage: 500GB+ SSD
- GPU: Modern Intel/AMD GPU with VAAPI
### Software Requirements
- **OS**: Linux (Ubuntu 20.04+, Debian 11+, RHEL 8+, or compatible)
- **Python**: 3.7+
- **FFmpeg**: With VAAPI support
- **whisper.cpp**: Compiled and in PATH
- **Network**: Stable connection to NocoDB and RTMP server
---
## Deployment Options
### Option 1: Systemd Service (Recommended for bare metal)
✅ Direct hardware access (best VAAPI performance)
✅ Low overhead
✅ System integration
❌ Manual dependency management
### Option 2: Docker Container (Recommended for most users)
✅ Isolated environment
✅ Easy updates
✅ Portable configuration
⚠️ Slight performance overhead
⚠️ Requires GPU passthrough for VAAPI
### Option 3: Kubernetes/Orchestration
✅ High availability
✅ Auto-scaling
✅ Cloud-native
❌ Complex setup
❌ Overkill for single-instance deployment
---
## Installation Methods
### Method 1: Systemd Service Installation
#### 1. Create Scheduler User
```bash
# Create dedicated user for security
sudo useradd -r -s /bin/bash -d /opt/scheduler -m scheduler
# Add to video group for GPU access
sudo usermod -aG video,render scheduler
```
#### 2. Install Dependencies
```bash
# Install system packages
sudo apt-get update
sudo apt-get install -y python3 python3-pip python3-venv ffmpeg git build-essential
# Install whisper.cpp
sudo -u scheduler git clone https://github.com/ggerganov/whisper.cpp.git /tmp/whisper.cpp
cd /tmp/whisper.cpp
make
sudo cp main /usr/local/bin/whisper.cpp
sudo chmod +x /usr/local/bin/whisper.cpp
# Download whisper model
sudo mkdir -p /opt/models
cd /opt/models
sudo wget https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin
sudo mv ggml-base.en.bin ggml-base.bin # Rename to match the WHISPER_MODEL path configured later
sudo chown -R scheduler:scheduler /opt/models
```
#### 3. Deploy Application
```bash
# Create application directory
sudo mkdir -p /opt/scheduler
sudo chown scheduler:scheduler /opt/scheduler
# Copy application files
sudo -u scheduler cp agent.py /opt/scheduler/
sudo -u scheduler cp requirements.txt /opt/scheduler/
# Create Python virtual environment
sudo -u scheduler python3 -m venv /opt/scheduler/venv
sudo -u scheduler /opt/scheduler/venv/bin/pip install -r /opt/scheduler/requirements.txt
```
#### 4. Configure Storage
```bash
# Create storage directories (adjust paths as needed)
sudo mkdir -p /mnt/storage/raw_movies
sudo mkdir -p /mnt/storage/final_movies
sudo chown -R scheduler:scheduler /mnt/storage
```
#### 5. Configure Service
```bash
# Copy service file
sudo cp scheduler.service /etc/systemd/system/
# Create environment file with secrets
sudo mkdir -p /etc/scheduler
sudo nano /etc/scheduler/scheduler.env
```
Edit `/etc/scheduler/scheduler.env`:
```bash
NOCODB_URL=https://your-nocodb.com/api/v2/tables/YOUR_TABLE_ID/records
NOCODB_TOKEN=your_production_token
RTMP_SERVER=rtmp://your-rtmp-server.com/live/stream
RAW_DIR=/mnt/storage/raw_movies
FINAL_DIR=/mnt/storage/final_movies
WHISPER_MODEL=/opt/models/ggml-base.bin
```
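A small pre-flight check catches a missing variable before the service falls over at runtime. The sketch below writes a throwaway env file for demonstration; in real use, point `ENV_FILE` at `/etc/scheduler/scheduler.env` and drop the demo block.

```shell
#!/bin/bash
# Pre-flight check (sketch): fail fast if a required variable is unset.
# Demo only: writes a throwaway env file instead of reading the real one.
ENV_FILE=$(mktemp)
printf '%s\n' \
    'NOCODB_URL=https://example.invalid/api/v2/tables/T/records' \
    'NOCODB_TOKEN=dummy' \
    'RTMP_SERVER=rtmp://example.invalid/live/stream' \
    'RAW_DIR=/mnt/storage/raw_movies' \
    'FINAL_DIR=/mnt/storage/final_movies' \
    'WHISPER_MODEL=/opt/models/ggml-base.bin' > "$ENV_FILE"

# Export everything the file defines, as systemd's EnvironmentFile= would
set -a; . "$ENV_FILE"; set +a

MISSING=0
for var in NOCODB_URL NOCODB_TOKEN RTMP_SERVER RAW_DIR FINAL_DIR WHISPER_MODEL; do
    if [ -z "$(eval echo "\$$var")" ]; then
        echo "Missing required variable: $var"
        MISSING=1
    fi
done
[ "$MISSING" -eq 0 ] && echo "Environment OK"
rm -f "$ENV_FILE"
```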
Update `scheduler.service` to use the environment file:
```ini
# Replace Environment= lines with:
EnvironmentFile=/etc/scheduler/scheduler.env
```
#### 6. Enable and Start Service
```bash
# Reload systemd
sudo systemctl daemon-reload
# Enable service (start on boot)
sudo systemctl enable scheduler
# Start service
sudo systemctl start scheduler
# Check status
sudo systemctl status scheduler
# View logs
sudo journalctl -u scheduler -f
```
---
### Method 2: Docker Deployment
#### 1. Prepare Environment
```bash
# Create project directory
mkdir -p /opt/scheduler
cd /opt/scheduler
# Copy application files
cp agent.py requirements.txt Dockerfile docker-compose.prod.yml ./
# Create production environment file
cp .env.production.example .env.production
nano .env.production # Fill in your values
```
#### 2. Configure Storage
```bash
# Ensure storage directories exist
mkdir -p /mnt/storage/raw_movies
mkdir -p /mnt/storage/final_movies
# Download whisper model
mkdir -p /opt/models
cd /opt/models
wget https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin
mv ggml-base.en.bin ggml-base.bin
```
#### 3. Deploy Container
```bash
cd /opt/scheduler
# Build image
docker compose -f docker-compose.prod.yml build
# Start service
docker compose -f docker-compose.prod.yml up -d
# Check logs
docker compose -f docker-compose.prod.yml logs -f
# Check status
docker compose -f docker-compose.prod.yml ps
```
#### 4. Enable Auto-Start
```bash
# Create systemd service for docker compose
sudo nano /etc/systemd/system/scheduler-docker.service
```
```ini
[Unit]
Description=Movie Scheduler (Docker)
Requires=docker.service
After=docker.service
[Service]
Type=oneshot
RemainAfterExit=yes
WorkingDirectory=/opt/scheduler
ExecStart=/usr/bin/docker compose -f docker-compose.prod.yml up -d
ExecStop=/usr/bin/docker compose -f docker-compose.prod.yml down
TimeoutStartSec=0
[Install]
WantedBy=multi-user.target
```
```bash
sudo systemctl daemon-reload
sudo systemctl enable scheduler-docker
```
---
## Configuration
### Essential Configuration
#### NocoDB Connection
```bash
# Get your table ID from NocoDB URL
# https://nocodb.com/nc/YOUR_BASE_ID/table_NAME
# API endpoint: https://nocodb.com/api/v2/tables/TABLE_ID/records
# Generate API token in NocoDB:
# Account Settings → Tokens → Create Token
```
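Before wiring the token into the service, it is worth verifying it from the shell. The sketch below assembles the endpoint from placeholder values; NocoDB's v2 API authenticates with an `xc-token` header, and the `curl` call is left commented out so the snippet runs offline.

```shell
# Build the records endpoint from its parts (placeholder values)
NOCODB_BASE="https://your-nocodb.com"
TABLE_ID="YOUR_TABLE_ID"
NOCODB_URL="${NOCODB_BASE}/api/v2/tables/${TABLE_ID}/records"
echo "$NOCODB_URL"

# Verify the token works (prints an HTTP status code; 200 means
# the token and table ID are good):
# curl -s -o /dev/null -w '%{http_code}' \
#     -H "xc-token: $NOCODB_TOKEN" "${NOCODB_URL}?limit=1"
```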
#### RTMP Server
```bash
# For nginx-rtmp:
RTMP_SERVER=rtmp://your-server.com:1935/live/stream
# For other RTMP servers, use their endpoint format
```
#### Storage Paths
```bash
# Use separate fast storage for final videos (streaming)
RAW_DIR=/mnt/storage/raw_movies # Can be slower storage
FINAL_DIR=/mnt/fast-storage/final # Should be fast SSD
# Ensure proper permissions
chown -R scheduler:scheduler /mnt/storage
chmod 755 /mnt/storage/raw_movies
chmod 755 /mnt/fast-storage/final
```
### Performance Tuning
#### Sync Intervals
```bash
# High-load scenario (many jobs, frequent updates)
NOCODB_SYNC_INTERVAL_SECONDS=120 # Check less often
WATCHDOG_CHECK_INTERVAL_SECONDS=15 # Check streams less often
# Low-latency scenario (need fast response)
NOCODB_SYNC_INTERVAL_SECONDS=30
WATCHDOG_CHECK_INTERVAL_SECONDS=5
# Default (balanced)
NOCODB_SYNC_INTERVAL_SECONDS=60
WATCHDOG_CHECK_INTERVAL_SECONDS=10
```
#### FFmpeg VAAPI
```bash
# Find your VAAPI device
ls -la /dev/dri/
# Common devices:
# renderD128 - Primary GPU
# renderD129 - Secondary GPU
# Test VAAPI
ffmpeg -hwaccel vaapi -vaapi_device /dev/dri/renderD128 -i test.mp4 -f null -
# Set in config
VAAPI_DEVICE=/dev/dri/renderD128
```
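On single-GPU hosts, the device path can be detected rather than hardcoded. This is a sketch: multi-GPU machines may need an explicit choice, and an empty result should trigger the software-encoding fallback described in Troubleshooting.

```shell
# Pick the first available render node; leave empty if none exists
VAAPI_DEVICE=""
for dev in /dev/dri/renderD*; do
    if [ -e "$dev" ]; then
        VAAPI_DEVICE="$dev"
        break
    fi
done

if [ -n "$VAAPI_DEVICE" ]; then
    echo "Using VAAPI device: $VAAPI_DEVICE"
else
    echo "No render node found; encoding will fall back to CPU"
fi
```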
---
## Security
### Secrets Management
**DO NOT hardcode secrets in files tracked by git!**
#### Option 1: Environment Files (Simple)
```bash
# Store secrets in protected file
sudo nano /etc/scheduler/scheduler.env
sudo chmod 600 /etc/scheduler/scheduler.env
sudo chown scheduler:scheduler /etc/scheduler/scheduler.env
```
#### Option 2: Secrets Management Tools
```bash
# Using Vault
export NOCODB_TOKEN=$(vault kv get -field=token secret/scheduler/nocodb)
# Using AWS Secrets Manager
export NOCODB_TOKEN=$(aws secretsmanager get-secret-value --secret-id scheduler/nocodb --query SecretString --output text)
# Using Docker Secrets (Swarm/Kubernetes)
# Mount secrets as files and read in application
```
### Filesystem Permissions
```bash
# Application directory
chown -R scheduler:scheduler /opt/scheduler
chmod 750 /opt/scheduler
# Storage directories
chown -R scheduler:scheduler /mnt/storage
chmod 755 /mnt/storage/raw_movies # Scheduler reads source files from here
chmod 755 /mnt/storage/final_movies # Scheduler writes processed files here
# Database file
chmod 600 /opt/scheduler/scheduler.db
chown scheduler:scheduler /opt/scheduler/scheduler.db
```
### Network Security
```bash
# Firewall rules (if scheduler runs on separate server)
# Only allow outbound connections to NocoDB and RTMP
sudo ufw allow out to YOUR_NOCODB_IP port 443 # HTTPS
sudo ufw allow out to YOUR_RTMP_IP port 1935 # RTMP
sudo ufw default deny outgoing # Deny all other outbound (optional)
```
### Regular Updates
```bash
# Update system packages weekly
sudo apt-get update && sudo apt-get upgrade
# Update Python dependencies
sudo -u scheduler /opt/scheduler/venv/bin/pip install --upgrade -r /opt/scheduler/requirements.txt
# Rebuild whisper.cpp quarterly (for performance improvements)
```
---
## Monitoring
### Service Health
#### Systemd Monitoring
```bash
# Check service status
systemctl status scheduler
# View recent logs
journalctl -u scheduler -n 100
# Follow logs in real-time
journalctl -u scheduler -f
# Check for errors in last hour
journalctl -u scheduler --since "1 hour ago" | grep ERROR
# Service restart count
systemctl show scheduler | grep NRestarts
```
#### Docker Monitoring
```bash
# Container status
docker compose -f docker-compose.prod.yml ps
# Resource usage
docker stats movie_scheduler
# Logs
docker compose -f docker-compose.prod.yml logs --tail=100 -f
# Health check status
docker inspect movie_scheduler | jq '.[0].State.Health'
```
### Database Monitoring
```bash
# Check job status
sqlite3 /opt/scheduler/scheduler.db "SELECT prep_status, play_status, COUNT(*) FROM jobs GROUP BY prep_status, play_status;"
# Active streams
sqlite3 /opt/scheduler/scheduler.db "SELECT nocodb_id, title, play_status, stream_retry_count FROM jobs WHERE play_status='streaming';"
# Failed jobs
sqlite3 /opt/scheduler/scheduler.db "SELECT nocodb_id, title, prep_status, play_status, log FROM jobs WHERE prep_status='failed' OR play_status='failed';"
# Database size
ls -lh /opt/scheduler/scheduler.db
```
### Automated Monitoring Script
Create `/opt/scheduler/monitor.sh`:
```bash
#!/bin/bash
LOG_FILE="/var/log/scheduler-monitor.log"
DB_PATH="/opt/scheduler/scheduler.db"
echo "=== Scheduler Monitor - $(date) ===" >> "$LOG_FILE"
# Check if service is running
if systemctl is-active --quiet scheduler; then
    echo "✓ Service is running" >> "$LOG_FILE"
else
    echo "✗ Service is DOWN" >> "$LOG_FILE"
    # Send alert (email, Slack, etc.)
    systemctl start scheduler
fi
# Check for failed jobs
FAILED=$(sqlite3 "$DB_PATH" "SELECT COUNT(*) FROM jobs WHERE prep_status='failed' OR play_status='failed';")
if [ "$FAILED" -gt 0 ]; then
    echo "⚠ Found $FAILED failed jobs" >> "$LOG_FILE"
    # Send alert
fi
# Check active streams
STREAMING=$(sqlite3 "$DB_PATH" "SELECT COUNT(*) FROM jobs WHERE play_status='streaming';")
echo "Active streams: $STREAMING" >> "$LOG_FILE"
# Check disk space
DISK_USAGE=$(df -h /mnt/storage | tail -1 | awk '{print $5}' | sed 's/%//')
if [ "$DISK_USAGE" -gt 90 ]; then
    echo "⚠ Disk usage is ${DISK_USAGE}%" >> "$LOG_FILE"
    # Send alert
fi
echo "" >> "$LOG_FILE"
```
```bash
# Make executable
chmod +x /opt/scheduler/monitor.sh
# Add to crontab (check every 5 minutes)
(crontab -l 2>/dev/null; echo "*/5 * * * * /opt/scheduler/monitor.sh") | crontab -
```
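The "Send alert" placeholders in the monitor script can be filled in with a small helper. The webhook URL below is a hypothetical placeholder, and the `curl` call is commented out so the sketch runs offline; it currently just logs the alert to stdout.

```shell
# Hypothetical alert helper for the monitor script above
SLACK_WEBHOOK_URL="https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder

send_alert() {
    local msg="$1"
    # Uncomment to actually post to Slack:
    # curl -s -X POST -H 'Content-type: application/json' \
    #     --data "{\"text\": \"[scheduler] ${msg}\"}" "$SLACK_WEBHOOK_URL"
    echo "ALERT: $msg"
}

send_alert "Service is DOWN"
```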
### External Monitoring
#### Prometheus + Grafana
Export metrics using node_exporter or custom exporter:
```bash
# Install node_exporter for system metrics
# Create custom exporter for job metrics from database
# Set up Grafana dashboard
```
#### Uptime Monitoring
Use services like:
- UptimeRobot
- Pingdom
- Datadog
Monitor:
- Service availability
- RTMP server connectivity
- NocoDB API accessibility
---
## Backup & Recovery
### What to Backup
1. **Database** (scheduler.db) - Critical
2. **Configuration** (.env.production or /etc/scheduler/scheduler.env) - Critical
3. **Final videos** (if you want to keep processed videos)
4. **Logs** (optional, for forensics)
### Backup Script
Create `/opt/scheduler/backup.sh`:
```bash
#!/bin/bash
BACKUP_DIR="/backup/scheduler"
DATE=$(date +%Y%m%d_%H%M%S)
DB_PATH="/opt/scheduler/scheduler.db"
CONFIG_PATH="/etc/scheduler/scheduler.env"
# Create backup directory
mkdir -p "$BACKUP_DIR"
# Backup database (sqlite3's .backup takes a consistent snapshot of a live
# database; a plain cp can capture a mid-write, corrupted copy)
sqlite3 "$DB_PATH" ".backup '$BACKUP_DIR/scheduler_${DATE}.db'"
# Backup config (careful with secrets!)
cp "$CONFIG_PATH" "$BACKUP_DIR/config_${DATE}.env"
# Compress old backups
find "$BACKUP_DIR" -name "*.db" -mtime +7 -exec gzip {} \;
# Delete backups older than 30 days
find "$BACKUP_DIR" -name "*.db.gz" -mtime +30 -delete
find "$BACKUP_DIR" -name "*.env" -mtime +30 -delete
# Optional: Upload to S3/cloud storage
# aws s3 sync "$BACKUP_DIR" s3://your-bucket/scheduler-backups/
echo "Backup completed: $BACKUP_DIR/scheduler_${DATE}.db"
```
```bash
# Make executable
chmod +x /opt/scheduler/backup.sh
# Run daily at 2 AM
(crontab -l 2>/dev/null; echo "0 2 * * * /opt/scheduler/backup.sh") | crontab -
```
### Recovery Procedure
#### 1. Restore from Backup
```bash
# Stop service
sudo systemctl stop scheduler
# Restore database
cp /backup/scheduler/scheduler_YYYYMMDD_HHMMSS.db /opt/scheduler/scheduler.db
chown scheduler:scheduler /opt/scheduler/scheduler.db
# Restore config if needed
cp /backup/scheduler/config_YYYYMMDD_HHMMSS.env /etc/scheduler/scheduler.env
# Start service
sudo systemctl start scheduler
```
#### 2. Disaster Recovery (Full Rebuild)
If server is completely lost:
1. Provision new server
2. Follow installation steps above
3. Restore database and config from backup
4. Restart service
5. Verify jobs are picked up
**Recovery Time Objective (RTO):** 30-60 minutes
**Recovery Point Objective (RPO):** Up to 24 hours (with daily backups)
---
## Maintenance
### Routine Tasks
#### Daily
- ✓ Check service status
- ✓ Review error logs
- ✓ Check failed jobs
#### Weekly
- ✓ Review disk space
- ✓ Check database size
- ✓ Clean up old processed videos (if not needed)
#### Monthly
- ✓ Update system packages
- ✓ Review and optimize database
- ✓ Test backup restoration
- ✓ Review and rotate logs
### Database Maintenance
```bash
# Vacuum database (reclaim space, optimize)
sqlite3 /opt/scheduler/scheduler.db "VACUUM;"
# Analyze database (update statistics)
sqlite3 /opt/scheduler/scheduler.db "ANALYZE;"
# Clean up old completed jobs (optional)
sqlite3 /opt/scheduler/scheduler.db "DELETE FROM jobs WHERE play_status='done' AND datetime(run_at) < datetime('now', '-30 days');"
```
### Log Rotation
For systemd (automatic via journald):
```bash
# Configure in /etc/systemd/journald.conf
SystemMaxUse=1G
RuntimeMaxUse=100M
```
For Docker:
```yaml
# Already configured in docker-compose.prod.yml
logging:
  driver: "json-file"
  options:
    max-size: "10m"
    max-file: "5"
```
### Video Cleanup
```bash
# Clean up old final videos (adjust retention as needed)
find /mnt/storage/final_movies -name "*.mp4" -mtime +7 -delete
# Or move to archive
find /mnt/storage/final_movies -name "*.mp4" -mtime +7 -exec mv {} /mnt/archive/ \;
```
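If deleted videos may still be needed, the archive variant is safer. The sketch below wraps it in a function and demonstrates it against a throwaway directory; the 7-day retention and paths are assumptions to adjust, and `touch -d` is GNU coreutils syntax.

```shell
# Sketch: archive videos older than N days instead of deleting them
archive_old_videos() {
    local src="$1" dest="$2" days="$3"
    mkdir -p "$dest"
    find "$src" -name "*.mp4" -mtime +"$days" -exec mv {} "$dest"/ \;
}

# Demonstration against a throwaway directory
tmp=$(mktemp -d)
mkdir -p "$tmp/final"
touch "$tmp/final/new.mp4"
touch -d "10 days ago" "$tmp/final/old.mp4"   # GNU touch

archive_old_videos "$tmp/final" "$tmp/archive" 7
ls "$tmp/archive"   # only old.mp4 should have moved
```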
---
## Troubleshooting
### Service Won't Start
```bash
# Check service status
systemctl status scheduler
# Check logs for errors
journalctl -u scheduler -n 50
# Common issues:
# 1. Missing environment variables
grep -i "error" /var/log/syslog | grep scheduler
# 2. Permission issues
ls -la /opt/scheduler
ls -la /mnt/storage
# 3. GPU access issues
ls -la /dev/dri/
groups scheduler # Should include 'video' and 'render'
```
### Streams Keep Failing
```bash
# Test RTMP server manually
ffmpeg -re -i test.mp4 -c copy -f flv rtmp://your-server/live/stream
# Check network connectivity
ping your-rtmp-server.com
telnet your-rtmp-server.com 1935
# Review stream logs
journalctl -u scheduler | grep -A 10 "Stream crashed"
# Check retry count
sqlite3 /opt/scheduler/scheduler.db "SELECT nocodb_id, title, stream_retry_count FROM jobs WHERE play_status='streaming';"
```
### High CPU/Memory Usage
```bash
# Check resource usage
top -u scheduler
# Or for Docker
docker stats movie_scheduler
# Common causes:
# 1. Large video file encoding - normal, wait for completion
# 2. whisper.cpp using all cores - normal
# 3. Multiple prep jobs running - adjust or wait
# Limit resources if needed (systemd)
systemctl edit scheduler
# In the drop-in editor that opens, add the following and save:
[Service]
CPUQuota=200%
MemoryMax=4G
### Database Locked Errors
```bash
# Check for stale locks
lsof /opt/scheduler/scheduler.db
# Kill stale processes if needed
# Restart service
systemctl restart scheduler
```
### VAAPI Not Working
```bash
# Verify VAAPI support
vainfo
# Test FFmpeg VAAPI
ffmpeg -hwaccels
# Check permissions
ls -la /dev/dri/renderD128
groups scheduler # Should include 'video' or 'render'
# Fallback to software encoding
# Comment out VAAPI_DEVICE in config
# Encoding will use CPU (slower but works)
```
---
## Performance Optimization
### Hardware Acceleration
```bash
# Verify GPU usage during encoding
intel_gpu_top # For Intel GPUs
radeontop # For AMD GPUs
# If GPU not being used, check:
# 1. VAAPI device path correct
# 2. User has GPU permissions
# 3. FFmpeg compiled with VAAPI support
```
### Storage Performance
```bash
# Use SSD for final videos (they're streamed frequently)
# Use HDD for raw videos (accessed once for processing)
# Test disk performance
dd if=/dev/zero of=/mnt/storage/test bs=1M count=1024 oflag=direct
rm /mnt/storage/test
```
### Network Optimization
```bash
# For better streaming reliability
# 1. Use dedicated network for RTMP
# 2. Enable QoS for streaming traffic
# 3. Consider local RTMP relay
# Test network throughput
iperf3 -c your-rtmp-server.com
```
---
## Production Checklist
Before going live:
- [ ] Secrets stored securely (not in git)
- [ ] Service auto-starts on boot
- [ ] Backups configured and tested
- [ ] Monitoring configured
- [ ] Logs being rotated
- [ ] Disk space alerts configured
- [ ] Test recovery procedure
- [ ] Document runbook for on-call
- [ ] GPU permissions verified
- [ ] RTMP connectivity tested
- [ ] NocoDB API tested
- [ ] Process one test video end-to-end
- [ ] Verify streaming watchdog works
- [ ] Test service restart during streaming
- [ ] Configure alerting for failures
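Several checklist items can be verified in one pass with a small smoke-test script. This is a sketch that assumes the paths and service name used throughout this guide; each check prints PASS or FAIL and keeps going, so one run yields a full report.

```shell
# Pre-flight smoke test (sketch)
check() {
    if "$@" >/dev/null 2>&1; then
        echo "PASS: $*"
    else
        echo "FAIL: $*"
    fi
}

check test -r /etc/scheduler/scheduler.env
check test -d /mnt/storage/raw_movies
check test -d /mnt/storage/final_movies
check command -v ffmpeg
check command -v sqlite3
check systemctl is-enabled scheduler
```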
---
## Support & Updates
### Getting Updates
```bash
# Git-based deployment
cd /opt/scheduler
git pull origin main
systemctl restart scheduler
# Docker-based deployment
cd /opt/scheduler
docker compose -f docker-compose.prod.yml pull
docker compose -f docker-compose.prod.yml up -d
```
### Reporting Issues
Include in bug reports:
- Service logs: `journalctl -u scheduler -n 100`
- Database state: `sqlite3 scheduler.db ".dump jobs"`
- System info: `uname -a`, `python3 --version`, `ffmpeg -version`
- Configuration (redact secrets!)
---
## Additional Resources
- FFmpeg VAAPI Guide: https://trac.ffmpeg.org/wiki/Hardware/VAAPI
- whisper.cpp: https://github.com/ggerganov/whisper.cpp
- NocoDB API: https://docs.nocodb.com
- Systemd Documentation: https://www.freedesktop.org/software/systemd/man/
---
## License
This production guide is provided as-is. Test thoroughly in staging before production deployment.