# Production Deployment Guide
Complete guide for running the Movie Scheduler in production environments.
## Table of Contents
1. [Prerequisites](#prerequisites)
2. [Deployment Options](#deployment-options)
3. [Installation Methods](#installation-methods)
4. [Configuration](#configuration)
5. [Security](#security)
6. [Monitoring](#monitoring)
7. [Backup & Recovery](#backup--recovery)
8. [Maintenance](#maintenance)
9. [Troubleshooting](#troubleshooting)
10. [Performance Optimization](#performance-optimization)
11. [Production Checklist](#production-checklist)
12. [Support & Updates](#support--updates)
13. [Additional Resources](#additional-resources)
---
## Prerequisites
### Hardware Requirements
**Minimum:**
- CPU: 4 cores (for whisper.cpp and encoding)
- RAM: 4GB
- Storage: 100GB+ (depends on video library size)
- GPU: Intel/AMD with VAAPI support (optional but recommended)
**Recommended:**
- CPU: 8+ cores
- RAM: 8GB+
- Storage: 500GB+ SSD
- GPU: Modern Intel/AMD GPU with VAAPI
### Software Requirements
- **OS**: Linux (Ubuntu 20.04+, Debian 11+, RHEL 8+, or compatible)
- **Python**: 3.7+
- **FFmpeg**: With VAAPI support
- **whisper.cpp**: Compiled and in PATH
- **Network**: Stable connection to NocoDB and RTMP server
---
## Deployment Options
### Option 1: Systemd Service (Recommended for bare metal)
✅ Direct hardware access (best VAAPI performance)
✅ Low overhead
✅ System integration
❌ Manual dependency management
### Option 2: Docker Container (Recommended for most users)
✅ Isolated environment
✅ Easy updates
✅ Portable configuration
⚠️ Slight performance overhead
⚠️ Requires GPU passthrough for VAAPI
### Option 3: Kubernetes/Orchestration
✅ High availability
✅ Auto-scaling
✅ Cloud-native
❌ Complex setup
❌ Overkill for single-instance deployment
---
## Installation Methods
### Method 1: Systemd Service Installation
#### 1. Create Scheduler User
```bash
# Create dedicated user for security
sudo useradd -r -s /bin/bash -d /opt/scheduler -m scheduler
# Add to video group for GPU access
sudo usermod -aG video,render scheduler
```
#### 2. Install Dependencies
```bash
# Install system packages
sudo apt-get update
sudo apt-get install -y python3 python3-pip python3-venv ffmpeg git build-essential
# Install whisper.cpp
sudo -u scheduler git clone https://github.com/ggerganov/whisper.cpp.git /tmp/whisper.cpp
cd /tmp/whisper.cpp
make
sudo cp main /usr/local/bin/whisper.cpp
sudo chmod +x /usr/local/bin/whisper.cpp
# Download whisper model
sudo mkdir -p /opt/models
cd /opt/models
sudo wget https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin
sudo mv ggml-base.en.bin ggml-base.bin # Rename to match the WHISPER_MODEL path configured later
sudo chown -R scheduler:scheduler /opt/models
```
#### 3. Deploy Application
```bash
# Create application directory
sudo mkdir -p /opt/scheduler
sudo chown scheduler:scheduler /opt/scheduler
# Copy application files
sudo -u scheduler cp agent.py /opt/scheduler/
sudo -u scheduler cp requirements.txt /opt/scheduler/
# Create Python virtual environment
sudo -u scheduler python3 -m venv /opt/scheduler/venv
sudo -u scheduler /opt/scheduler/venv/bin/pip install -r /opt/scheduler/requirements.txt
```
#### 4. Configure Storage
```bash
# Create storage directories (adjust paths as needed)
sudo mkdir -p /mnt/storage/raw_movies
sudo mkdir -p /mnt/storage/final_movies
sudo chown -R scheduler:scheduler /mnt/storage
```
#### 5. Configure Service
```bash
# Copy service file
sudo cp scheduler.service /etc/systemd/system/
# Create environment file with secrets
sudo mkdir -p /etc/scheduler
sudo nano /etc/scheduler/scheduler.env
```
Edit `/etc/scheduler/scheduler.env`:
```bash
NOCODB_URL=https://your-nocodb.com/api/v2/tables/YOUR_TABLE_ID/records
NOCODB_TOKEN=your_production_token
RTMP_SERVER=rtmp://your-rtmp-server.com/live/stream
RAW_DIR=/mnt/storage/raw_movies
FINAL_DIR=/mnt/storage/final_movies
WHISPER_MODEL=/opt/models/ggml-base.bin
```
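A small pre-flight check catches a missing variable before the service falls over at runtime. The sketch below writes a throwaway env file for demonstration; in real use, point `ENV_FILE` at `/etc/scheduler/scheduler.env` and drop the demo block.

```shell
#!/bin/bash
# Pre-flight check (sketch): fail fast if a required variable is unset.
# Demo only: writes a throwaway env file instead of reading the real one.
ENV_FILE=$(mktemp)
printf '%s\n' \
    'NOCODB_URL=https://example.invalid/api/v2/tables/T/records' \
    'NOCODB_TOKEN=dummy' \
    'RTMP_SERVER=rtmp://example.invalid/live/stream' \
    'RAW_DIR=/mnt/storage/raw_movies' \
    'FINAL_DIR=/mnt/storage/final_movies' \
    'WHISPER_MODEL=/opt/models/ggml-base.bin' > "$ENV_FILE"

# Export everything the file defines, as systemd's EnvironmentFile= would
set -a; . "$ENV_FILE"; set +a

MISSING=0
for var in NOCODB_URL NOCODB_TOKEN RTMP_SERVER RAW_DIR FINAL_DIR WHISPER_MODEL; do
    if [ -z "$(eval echo "\$$var")" ]; then
        echo "Missing required variable: $var"
        MISSING=1
    fi
done
[ "$MISSING" -eq 0 ] && echo "Environment OK"
rm -f "$ENV_FILE"
```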
Update `scheduler.service` to use the environment file:
```ini
# Replace Environment= lines with:
EnvironmentFile=/etc/scheduler/scheduler.env
```
#### 6. Enable and Start Service
```bash
# Reload systemd
sudo systemctl daemon-reload
# Enable service (start on boot)
sudo systemctl enable scheduler
# Start service
sudo systemctl start scheduler
# Check status
sudo systemctl status scheduler
# View logs
sudo journalctl -u scheduler -f
```
---
### Method 2: Docker Deployment
#### 1. Prepare Environment
```bash
# Create project directory
mkdir -p /opt/scheduler
cd /opt/scheduler
# Copy application files
cp agent.py requirements.txt Dockerfile docker-compose.prod.yml ./
# Create production environment file
cp .env.production.example .env.production
nano .env.production # Fill in your values
```
#### 2. Configure Storage
```bash
# Ensure storage directories exist
mkdir -p /mnt/storage/raw_movies
mkdir -p /mnt/storage/final_movies
# Download whisper model
mkdir -p /opt/models
cd /opt/models
wget https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin
mv ggml-base.en.bin ggml-base.bin
```
#### 3. Deploy Container
```bash
cd /opt/scheduler
# Build image
docker compose -f docker-compose.prod.yml build
# Start service
docker compose -f docker-compose.prod.yml up -d
# Check logs
docker compose -f docker-compose.prod.yml logs -f
# Check status
docker compose -f docker-compose.prod.yml ps
```
#### 4. Enable Auto-Start
```bash
# Create systemd service for docker compose
sudo nano /etc/systemd/system/scheduler-docker.service
```
```ini
[Unit]
Description=Movie Scheduler (Docker)
Requires=docker.service
After=docker.service
[Service]
Type=oneshot
RemainAfterExit=yes
WorkingDirectory=/opt/scheduler
ExecStart=/usr/bin/docker compose -f docker-compose.prod.yml up -d
ExecStop=/usr/bin/docker compose -f docker-compose.prod.yml down
TimeoutStartSec=0
[Install]
WantedBy=multi-user.target
```
```bash
sudo systemctl daemon-reload
sudo systemctl enable scheduler-docker
```
---
## Configuration
### Essential Configuration
#### NocoDB Connection
```bash
# Get your table ID from NocoDB URL
# https://nocodb.com/nc/YOUR_BASE_ID/table_NAME
# API endpoint: https://nocodb.com/api/v2/tables/TABLE_ID/records
# Generate API token in NocoDB:
# Account Settings → Tokens → Create Token
```
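Before wiring the token into the service, it is worth verifying it from the shell. The sketch below assembles the endpoint from placeholder values; NocoDB's v2 API authenticates with an `xc-token` header, and the `curl` call is left commented out so the snippet runs offline.

```shell
# Build the records endpoint from its parts (placeholder values)
NOCODB_BASE="https://your-nocodb.com"
TABLE_ID="YOUR_TABLE_ID"
NOCODB_URL="${NOCODB_BASE}/api/v2/tables/${TABLE_ID}/records"
echo "$NOCODB_URL"

# Verify the token works (prints an HTTP status code; 200 means
# the token and table ID are good):
# curl -s -o /dev/null -w '%{http_code}' \
#     -H "xc-token: $NOCODB_TOKEN" "${NOCODB_URL}?limit=1"
```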
#### RTMP Server
```bash
# For nginx-rtmp:
RTMP_SERVER=rtmp://your-server.com:1935/live/stream
# For other RTMP servers, use their endpoint format
```
#### Storage Paths
```bash
# Use separate fast storage for final videos (streaming)
RAW_DIR=/mnt/storage/raw_movies # Can be slower storage
FINAL_DIR=/mnt/fast-storage/final # Should be fast SSD
# Ensure proper permissions
chown -R scheduler:scheduler /mnt/storage
chmod 755 /mnt/storage/raw_movies
chmod 755 /mnt/fast-storage/final
```
### Performance Tuning
#### Sync Intervals
```bash
# High-load scenario (many jobs, frequent updates)
NOCODB_SYNC_INTERVAL_SECONDS=120 # Check less often
WATCHDOG_CHECK_INTERVAL_SECONDS=15 # Check streams less often
# Low-latency scenario (need fast response)
NOCODB_SYNC_INTERVAL_SECONDS=30
WATCHDOG_CHECK_INTERVAL_SECONDS=5
# Default (balanced)
NOCODB_SYNC_INTERVAL_SECONDS=60
WATCHDOG_CHECK_INTERVAL_SECONDS=10
```
#### FFmpeg VAAPI
```bash
# Find your VAAPI device
ls -la /dev/dri/
# Common devices:
# renderD128 - Primary GPU
# renderD129 - Secondary GPU
# Test VAAPI
ffmpeg -hwaccel vaapi -vaapi_device /dev/dri/renderD128 -i test.mp4 -f null -
# Set in config
VAAPI_DEVICE=/dev/dri/renderD128
```
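On single-GPU hosts, the device path can be detected rather than hardcoded. This is a sketch: multi-GPU machines may need an explicit choice, and an empty result should trigger the software-encoding fallback described in Troubleshooting.

```shell
# Pick the first available render node; leave empty if none exists
VAAPI_DEVICE=""
for dev in /dev/dri/renderD*; do
    if [ -e "$dev" ]; then
        VAAPI_DEVICE="$dev"
        break
    fi
done

if [ -n "$VAAPI_DEVICE" ]; then
    echo "Using VAAPI device: $VAAPI_DEVICE"
else
    echo "No render node found; encoding will fall back to CPU"
fi
```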
---
## Security
### Secrets Management
**DO NOT hardcode secrets in files tracked by git!**
#### Option 1: Environment Files (Simple)
```bash
# Store secrets in protected file
sudo nano /etc/scheduler/scheduler.env
sudo chmod 600 /etc/scheduler/scheduler.env
sudo chown scheduler:scheduler /etc/scheduler/scheduler.env
```
#### Option 2: Secrets Management Tools
```bash
# Using Vault
export NOCODB_TOKEN=$(vault kv get -field=token secret/scheduler/nocodb)
# Using AWS Secrets Manager
export NOCODB_TOKEN=$(aws secretsmanager get-secret-value --secret-id scheduler/nocodb --query SecretString --output text)
# Using Docker Secrets (Swarm/Kubernetes)
# Mount secrets as files and read in application
```
### Filesystem Permissions
```bash
# Application directory
chown -R scheduler:scheduler /opt/scheduler
chmod 750 /opt/scheduler
# Storage directories
chown -R scheduler:scheduler /mnt/storage
chmod 755 /mnt/storage/raw_movies # Scheduler reads source files from here
chmod 755 /mnt/storage/final_movies # Scheduler writes processed files here
# Database file
chmod 600 /opt/scheduler/scheduler.db
chown scheduler:scheduler /opt/scheduler/scheduler.db
```
### Network Security
```bash
# Firewall rules (if scheduler runs on separate server)
# Only allow outbound connections to NocoDB and RTMP
sudo ufw allow out to YOUR_NOCODB_IP port 443 # HTTPS
sudo ufw allow out to YOUR_RTMP_IP port 1935 # RTMP
sudo ufw default deny outgoing # Deny all other outbound (optional)
```
### Regular Updates
```bash
# Update system packages weekly
sudo apt-get update && sudo apt-get upgrade
# Update Python dependencies
sudo -u scheduler /opt/scheduler/venv/bin/pip install --upgrade -r /opt/scheduler/requirements.txt
# Rebuild whisper.cpp quarterly (for performance improvements)
```
---
## Monitoring
### Service Health
#### Systemd Monitoring
```bash
# Check service status
systemctl status scheduler
# View recent logs
journalctl -u scheduler -n 100
# Follow logs in real-time
journalctl -u scheduler -f
# Check for errors in last hour
journalctl -u scheduler --since "1 hour ago" | grep ERROR
# Service restart count
systemctl show scheduler | grep NRestarts
```
#### Docker Monitoring
```bash
# Container status
docker compose -f docker-compose.prod.yml ps
# Resource usage
docker stats movie_scheduler
# Logs
docker compose -f docker-compose.prod.yml logs --tail=100 -f
# Health check status
docker inspect movie_scheduler | jq '.[0].State.Health'
```
### Database Monitoring
```bash
# Check job status
sqlite3 /opt/scheduler/scheduler.db "SELECT prep_status, play_status, COUNT(*) FROM jobs GROUP BY prep_status, play_status;"
# Active streams
sqlite3 /opt/scheduler/scheduler.db "SELECT nocodb_id, title, play_status, stream_retry_count FROM jobs WHERE play_status='streaming';"
# Failed jobs
sqlite3 /opt/scheduler/scheduler.db "SELECT nocodb_id, title, prep_status, play_status, log FROM jobs WHERE prep_status='failed' OR play_status='failed';"
# Database size
ls -lh /opt/scheduler/scheduler.db
```
### Automated Monitoring Script
Create `/opt/scheduler/monitor.sh`:
```bash
#!/bin/bash
LOG_FILE="/var/log/scheduler-monitor.log"
DB_PATH="/opt/scheduler/scheduler.db"
echo "=== Scheduler Monitor - $(date) ===" >> "$LOG_FILE"
# Check if service is running
if systemctl is-active --quiet scheduler; then
    echo "✓ Service is running" >> "$LOG_FILE"
else
    echo "✗ Service is DOWN" >> "$LOG_FILE"
    # Send alert (email, Slack, etc.)
    systemctl start scheduler
fi
# Check for failed jobs
FAILED=$(sqlite3 "$DB_PATH" "SELECT COUNT(*) FROM jobs WHERE prep_status='failed' OR play_status='failed';")
if [ "$FAILED" -gt 0 ]; then
    echo "⚠ Found $FAILED failed jobs" >> "$LOG_FILE"
    # Send alert
fi
# Check active streams
STREAMING=$(sqlite3 "$DB_PATH" "SELECT COUNT(*) FROM jobs WHERE play_status='streaming';")
echo "Active streams: $STREAMING" >> "$LOG_FILE"
# Check disk space
DISK_USAGE=$(df -h /mnt/storage | tail -1 | awk '{print $5}' | sed 's/%//')
if [ "$DISK_USAGE" -gt 90 ]; then
    echo "⚠ Disk usage is ${DISK_USAGE}%" >> "$LOG_FILE"
    # Send alert
fi
echo "" >> "$LOG_FILE"
```
```bash
# Make executable
chmod +x /opt/scheduler/monitor.sh
# Add to crontab (check every 5 minutes)
(crontab -l 2>/dev/null; echo "*/5 * * * * /opt/scheduler/monitor.sh") | crontab -
```
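The "Send alert" placeholders in the monitor script can be filled in with a small helper. The webhook URL below is a hypothetical placeholder, and the `curl` call is commented out so the sketch runs offline; it currently just logs the alert to stdout.

```shell
# Hypothetical alert helper for the monitor script above
SLACK_WEBHOOK_URL="https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder

send_alert() {
    local msg="$1"
    # Uncomment to actually post to Slack:
    # curl -s -X POST -H 'Content-type: application/json' \
    #     --data "{\"text\": \"[scheduler] ${msg}\"}" "$SLACK_WEBHOOK_URL"
    echo "ALERT: $msg"
}

send_alert "Service is DOWN"
```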
### External Monitoring
#### Prometheus + Grafana
Export metrics using node_exporter or custom exporter:
```bash
# Install node_exporter for system metrics
# Create custom exporter for job metrics from database
# Set up Grafana dashboard
```
#### Uptime Monitoring
Use services like:
- UptimeRobot
- Pingdom
- Datadog
Monitor:
- Service availability
- RTMP server connectivity
- NocoDB API accessibility
---
## Backup & Recovery
### What to Backup
1. **Database** (scheduler.db) - Critical
2. **Configuration** (.env.production or /etc/scheduler/scheduler.env) - Critical
3. **Final videos** (if you want to keep processed videos)
4. **Logs** (optional, for forensics)
### Backup Script
Create `/opt/scheduler/backup.sh`:
```bash
#!/bin/bash
BACKUP_DIR="/backup/scheduler"
DATE=$(date +%Y%m%d_%H%M%S)
DB_PATH="/opt/scheduler/scheduler.db"
CONFIG_PATH="/etc/scheduler/scheduler.env"
# Create backup directory
mkdir -p "$BACKUP_DIR"
# Backup database (sqlite3's .backup takes a consistent snapshot of a live
# database; a plain cp can capture a mid-write, corrupted copy)
sqlite3 "$DB_PATH" ".backup '$BACKUP_DIR/scheduler_${DATE}.db'"
# Backup config (careful with secrets!)
cp "$CONFIG_PATH" "$BACKUP_DIR/config_${DATE}.env"
# Compress old backups
find "$BACKUP_DIR" -name "*.db" -mtime +7 -exec gzip {} \;
# Delete backups older than 30 days
find "$BACKUP_DIR" -name "*.db.gz" -mtime +30 -delete
find "$BACKUP_DIR" -name "*.env" -mtime +30 -delete
# Optional: Upload to S3/cloud storage
# aws s3 sync "$BACKUP_DIR" s3://your-bucket/scheduler-backups/
echo "Backup completed: $BACKUP_DIR/scheduler_${DATE}.db"
```
```bash
# Make executable
chmod +x /opt/scheduler/backup.sh
# Run daily at 2 AM
(crontab -l 2>/dev/null; echo "0 2 * * * /opt/scheduler/backup.sh") | crontab -
```
### Recovery Procedure
#### 1. Restore from Backup
```bash
# Stop service
sudo systemctl stop scheduler
# Restore database
cp /backup/scheduler/scheduler_YYYYMMDD_HHMMSS.db /opt/scheduler/scheduler.db
chown scheduler:scheduler /opt/scheduler/scheduler.db
# Restore config if needed
cp /backup/scheduler/config_YYYYMMDD_HHMMSS.env /etc/scheduler/scheduler.env
# Start service
sudo systemctl start scheduler
```
#### 2. Disaster Recovery (Full Rebuild)
If server is completely lost:
1. Provision new server
2. Follow installation steps above
3. Restore database and config from backup
4. Restart service
5. Verify jobs are picked up
**Recovery Time Objective (RTO):** 30-60 minutes
**Recovery Point Objective (RPO):** Up to 24 hours (with daily backups)
---
## Maintenance
### Routine Tasks
#### Daily
- ✓ Check service status
- ✓ Review error logs
- ✓ Check failed jobs
#### Weekly
- ✓ Review disk space
- ✓ Check database size
- ✓ Clean up old processed videos (if not needed)
#### Monthly
- ✓ Update system packages
- ✓ Review and optimize database
- ✓ Test backup restoration
- ✓ Review and rotate logs
### Database Maintenance
```bash
# Vacuum database (reclaim space, optimize)
sqlite3 /opt/scheduler/scheduler.db "VACUUM;"
# Analyze database (update statistics)
sqlite3 /opt/scheduler/scheduler.db "ANALYZE;"
# Clean up old completed jobs (optional)
sqlite3 /opt/scheduler/scheduler.db "DELETE FROM jobs WHERE play_status='done' AND datetime(run_at) < datetime('now', '-30 days');"
```
### Log Rotation
For systemd (automatic via journald):
```bash
# Configure in /etc/systemd/journald.conf
SystemMaxUse=1G
RuntimeMaxUse=100M
```
For Docker:
```yaml
# Already configured in docker-compose.prod.yml
logging:
  driver: "json-file"
  options:
    max-size: "10m"
    max-file: "5"
```
### Video Cleanup
```bash
# Clean up old final videos (adjust retention as needed)
find /mnt/storage/final_movies -name "*.mp4" -mtime +7 -delete
# Or move to archive
find /mnt/storage/final_movies -name "*.mp4" -mtime +7 -exec mv {} /mnt/archive/ \;
```
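If deleted videos may still be needed, the archive variant is safer. The sketch below wraps it in a function and demonstrates it against a throwaway directory; the 7-day retention and paths are assumptions to adjust, and `touch -d` is GNU coreutils syntax.

```shell
# Sketch: archive videos older than N days instead of deleting them
archive_old_videos() {
    local src="$1" dest="$2" days="$3"
    mkdir -p "$dest"
    find "$src" -name "*.mp4" -mtime +"$days" -exec mv {} "$dest"/ \;
}

# Demonstration against a throwaway directory
tmp=$(mktemp -d)
mkdir -p "$tmp/final"
touch "$tmp/final/new.mp4"
touch -d "10 days ago" "$tmp/final/old.mp4"   # GNU touch

archive_old_videos "$tmp/final" "$tmp/archive" 7
ls "$tmp/archive"   # only old.mp4 should have moved
```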
---
## Troubleshooting
### Service Won't Start
```bash
# Check service status
systemctl status scheduler
# Check logs for errors
journalctl -u scheduler -n 50
# Common issues:
# 1. Missing environment variables
grep -i "error" /var/log/syslog | grep scheduler
# 2. Permission issues
ls -la /opt/scheduler
ls -la /mnt/storage
# 3. GPU access issues
ls -la /dev/dri/
groups scheduler # Should include 'video' and 'render'
```
### Streams Keep Failing
```bash
# Test RTMP server manually
ffmpeg -re -i test.mp4 -c copy -f flv rtmp://your-server/live/stream
# Check network connectivity
ping your-rtmp-server.com
telnet your-rtmp-server.com 1935
# Review stream logs
journalctl -u scheduler | grep -A 10 "Stream crashed"
# Check retry count
sqlite3 /opt/scheduler/scheduler.db "SELECT nocodb_id, title, stream_retry_count FROM jobs WHERE play_status='streaming';"
```
### High CPU/Memory Usage
```bash
# Check resource usage
top -u scheduler
# Or for Docker
docker stats movie_scheduler
# Common causes:
# 1. Large video file encoding - normal, wait for completion
# 2. whisper.cpp using all cores - normal
# 3. Multiple prep jobs running - adjust or wait
# Limit resources if needed (systemd)
systemctl edit scheduler
# In the drop-in editor that opens, add the following and save:
[Service]
CPUQuota=200%
MemoryMax=4G
### Database Locked Errors
```bash
# Check for stale locks
lsof /opt/scheduler/scheduler.db
# Kill stale processes if needed
# Restart service
systemctl restart scheduler
```
### VAAPI Not Working
```bash
# Verify VAAPI support
vainfo
# Test FFmpeg VAAPI
ffmpeg -hwaccels
# Check permissions
ls -la /dev/dri/renderD128
groups scheduler # Should include 'video' or 'render'
# Fallback to software encoding
# Comment out VAAPI_DEVICE in config
# Encoding will use CPU (slower but works)
```
---
## Performance Optimization
### Hardware Acceleration
```bash
# Verify GPU usage during encoding
intel_gpu_top # For Intel GPUs
radeontop # For AMD GPUs
# If GPU not being used, check:
# 1. VAAPI device path correct
# 2. User has GPU permissions
# 3. FFmpeg compiled with VAAPI support
```
### Storage Performance
```bash
# Use SSD for final videos (they're streamed frequently)
# Use HDD for raw videos (accessed once for processing)
# Test disk performance
dd if=/dev/zero of=/mnt/storage/test bs=1M count=1024 oflag=direct
rm /mnt/storage/test
```
### Network Optimization
```bash
# For better streaming reliability
# 1. Use dedicated network for RTMP
# 2. Enable QoS for streaming traffic
# 3. Consider local RTMP relay
# Test network throughput
iperf3 -c your-rtmp-server.com
```
---
## Production Checklist
Before going live:
- [ ] Secrets stored securely (not in git)
- [ ] Service auto-starts on boot
- [ ] Backups configured and tested
- [ ] Monitoring configured
- [ ] Logs being rotated
- [ ] Disk space alerts configured
- [ ] Test recovery procedure
- [ ] Document runbook for on-call
- [ ] GPU permissions verified
- [ ] RTMP connectivity tested
- [ ] NocoDB API tested
- [ ] Process one test video end-to-end
- [ ] Verify streaming watchdog works
- [ ] Test service restart during streaming
- [ ] Configure alerting for failures
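Several checklist items can be verified in one pass with a small smoke-test script. This is a sketch that assumes the paths and service name used throughout this guide; each check prints PASS or FAIL and keeps going, so one run yields a full report.

```shell
# Pre-flight smoke test (sketch)
check() {
    if "$@" >/dev/null 2>&1; then
        echo "PASS: $*"
    else
        echo "FAIL: $*"
    fi
}

check test -r /etc/scheduler/scheduler.env
check test -d /mnt/storage/raw_movies
check test -d /mnt/storage/final_movies
check command -v ffmpeg
check command -v sqlite3
check systemctl is-enabled scheduler
```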
---
## Support & Updates
### Getting Updates
```bash
# Git-based deployment
cd /opt/scheduler
git pull origin main
systemctl restart scheduler
# Docker-based deployment
cd /opt/scheduler
docker compose -f docker-compose.prod.yml pull
docker compose -f docker-compose.prod.yml up -d
```
### Reporting Issues
Include in bug reports:
- Service logs: `journalctl -u scheduler -n 100`
- Database state: `sqlite3 scheduler.db ".dump jobs"`
- System info: `uname -a`, `python3 --version`, `ffmpeg -version`
- Configuration (redact secrets!)
---
## Additional Resources
- FFmpeg VAAPI Guide: https://trac.ffmpeg.org/wiki/Hardware/VAAPI
- whisper.cpp: https://github.com/ggerganov/whisper.cpp
- NocoDB API: https://docs.nocodb.com
- Systemd Documentation: https://www.freedesktop.org/software/systemd/man/
---
## License
This production guide is provided as-is. Test thoroughly in staging before production deployment.