
Production Deployment Guide

Complete guide for running the Movie Scheduler in production environments.

Table of Contents

  1. Prerequisites
  2. Deployment Options
  3. Installation Methods
  4. Configuration
  5. Security
  6. Monitoring
  7. Backup & Recovery
  8. Maintenance
  9. Troubleshooting

Prerequisites

Hardware Requirements

Minimum:

  • CPU: 4 cores (for whisper.cpp and encoding)
  • RAM: 4GB
  • Storage: 100GB+ (depends on video library size)
  • GPU: Intel/AMD with VAAPI support (optional but recommended)

Recommended:

  • CPU: 8+ cores
  • RAM: 8GB+
  • Storage: 500GB+ SSD
  • GPU: Modern Intel/AMD GPU with VAAPI

Software Requirements

  • OS: Linux (Ubuntu 20.04+, Debian 11+, RHEL 8+, or compatible)
  • Python: 3.7+
  • FFmpeg: With VAAPI support
  • whisper.cpp: Compiled and in PATH
  • Network: Stable connection to NocoDB and RTMP server

Deployment Options

Option 1: Systemd Service

  • Direct hardware access (best VAAPI performance)
  • Low overhead
  • System integration
  • ⚠️ Manual dependency management

Option 2: Docker

  • Isolated environment
  • Easy updates
  • Portable configuration
  • ⚠️ Slight performance overhead
  • ⚠️ Requires GPU passthrough for VAAPI

Option 3: Kubernetes/Orchestration

  • High availability
  • Auto-scaling
  • Cloud-native
  • ⚠️ Complex setup
  • ⚠️ Overkill for single-instance deployment


Installation Methods

Method 1: Systemd Service Installation

1. Create Scheduler User

# Create dedicated user for security
sudo useradd -r -s /bin/bash -d /opt/scheduler -m scheduler

# Add to video group for GPU access
sudo usermod -aG video,render scheduler

2. Install Dependencies

# Install system packages
sudo apt-get update
sudo apt-get install -y python3 python3-pip python3-venv ffmpeg git build-essential

# Install whisper.cpp
sudo -u scheduler git clone https://github.com/ggerganov/whisper.cpp.git /tmp/whisper.cpp
cd /tmp/whisper.cpp
make
# Note: newer whisper.cpp releases build with CMake and name the binary
# whisper-cli (under build/bin/); adjust the copy below accordingly.
sudo cp main /usr/local/bin/whisper.cpp
sudo chmod +x /usr/local/bin/whisper.cpp

# Download whisper model
sudo mkdir -p /opt/models
cd /opt/models
sudo wget https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin
# Rename so the file matches the WHISPER_MODEL path used in configuration
sudo mv ggml-base.en.bin ggml-base.bin
sudo chown -R scheduler:scheduler /opt/models
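
To confirm the binary and model work together, a quick smoke test; this assumes the sample audio shipped with the whisper.cpp checkout is still in /tmp:

# Transcribe the bundled sample clip; expect text output within a few seconds
sudo -u scheduler whisper.cpp -m /opt/models/ggml-base.bin -f /tmp/whisper.cpp/samples/jfk.wav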

3. Deploy Application

# Create application directory
sudo mkdir -p /opt/scheduler
sudo chown scheduler:scheduler /opt/scheduler

# Copy application files
sudo -u scheduler cp agent.py /opt/scheduler/
sudo -u scheduler cp requirements.txt /opt/scheduler/

# Create Python virtual environment
sudo -u scheduler python3 -m venv /opt/scheduler/venv
sudo -u scheduler /opt/scheduler/venv/bin/pip install -r /opt/scheduler/requirements.txt

4. Configure Storage

# Create storage directories (adjust paths as needed)
sudo mkdir -p /mnt/storage/raw_movies
sudo mkdir -p /mnt/storage/final_movies
sudo chown -R scheduler:scheduler /mnt/storage

5. Configure Service

# Copy service file
sudo cp scheduler.service /etc/systemd/system/

# Create environment file with secrets
sudo mkdir -p /etc/scheduler
sudo nano /etc/scheduler/scheduler.env

Edit /etc/scheduler/scheduler.env:

NOCODB_URL=https://your-nocodb.com/api/v2/tables/YOUR_TABLE_ID/records
NOCODB_TOKEN=your_production_token
RTMP_SERVER=rtmp://your-rtmp-server.com/live/stream
RAW_DIR=/mnt/storage/raw_movies
FINAL_DIR=/mnt/storage/final_movies
WHISPER_MODEL=/opt/models/ggml-base.bin

Update scheduler.service to use the environment file:

# Replace Environment= lines with:
EnvironmentFile=/etc/scheduler/scheduler.env
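
If you need to write the unit from scratch, a minimal sketch consistent with the user and paths created above (the scheduler.service shipped with the project is authoritative):

[Unit]
Description=Movie Scheduler
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
User=scheduler
Group=scheduler
EnvironmentFile=/etc/scheduler/scheduler.env
WorkingDirectory=/opt/scheduler
ExecStart=/opt/scheduler/venv/bin/python /opt/scheduler/agent.py
Restart=on-failure
RestartSec=10

[Install]
WantedBy=multi-user.target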

6. Enable and Start Service

# Reload systemd
sudo systemctl daemon-reload

# Enable service (start on boot)
sudo systemctl enable scheduler

# Start service
sudo systemctl start scheduler

# Check status
sudo systemctl status scheduler

# View logs
sudo journalctl -u scheduler -f

Method 2: Docker Deployment

1. Prepare Environment

# Create project directory
mkdir -p /opt/scheduler
cd /opt/scheduler

# Copy application files
cp agent.py requirements.txt Dockerfile docker-compose.prod.yml ./

# Create production environment file
cp .env.production.example .env.production
nano .env.production  # Fill in your values
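
If you are assembling docker-compose.prod.yml yourself, a minimal sketch consistent with the paths in this guide; the container name movie_scheduler comes from the monitoring commands below, while the build context, mounts, and device passthrough are assumptions:

services:
  scheduler:
    build: .
    container_name: movie_scheduler
    env_file: .env.production
    restart: unless-stopped
    devices:
      - /dev/dri:/dev/dri    # GPU passthrough for VAAPI
    volumes:
      - /mnt/storage/raw_movies:/mnt/storage/raw_movies
      - /mnt/storage/final_movies:/mnt/storage/final_movies
      - /opt/models:/opt/models:ro
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "5"

Persist scheduler.db with an additional volume that matches wherever your Dockerfile places it.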

2. Configure Storage

# Ensure storage directories exist
mkdir -p /mnt/storage/raw_movies
mkdir -p /mnt/storage/final_movies

# Download whisper model
mkdir -p /opt/models
cd /opt/models
wget https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin
mv ggml-base.en.bin ggml-base.bin

3. Deploy Container

cd /opt/scheduler

# Build image
docker compose -f docker-compose.prod.yml build

# Start service
docker compose -f docker-compose.prod.yml up -d

# Check logs
docker compose -f docker-compose.prod.yml logs -f

# Check status
docker compose -f docker-compose.prod.yml ps

4. Enable Auto-Start

# Create systemd service for docker compose
sudo nano /etc/systemd/system/scheduler-docker.service

[Unit]
Description=Movie Scheduler (Docker)
Requires=docker.service
After=docker.service

[Service]
Type=oneshot
RemainAfterExit=yes
WorkingDirectory=/opt/scheduler
ExecStart=/usr/bin/docker compose -f docker-compose.prod.yml up -d
ExecStop=/usr/bin/docker compose -f docker-compose.prod.yml down
TimeoutStartSec=0

[Install]
WantedBy=multi-user.target

sudo systemctl daemon-reload
sudo systemctl enable scheduler-docker

Configuration

Essential Configuration

NocoDB Connection

# Get your table ID from NocoDB URL
# https://nocodb.com/nc/YOUR_BASE_ID/table_NAME
# API endpoint: https://nocodb.com/api/v2/tables/TABLE_ID/records

# Generate API token in NocoDB:
# Account Settings → Tokens → Create Token
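
Once the token exists, a quick connectivity check from the scheduler host; NocoDB's v2 API expects the token in the xc-token header:

# Should return one record as JSON; an auth error means the token or URL is wrong
curl -s -H "xc-token: $NOCODB_TOKEN" "$NOCODB_URL?limit=1"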

RTMP Server

# For nginx-rtmp:
RTMP_SERVER=rtmp://your-server.com:1935/live/stream

# For other RTMP servers, use their endpoint format

Storage Paths

# Use separate fast storage for final videos (streaming)
RAW_DIR=/mnt/storage/raw_movies      # Can be slower storage
FINAL_DIR=/mnt/fast-storage/final    # Should be fast SSD

# Ensure proper permissions (cover both storage mounts)
chown -R scheduler:scheduler /mnt/storage
chown -R scheduler:scheduler /mnt/fast-storage/final
chmod 755 /mnt/storage/raw_movies
chmod 755 /mnt/fast-storage/final

Performance Tuning

Sync Intervals

# High-load scenario (many jobs, frequent updates)
NOCODB_SYNC_INTERVAL_SECONDS=120  # Check less often
WATCHDOG_CHECK_INTERVAL_SECONDS=15  # Check streams less often

# Low-latency scenario (need fast response)
NOCODB_SYNC_INTERVAL_SECONDS=30
WATCHDOG_CHECK_INTERVAL_SECONDS=5

# Default (balanced)
NOCODB_SYNC_INTERVAL_SECONDS=60
WATCHDOG_CHECK_INTERVAL_SECONDS=10

FFmpeg VAAPI

# Find your VAAPI device
ls -la /dev/dri/

# Common devices:
# renderD128 - Primary GPU
# renderD129 - Secondary GPU

# Test VAAPI
ffmpeg -hwaccel vaapi -vaapi_device /dev/dri/renderD128 -i test.mp4 -f null -

# Set in config
VAAPI_DEVICE=/dev/dri/renderD128
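
The test above only exercises decode. To check the full GPU pipeline (decode, scale, encode), a sketch of a VAAPI transcode, assuming an H.264 target; the input and output filenames are placeholders:

# Decode, scale, and encode entirely on the GPU
ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -vaapi_device /dev/dri/renderD128 \
    -i test.mp4 -vf 'scale_vaapi=w=1280:h=720' -c:v h264_vaapi -b:v 4M -c:a copy out.mp4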

Security

Secrets Management

DO NOT hardcode secrets in files tracked by git!

Option 1: Environment Files (Simple)

# Store secrets in protected file
sudo nano /etc/scheduler/scheduler.env
sudo chmod 600 /etc/scheduler/scheduler.env
sudo chown scheduler:scheduler /etc/scheduler/scheduler.env

Option 2: Secrets Management Tools

# Using Vault
export NOCODB_TOKEN=$(vault kv get -field=token secret/scheduler/nocodb)

# Using AWS Secrets Manager
export NOCODB_TOKEN=$(aws secretsmanager get-secret-value --secret-id scheduler/nocodb --query SecretString --output text)

# Using Docker Secrets (Swarm/Kubernetes)
# Mount secrets as files and read in application
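
For Docker secrets, the usual pattern is a *_FILE indirection resolved at container start. A minimal entrypoint sketch; the NOCODB_TOKEN_FILE convention is an assumption, not something agent.py implements:

#!/bin/bash
# entrypoint.sh: resolve file-based secrets into environment variables
if [ -n "$NOCODB_TOKEN_FILE" ] && [ -f "$NOCODB_TOKEN_FILE" ]; then
    export NOCODB_TOKEN="$(cat "$NOCODB_TOKEN_FILE")"
fi
exec python3 /opt/scheduler/agent.py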

Filesystem Permissions

# Application directory
chown -R scheduler:scheduler /opt/scheduler
chmod 750 /opt/scheduler

# Storage directories
chown -R scheduler:scheduler /mnt/storage
chmod 755 /mnt/storage/raw_movies      # Scheduler only reads from here
chmod 755 /mnt/storage/final_movies    # Scheduler writes processed output here

# Database file
chmod 600 /opt/scheduler/scheduler.db
chown scheduler:scheduler /opt/scheduler/scheduler.db

Network Security

# Firewall rules (if scheduler runs on separate server)
# Only allow outbound connections to NocoDB and RTMP

sudo ufw allow out to YOUR_NOCODB_IP port 443 proto tcp  # HTTPS
sudo ufw allow out to YOUR_RTMP_IP port 1935 proto tcp   # RTMP
sudo ufw allow out 53                                    # DNS (required if you deny by default)
sudo ufw default deny outgoing  # Deny all other outbound (optional)
sudo ufw status verbose         # Verify the rules

Regular Updates

# Update system packages weekly
sudo apt-get update && sudo apt-get upgrade

# Update Python dependencies
sudo -u scheduler /opt/scheduler/venv/bin/pip install --upgrade -r /opt/scheduler/requirements.txt

# Rebuild whisper.cpp quarterly (for performance improvements)

Monitoring

Service Health

Systemd Monitoring

# Check service status
systemctl status scheduler

# View recent logs
journalctl -u scheduler -n 100

# Follow logs in real-time
journalctl -u scheduler -f

# Check for errors in last hour
journalctl -u scheduler --since "1 hour ago" | grep ERROR

# Service restart count
systemctl show scheduler | grep NRestarts

Docker Monitoring

# Container status
docker compose -f docker-compose.prod.yml ps

# Resource usage
docker stats movie_scheduler

# Logs
docker compose -f docker-compose.prod.yml logs --tail=100 -f

# Health check status
docker inspect movie_scheduler | jq '.[0].State.Health'

Database Monitoring

# Check job status
sqlite3 /opt/scheduler/scheduler.db "SELECT prep_status, play_status, COUNT(*) FROM jobs GROUP BY prep_status, play_status;"

# Active streams
sqlite3 /opt/scheduler/scheduler.db "SELECT nocodb_id, title, play_status, stream_retry_count FROM jobs WHERE play_status='streaming';"

# Failed jobs
sqlite3 /opt/scheduler/scheduler.db "SELECT nocodb_id, title, prep_status, play_status, log FROM jobs WHERE prep_status='failed' OR play_status='failed';"

# Database size
ls -lh /opt/scheduler/scheduler.db
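
When validating a fresh deployment it also helps to see what is queued; a sketch using the run_at column referenced later in this guide:

# Next 10 upcoming jobs
sqlite3 /opt/scheduler/scheduler.db "SELECT nocodb_id, title, run_at FROM jobs WHERE datetime(run_at) > datetime('now') ORDER BY run_at LIMIT 10;"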

Automated Monitoring Script

Create /opt/scheduler/monitor.sh:

#!/bin/bash

LOG_FILE="/var/log/scheduler-monitor.log"
DB_PATH="/opt/scheduler/scheduler.db"

echo "=== Scheduler Monitor - $(date) ===" >> "$LOG_FILE"

# Check if service is running
if systemctl is-active --quiet scheduler; then
    echo "✓ Service is running" >> "$LOG_FILE"
else
    echo "✗ Service is DOWN" >> "$LOG_FILE"
    # Send alert (email, Slack, etc.)
    systemctl start scheduler  # Requires root; install this cron job as root
fi

# Check for failed jobs
FAILED=$(sqlite3 "$DB_PATH" "SELECT COUNT(*) FROM jobs WHERE prep_status='failed' OR play_status='failed';")
if [ "$FAILED" -gt 0 ]; then
    echo "⚠ Found $FAILED failed jobs" >> "$LOG_FILE"
    # Send alert
fi

# Check active streams
STREAMING=$(sqlite3 "$DB_PATH" "SELECT COUNT(*) FROM jobs WHERE play_status='streaming';")
echo "Active streams: $STREAMING" >> "$LOG_FILE"

# Check disk space
DISK_USAGE=$(df -h /mnt/storage | tail -1 | awk '{print $5}' | sed 's/%//')
if [ "$DISK_USAGE" -gt 90 ]; then
    echo "⚠ Disk usage is ${DISK_USAGE}%" >> "$LOG_FILE"
    # Send alert
fi

echo "" >> "$LOG_FILE"
# Make executable
chmod +x /opt/scheduler/monitor.sh

# Add to crontab (check every 5 minutes)
(crontab -l 2>/dev/null; echo "*/5 * * * * /opt/scheduler/monitor.sh") | crontab -
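
For the "# Send alert" placeholders above, a minimal sketch posting to a Slack incoming webhook; SLACK_WEBHOOK_URL is an assumption, substitute whatever alerting channel you use:

send_alert() {
    # POST a plain-text message to a Slack incoming webhook
    curl -s -X POST -H 'Content-Type: application/json' \
        -d "{\"text\": \"[scheduler] $1\"}" \
        "$SLACK_WEBHOOK_URL"
}

# Example: send_alert "Service is DOWN on $(hostname)"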

External Monitoring

Prometheus + Grafana

Export metrics using node_exporter or custom exporter:

# Install node_exporter for system metrics
# Create custom exporter for job metrics from database
# Set up Grafana dashboard

Uptime Monitoring

Use services like:

  • UptimeRobot
  • Pingdom
  • Datadog

Monitor:

  • Service availability
  • RTMP server connectivity
  • NocoDB API accessibility

Backup & Recovery

What to Backup

  1. Database (scheduler.db) - Critical
  2. Configuration (.env.production or /etc/scheduler/scheduler.env) - Critical
  3. Final videos (if you want to keep processed videos)
  4. Logs (optional, for forensics)

Backup Script

Create /opt/scheduler/backup.sh:

#!/bin/bash

BACKUP_DIR="/backup/scheduler"
DATE=$(date +%Y%m%d_%H%M%S)
DB_PATH="/opt/scheduler/scheduler.db"
CONFIG_PATH="/etc/scheduler/scheduler.env"

# Create backup directory
mkdir -p "$BACKUP_DIR"

# Backup database (use SQLite's online backup for a consistent copy of a live DB)
sqlite3 "$DB_PATH" ".backup '$BACKUP_DIR/scheduler_${DATE}.db'"

# Backup config (careful with secrets!)
cp "$CONFIG_PATH" "$BACKUP_DIR/config_${DATE}.env"

# Compress old backups
find "$BACKUP_DIR" -name "*.db" -mtime +7 -exec gzip {} \;

# Delete backups older than 30 days
find "$BACKUP_DIR" -name "*.db.gz" -mtime +30 -delete
find "$BACKUP_DIR" -name "*.env" -mtime +30 -delete

# Optional: Upload to S3/cloud storage
# aws s3 sync "$BACKUP_DIR" s3://your-bucket/scheduler-backups/

echo "Backup completed: $BACKUP_DIR/scheduler_${DATE}.db"
Make it executable and schedule a daily run:

# Make executable
chmod +x /opt/scheduler/backup.sh

# Run daily at 2 AM
(crontab -l 2>/dev/null; echo "0 2 * * * /opt/scheduler/backup.sh") | crontab -

Recovery Procedure

1. Restore from Backup

# Stop service
sudo systemctl stop scheduler

# Restore database (gunzip first if the backup was compressed)
cp /backup/scheduler/scheduler_YYYYMMDD_HHMMSS.db /opt/scheduler/scheduler.db
chown scheduler:scheduler /opt/scheduler/scheduler.db

# Restore config if needed
cp /backup/scheduler/config_YYYYMMDD_HHMMSS.env /etc/scheduler/scheduler.env

# Start service
sudo systemctl start scheduler

2. Disaster Recovery (Full Rebuild)

If server is completely lost:

  1. Provision new server
  2. Follow installation steps above
  3. Restore database and config from backup
  4. Restart service
  5. Verify jobs are picked up

  • Recovery Time Objective (RTO): 30-60 minutes
  • Recovery Point Objective (RPO): up to 24 hours (with daily backups)


Maintenance

Routine Tasks

Daily

  • ✓ Check service status
  • ✓ Review error logs
  • ✓ Check failed jobs

Weekly

  • ✓ Review disk space
  • ✓ Check database size
  • ✓ Clean up old processed videos (if not needed)

Monthly

  • ✓ Update system packages
  • ✓ Review and optimize database
  • ✓ Test backup restoration
  • ✓ Review and rotate logs

Database Maintenance

# Vacuum database (reclaim space, optimize)
sqlite3 /opt/scheduler/scheduler.db "VACUUM;"

# Analyze database (update statistics)
sqlite3 /opt/scheduler/scheduler.db "ANALYZE;"

# Clean up old completed jobs (optional)
sqlite3 /opt/scheduler/scheduler.db "DELETE FROM jobs WHERE play_status='done' AND datetime(run_at) < datetime('now', '-30 days');"

Log Rotation

For systemd (automatic via journald):

# Configure in /etc/systemd/journald.conf
SystemMaxUse=1G
RuntimeMaxUse=100M

# Apply the change
sudo systemctl restart systemd-journald

For Docker:

# Already configured in docker-compose.prod.yml
logging:
  driver: "json-file"
  options:
    max-size: "10m"
    max-file: "5"

Video Cleanup

# Clean up old final videos (adjust retention as needed)
find /mnt/storage/final_movies -name "*.mp4" -mtime +7 -delete

# Or move to archive
find /mnt/storage/final_movies -name "*.mp4" -mtime +7 -exec mv {} /mnt/archive/ \;

Troubleshooting

Service Won't Start

# Check service status
systemctl status scheduler

# Check logs for errors
journalctl -u scheduler -n 50

# Common issues:
# 1. Missing environment variables
grep -i "error" /var/log/syslog | grep scheduler

# 2. Permission issues
ls -la /opt/scheduler
ls -la /mnt/storage

# 3. GPU access issues
ls -la /dev/dri/
groups scheduler  # Should include 'video' and 'render'

Streams Keep Failing

# Test RTMP server manually
ffmpeg -re -i test.mp4 -c copy -f flv rtmp://your-server/live/stream

# Check network connectivity
ping your-rtmp-server.com
telnet your-rtmp-server.com 1935   # or: nc -vz your-rtmp-server.com 1935

# Review stream logs
journalctl -u scheduler | grep -A 10 "Stream crashed"

# Check retry count
sqlite3 /opt/scheduler/scheduler.db "SELECT nocodb_id, title, stream_retry_count FROM jobs WHERE play_status='streaming';"

High CPU/Memory Usage

# Check resource usage
top -u scheduler

# Or for Docker
docker stats movie_scheduler

# Common causes:
# 1. Large video file encoding - normal, wait for completion
# 2. whisper.cpp using all cores - normal
# 3. Multiple prep jobs running - adjust or wait

# Limit resources if needed (systemd)
systemctl edit scheduler
# Add:
[Service]
CPUQuota=200%
MemoryMax=4G

Database Locked Errors

# Check for stale locks
lsof /opt/scheduler/scheduler.db

# Kill stale processes if needed
# Restart service
systemctl restart scheduler

VAAPI Not Working

# Verify VAAPI support
vainfo

# Test FFmpeg VAAPI
ffmpeg -hwaccels

# Check permissions
ls -la /dev/dri/renderD128
groups scheduler  # Should include 'video' or 'render'

# Fallback to software encoding
# Comment out VAAPI_DEVICE in config
# Encoding will use CPU (slower but works)

Performance Optimization

Hardware Acceleration

# Verify GPU usage during encoding
intel_gpu_top  # For Intel GPUs
radeontop      # For AMD GPUs

# If GPU not being used, check:
# 1. VAAPI device path correct
# 2. User has GPU permissions
# 3. FFmpeg compiled with VAAPI support

Storage Performance

# Use SSD for final videos (they're streamed frequently)
# Use HDD for raw videos (accessed once for processing)

# Test disk performance
dd if=/dev/zero of=/mnt/storage/test bs=1M count=1024 oflag=direct
rm /mnt/storage/test

Network Optimization

# For better streaming reliability
# 1. Use dedicated network for RTMP
# 2. Enable QoS for streaming traffic
# 3. Consider local RTMP relay

# Test network throughput
iperf3 -c your-rtmp-server.com

Production Checklist

Before going live:

  • Secrets stored securely (not in git)
  • Service auto-starts on boot
  • Backups configured and tested
  • Monitoring configured
  • Logs being rotated
  • Disk space alerts configured
  • Test recovery procedure
  • Document runbook for on-call
  • GPU permissions verified
  • RTMP connectivity tested
  • NocoDB API tested
  • Process one test video end-to-end
  • Verify streaming watchdog works
  • Test service restart during streaming
  • Configure alerting for failures

Support & Updates

Getting Updates

# Git-based deployment
cd /opt/scheduler
git pull origin main
sudo -u scheduler /opt/scheduler/venv/bin/pip install -r requirements.txt
sudo systemctl restart scheduler

# Docker-based deployment
cd /opt/scheduler
docker compose -f docker-compose.prod.yml pull
docker compose -f docker-compose.prod.yml up -d

Reporting Issues

Include in bug reports:

  • Service logs: journalctl -u scheduler -n 100
  • Database state: sqlite3 scheduler.db ".dump jobs"
  • System info: uname -a, python3 --version, ffmpeg -version
  • Configuration (redact secrets!)


License

This production guide is provided as-is. Test thoroughly in staging before production deployment.