DevOps Engineer Guide
Comprehensive guide for DevOps engineers managing the MyTV API infrastructure.
Infrastructure Overview
                 ┌─────────────────────────────────┐
                 │          Load Balancer          │
                 │       (Nginx / Cloud LB)        │
                 └────────────────┬────────────────┘
                                  │
               ┌──────────────────┼──────────────────┐
               │                  │                  │
         ┌─────▼─────┐      ┌─────▼─────┐      ┌─────▼─────┐
         │  API Pod  │      │  API Pod  │      │  API Pod  │
         │  (NestJS) │      │  (NestJS) │      │  (NestJS) │
         └─────┬─────┘      └─────┬─────┘      └─────┬─────┘
               │                  │                  │
               └──────────────────┼──────────────────┘
                                  │
            ┌─────────────────────┼─────────────────────┐
            │                     │                     │
      ┌─────▼─────┐         ┌─────▼─────┐         ┌─────▼─────┐
      │PostgreSQL │         │   Redis   │         │    R2     │
      │ (Primary) │         │ (Cluster) │         │  (Media)  │
      └───────────┘         └───────────┘         └───────────┘
Quick Start
Local Development Stack
# Clone repository
git clone https://github.com/mytv/mytelevision-api.git
cd mytelevision-api
# Start development environment
docker-compose up -d
# Start monitoring stack
docker-compose -f docker-compose.yml -f docker-compose.monitoring.yml up -d
Verify Services
# Check containers
docker ps
# Expected services:
# - mytv-api (port 3000)
# - mytv-postgres (port 5432)
# - mytv-redis (port 6379)
# - prometheus (port 9090)
# - grafana (port 3001)
# - alertmanager (port 9093)
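Beyond checking that containers are up, a quick smoke test confirms each service actually answers. A minimal sketch, assuming the API exposes the /health endpoint used by the Kubernetes probes later in this guide:

# Smoke test the local stack
curl -fsS http://localhost:3000/health && echo "api OK"
curl -fsS http://localhost:9090/-/healthy && echo "prometheus OK"
curl -fsS http://localhost:3001/api/health && echo "grafana OK"
docker exec mytv-postgres pg_isready -U mytelevision
docker exec mytv-redis redis-cli ping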
Docker Configuration
Production Dockerfile
# Build stage
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
# Dev dependencies are needed to run the build
RUN npm ci --include=dev
COPY . .
RUN npm run build
# Drop dev dependencies so the production image ships only runtime deps
RUN npm prune --omit=dev

# Production stage
FROM node:20-alpine AS production
WORKDIR /app
ENV NODE_ENV=production

# Security: non-root user
RUN addgroup -g 1001 -S nodejs && \
    adduser -S nestjs -u 1001

COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/package.json ./

USER nestjs
EXPOSE 3000
CMD ["node", "dist/main.js"]
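To build and sanity-check the image locally (the image name, tag, and .env file here are illustrative, not fixed project conventions):

# Build the production image and run it against the local stack
docker build -t mytelevision-api:local .
docker run --rm -p 3000:3000 --env-file .env mytelevision-api:local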
Docker Compose Files
| File | Purpose |
|---|---|
| docker-compose.yml | Base services (PostgreSQL, Redis) |
| docker-compose.monitoring.yml | Monitoring stack |
| docker-compose.production.yml | Production overrides with secrets |
| docker-compose.override.yml | Local development overrides |
Using Docker Secrets (Production)
# docker-compose.production.yml
services:
  api:
    secrets:
      - db_url
      - db_password
      - jwt_secret
      - redis_password
    environment:
      DATABASE_URL_FILE: /run/secrets/db_url
secrets:
  db_url:
    external: true
  db_password:
    external: true
  jwt_secret:
    external: true
  redis_password:
    external: true
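External secrets must exist before the stack starts, and Docker secrets require Swarm mode. A sketch of creating them, with values piped from environment variables (the variable names are illustrative) so they never land in shell history:

# Requires Swarm mode: docker swarm init
printf '%s' "$DB_URL" | docker secret create db_url -
printf '%s' "$DB_PASSWORD" | docker secret create db_password -
printf '%s' "$JWT_SECRET" | docker secret create jwt_secret -
printf '%s' "$REDIS_PASSWORD" | docker secret create redis_password -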
Kubernetes Deployment
Namespace Setup
kubectl create namespace mytv-production
kubectl create namespace mytv-staging
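The deployment below pulls its environment from a secret named mytv-api-secrets. One way to create it (the keys shown are illustrative; use whatever variables the API expects):

kubectl create secret generic mytv-api-secrets \
  --namespace mytv-production \
  --from-literal=DATABASE_URL='postgresql://...' \
  --from-literal=JWT_SECRET='...'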
Deployment Manifest
# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mytv-api
  namespace: mytv-production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: mytv-api
  template:
    metadata:
      labels:
        app: mytv-api
    spec:
      containers:
        - name: api
          image: ghcr.io/mytv/mytelevision-api:latest
          ports:
            - containerPort: 3000
          env:
            - name: NODE_ENV
              value: production
          envFrom:
            - secretRef:
                name: mytv-api-secrets
          resources:
            requests:
              memory: '256Mi'
              cpu: '200m'
            limits:
              memory: '512Mi'
              cpu: '500m'
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 5
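Apply the manifest and wait for the rollout to complete before moving on:

kubectl apply -f k8s/deployment.yaml
kubectl rollout status deployment/mytv-api -n mytv-production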
Service & Ingress
# k8s/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: mytv-api
  namespace: mytv-production
spec:
  selector:
    app: mytv-api
  ports:
    - port: 80
      targetPort: 3000
  type: ClusterIP
---
# k8s/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: mytv-api
  namespace: mytv-production
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  tls:
    - hosts:
        - api.mytelevision.app
      secretName: mytv-api-tls
  rules:
    - host: api.mytelevision.app
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: mytv-api
                port:
                  number: 80
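To verify the ingress and certificate issuance end to end:

kubectl get ingress mytv-api -n mytv-production
kubectl get certificate -n mytv-production   # requires cert-manager CRDs
curl -sI https://api.mytelevision.app/health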
Horizontal Pod Autoscaler
# k8s/hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mytv-api
  namespace: mytv-production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mytv-api
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
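After applying, confirm the autoscaler can read its metrics; a TARGETS column showing <unknown> usually means metrics-server is not installed:

kubectl apply -f k8s/hpa.yaml
kubectl get hpa mytv-api -n mytv-production --watch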
CI/CD Pipeline
GitHub Actions
# .github/workflows/ci.yml
name: CI/CD Pipeline

on:
  push:
    branches: [develop, main]
  pull_request:
    branches: [develop]

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20' }
      - run: npm ci
      - run: npm run lint:check

  test:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:16-alpine
        env: { POSTGRES_PASSWORD: test }
        ports: ['5432:5432']
      redis:
        image: redis:7-alpine
        ports: ['6379:6379']
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20' }
      - run: npm ci
      - run: npm run test:cov

  build:
    needs: [lint, test]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20' }
      - run: npm ci
      - run: npm run build

  deploy-staging:
    needs: build
    if: github.ref == 'refs/heads/develop'
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to staging
        run: echo "Deploy to staging environment"

  deploy-production:
    needs: build
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to production
        run: echo "Deploy to production environment"
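The deploy jobs above are stubs. A hedged sketch of what the production step might run, assuming images are pushed to ghcr.io and the runner has cluster credentials:

# Illustrative deploy step: pin the image to the commit SHA, then roll it out
docker build -t ghcr.io/mytv/mytelevision-api:$GITHUB_SHA .
docker push ghcr.io/mytv/mytelevision-api:$GITHUB_SHA
kubectl set image deployment/mytv-api api=ghcr.io/mytv/mytelevision-api:$GITHUB_SHA -n mytv-production
kubectl rollout status deployment/mytv-api -n mytv-production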
Monitoring
Stack Components
| Component | Port | Purpose |
|---|---|---|
| Prometheus | 9090 | Metrics collection |
| Grafana | 3001 | Dashboards (default login: admin/admin) |
| Alertmanager | 9093 | Alert routing |
| Node Exporter | 9100 | Host metrics |
| PG Exporter | 9187 | PostgreSQL metrics |
| Redis Exporter | 9121 | Redis metrics |
| cAdvisor | 8080 | Container metrics |
Prometheus Configuration
# prometheus/prometheus.yml
scrape_configs:
  - job_name: 'mytv-api'
    scrape_interval: 15s
    metrics_path: '/metrics'
    static_configs:
      - targets: ['api:3000']
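Validate the config before restarting Prometheus, and reload a running instance without downtime:

promtool check config prometheus/prometheus.yml
# Hot reload; requires Prometheus to run with --web.enable-lifecycle
curl -X POST http://localhost:9090/-/reload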
Alert Rules
# prometheus/alerts.yml
groups:
  - name: mytv-api
    rules:
      - alert: HighErrorRate
        expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.1
        for: 5m
        labels:
          severity: critical
      - alert: HighLatency
        expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 0.5
        for: 5m
        labels:
          severity: warning
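Rule files can be linted the same way, and currently active alerts inspected over the HTTP API:

promtool check rules prometheus/alerts.yml
curl -s http://localhost:9090/api/v1/alerts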
Grafana Dashboards
| Dashboard | UID | Metrics |
|---|---|---|
| API Overview | api-overview | Requests, latency, errors |
| Auth & Sessions | auth-sessions | Logins, sessions |
| Database | database-perf | Connections, queries |
| Business Metrics | business-metrics | Users, engagement |
| Infrastructure | infrastructure | CPU, memory, disk |
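Dashboards can be pulled over Grafana's HTTP API for backup or review, using the port and default credentials from the tables above and the UIDs listed here:

# Export a dashboard as JSON and show its title
curl -s -u admin:admin http://localhost:3001/api/dashboards/uid/api-overview | jq '.dashboard.title'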
Database Management
Backup Script
#!/bin/bash
# Fail on errors, including a failing pg_dump inside the pipe
set -euo pipefail

DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_DIR="/backups"
mkdir -p "$BACKUP_DIR"

pg_dump -U mytelevision mytelevision | gzip > "$BACKUP_DIR/backup_$DATE.sql.gz"

# Upload to S3/R2
aws s3 cp "$BACKUP_DIR/backup_$DATE.sql.gz" s3://backups/
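To run the backup nightly, a crontab entry along these lines (the script and log paths are hypothetical):

# Nightly at 03:00; paths are illustrative
0 3 * * * /opt/mytv/backup.sh >> /var/log/mytv-backup.log 2>&1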
Restore
# Download backup
aws s3 cp s3://backups/backup_20241211.sql.gz .
gunzip backup_20241211.sql.gz
# Stop API, recreate the database (a plain-SQL dump fails against existing tables), restore, restart
docker-compose stop api
dropdb -U mytelevision mytelevision && createdb -U mytelevision mytelevision
psql -U mytelevision -d mytelevision < backup_20241211.sql
docker-compose start api
Migration Management
# Apply migrations (production)
npx prisma migrate deploy
# Check migration status
npx prisma migrate status
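In Kubernetes, one way to run a production migration is inside a running pod. This is a sketch; a dedicated Job or init container is the more robust pattern:

kubectl exec deployment/mytv-api -n mytv-production -- npx prisma migrate deploy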
Log Management
Structured Logging
The API outputs JSON-structured logs:
{
  "level": "info",
  "timestamp": "2025-01-15T10:30:00Z",
  "method": "GET",
  "path": "/api/v2/movies",
  "statusCode": 200,
  "duration": 45,
  "userId": "uuid"
}
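Because each line is a self-contained JSON object, logs can be filtered with jq; fromjson? skips any non-JSON lines. For example, all server errors:

kubectl logs deployment/mytv-api -n mytv-production | jq -cR 'fromjson? | select(.statusCode >= 500)'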
Log Aggregation (Loki)
NestJS API --> Promtail --> Loki --> Grafana
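Once logs land in Loki, they can be queried from Grafana's Explore view or with logcli. The job label here is an assumption about the Promtail scrape config:

# Error lines from the last hour
logcli query --since=1h '{job="mytv-api"} |= "error"'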
Security Checklist
- HTTPS/TLS enabled
- Docker secrets for credentials
- Non-root container user
- Network policies configured
- Rate limiting enabled
- CORS configured
- Helmet.js security headers
- Regular dependency updates (Dependabot)
- Secret rotation schedule
- Backup verification tested
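A quick spot-check for several of the items above:

# TLS and Helmet security headers on the public endpoint
curl -sI https://api.mytelevision.app/health | grep -iE 'strict-transport|x-content-type'
# Confirm the container runs as the non-root user from the Dockerfile
kubectl exec deployment/mytv-api -n mytv-production -- id
# Network policies present?
kubectl get networkpolicy -n mytv-production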
Troubleshooting
CrashLoopBackOff
kubectl logs <pod-name> --previous
# Usually: missing env vars, DB connection failure
High Memory Usage
# Check container stats
docker stats
# Node.js memory options
NODE_OPTIONS="--max-old-space-size=512"
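In Kubernetes the same heap cap can be applied without rebuilding the image; keeping it below the 512Mi container limit from the deployment leaves headroom for non-heap memory:

kubectl set env deployment/mytv-api NODE_OPTIONS="--max-old-space-size=400" -n mytv-production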
Connection Pool Issues
# Increase pool size
DATABASE_URL="...?connection_limit=20&pool_timeout=10"
Useful Commands
# kubectl shortcuts
kubectl get pods -n mytv-production
kubectl logs -f deployment/mytv-api -n mytv-production
kubectl exec -it <pod> -- sh
kubectl rollout restart deployment/mytv-api -n mytv-production
kubectl rollout undo deployment/mytv-api -n mytv-production
# Docker shortcuts
docker-compose ps
docker-compose logs -f api
docker stats
docker system prune -af