DevOps Engineer Guide
Comprehensive guide for DevOps engineers managing the MyTV API infrastructure.
Infrastructure Overview
                 ┌─────────────────────────────────┐
                 │          Load Balancer          │
                 │       (Nginx / Cloud LB)        │
                 └────────────────┬────────────────┘
                                  │
               ┌──────────────────┼──────────────────┐
               │                  │                  │
         ┌─────▼─────┐      ┌─────▼─────┐      ┌─────▼─────┐
         │  API Pod  │      │  API Pod  │      │  API Pod  │
         │  (NestJS) │      │  (NestJS) │      │  (NestJS) │
         └─────┬─────┘      └─────┬─────┘      └─────┬─────┘
               │                  │                  │
               └──────────────────┼──────────────────┘
                                  │
            ┌─────────────────────┼─────────────────────┐
            │                     │                     │
      ┌─────▼─────┐         ┌─────▼─────┐         ┌─────▼─────┐
      │PostgreSQL │         │   Redis   │         │    R2     │
      │ (Primary) │         │ (Cluster) │         │  (Media)  │
      └───────────┘         └───────────┘         └───────────┘
Quick Start
Local Development Stack
# Clone repository
git clone https://github.com/mytv/mytelevision-api.git
cd mytelevision-api
# Start development environment
docker-compose up -d
# Start monitoring stack
docker-compose -f docker-compose.yml -f docker-compose.monitoring.yml up -d
Verify Services
# Check containers
docker ps
# Expected services:
# - mytv-api (port 3000)
# - mytv-postgres (port 5432)
# - mytv-redis (port 6379)
# - prometheus (port 9090)
# - grafana (port 3001)
# - alertmanager (port 9093)
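Beyond checking that containers are up, a quick smoke test confirms each service actually answers. A minimal sketch, assuming the API exposes the /health endpoint used by the Kubernetes probes later in this guide:

# Smoke test the local stack
curl -fsS http://localhost:3000/health && echo "api OK"
curl -fsS http://localhost:9090/-/healthy && echo "prometheus OK"
curl -fsS http://localhost:3001/api/health && echo "grafana OK"
docker exec mytv-postgres pg_isready -U mytelevision
docker exec mytv-redis redis-cli ping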
Docker Configuration
Production Dockerfile
# Build stage
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
# Dev dependencies are needed to run the build
RUN npm ci --include=dev
COPY . .
RUN npm run build
# Drop dev dependencies so the production image ships only runtime deps
RUN npm prune --omit=dev

# Production stage
FROM node:20-alpine AS production
WORKDIR /app
ENV NODE_ENV=production

# Security: non-root user
RUN addgroup -g 1001 -S nodejs && \
    adduser -S nestjs -u 1001

COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/package.json ./

USER nestjs
EXPOSE 3000
CMD ["node", "dist/main.js"]
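To build and sanity-check the image locally (the image name, tag, and .env file here are illustrative, not fixed project conventions):

# Build the production image and run it against the local stack
docker build -t mytelevision-api:local .
docker run --rm -p 3000:3000 --env-file .env mytelevision-api:local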
Docker Compose Files
| File | Purpose |
|---|---|
| docker-compose.yml | Base services (PostgreSQL, Redis) |
| docker-compose.monitoring.yml | Monitoring stack |
| docker-compose.production.yml | Production overrides with secrets |
| docker-compose.override.yml | Local development overrides |
Using Docker Secrets (Production)
# docker-compose.production.yml
services:
  api:
    secrets:
      - db_url
      - db_password
      - jwt_secret
      - redis_password
    environment:
      DATABASE_URL_FILE: /run/secrets/db_url
secrets:
  db_url:
    external: true
  db_password:
    external: true
  jwt_secret:
    external: true
  redis_password:
    external: true
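External secrets must exist before the stack starts, and Docker secrets require Swarm mode. A sketch of creating them, with values piped from environment variables (the variable names are illustrative) so they never land in shell history:

# Requires Swarm mode: docker swarm init
printf '%s' "$DB_URL" | docker secret create db_url -
printf '%s' "$DB_PASSWORD" | docker secret create db_password -
printf '%s' "$JWT_SECRET" | docker secret create jwt_secret -
printf '%s' "$REDIS_PASSWORD" | docker secret create redis_password -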
Kubernetes Deployment
Namespace Setup
kubectl create namespace mytv-production
kubectl create namespace mytv-staging
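The deployment below pulls its environment from a secret named mytv-api-secrets. One way to create it (the keys shown are illustrative; use whatever variables the API expects):

kubectl create secret generic mytv-api-secrets \
  --namespace mytv-production \
  --from-literal=DATABASE_URL='postgresql://...' \
  --from-literal=JWT_SECRET='...'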
Deployment Manifest
# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mytv-api
  namespace: mytv-production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: mytv-api
  template:
    metadata:
      labels:
        app: mytv-api
    spec:
      containers:
        - name: api
          image: ghcr.io/mytv/mytelevision-api:latest
          ports:
            - containerPort: 3000
          env:
            - name: NODE_ENV
              value: production
          envFrom:
            - secretRef:
                name: mytv-api-secrets
          resources:
            requests:
              memory: '256Mi'
              cpu: '200m'
            limits:
              memory: '512Mi'
              cpu: '500m'
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 5
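Apply the manifest and wait for the rollout to complete before moving on:

kubectl apply -f k8s/deployment.yaml
kubectl rollout status deployment/mytv-api -n mytv-production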
Service & Ingress
# k8s/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: mytv-api
  namespace: mytv-production
spec:
  selector:
    app: mytv-api
  ports:
    - port: 80
      targetPort: 3000
  type: ClusterIP
---
# k8s/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: mytv-api
  namespace: mytv-production
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  tls:
    - hosts:
        - api.mytelevision.app
      secretName: mytv-api-tls
  rules:
    - host: api.mytelevision.app
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: mytv-api
                port:
                  number: 80
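To verify the ingress and certificate issuance end to end:

kubectl get ingress mytv-api -n mytv-production
kubectl get certificate -n mytv-production   # requires cert-manager CRDs
curl -sI https://api.mytelevision.app/health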
Horizontal Pod Autoscaler
# k8s/hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mytv-api
  namespace: mytv-production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mytv-api
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
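After applying, confirm the autoscaler can read its metrics; a TARGETS column showing <unknown> usually means metrics-server is not installed:

kubectl apply -f k8s/hpa.yaml
kubectl get hpa mytv-api -n mytv-production --watch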
CI/CD Pipeline
GitHub Actions
# .github/workflows/ci.yml
name: CI/CD Pipeline

on:
  push:
    branches: [develop, main]
  pull_request:
    branches: [develop]

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20' }
      - run: npm ci
      - run: npm run lint:check

  test:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:16-alpine
        env: { POSTGRES_PASSWORD: test }
        ports: ['5432:5432']
      redis:
        image: redis:7-alpine
        ports: ['6379:6379']
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20' }
      - run: npm ci
      - run: npm run test:cov

  build:
    needs: [lint, test]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20' }
      - run: npm ci
      - run: npm run build

  deploy-staging:
    needs: build
    if: github.ref == 'refs/heads/develop'
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to staging
        run: echo "Deploy to staging environment"

  deploy-production:
    needs: build
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to production
        run: echo "Deploy to production environment"
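The deploy jobs above are stubs. A hedged sketch of what the production step might run, assuming images are pushed to ghcr.io and the runner has cluster credentials:

# Illustrative deploy step: pin the image to the commit SHA, then roll it out
docker build -t ghcr.io/mytv/mytelevision-api:$GITHUB_SHA .
docker push ghcr.io/mytv/mytelevision-api:$GITHUB_SHA
kubectl set image deployment/mytv-api api=ghcr.io/mytv/mytelevision-api:$GITHUB_SHA -n mytv-production
kubectl rollout status deployment/mytv-api -n mytv-production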
Monitoring
Stack Components
| Component | Port | Purpose |
|---|---|---|
| Prometheus | 9090 | Metrics collection |
| Grafana | 3001 | Dashboards (default login: admin/admin) |
| Alertmanager | 9093 | Alert routing |
| Node Exporter | 9100 | Host metrics |
| PG Exporter | 9187 | PostgreSQL metrics |
| Redis Exporter | 9121 | Redis metrics |
| cAdvisor | 8080 | Container metrics |
Prometheus Configuration
# prometheus/prometheus.yml
scrape_configs:
  - job_name: 'mytv-api'
    scrape_interval: 15s
    metrics_path: '/metrics'
    static_configs:
      - targets: ['api:3000']
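Validate the config before restarting Prometheus, and reload a running instance without downtime:

promtool check config prometheus/prometheus.yml
# Hot reload; requires Prometheus to run with --web.enable-lifecycle
curl -X POST http://localhost:9090/-/reload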
Alert Rules
# prometheus/alerts.yml
groups:
  - name: mytv-api
    rules:
      - alert: HighErrorRate
        expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.1
        for: 5m
        labels:
          severity: critical
      - alert: HighLatency
        expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 0.5
        for: 5m
        labels:
          severity: warning
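Rule files can be linted the same way, and currently active alerts inspected over the HTTP API:

promtool check rules prometheus/alerts.yml
curl -s http://localhost:9090/api/v1/alerts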
Grafana Dashboards
| Dashboard | UID | Metrics |
|---|---|---|
| API Overview | api-overview | Requests, latency, errors |
| Auth & Sessions | auth-sessions | Logins, sessions |
| Database | database-perf | Connections, queries |
| Business Metrics | business-metrics | Users, engagement |
| Infrastructure | infrastructure | CPU, memory, disk |
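Dashboards can be pulled over Grafana's HTTP API for backup or review, using the port and default credentials from the tables above and the UIDs listed here:

# Export a dashboard as JSON and show its title
curl -s -u admin:admin http://localhost:3001/api/dashboards/uid/api-overview | jq '.dashboard.title'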
Database Management
Backup Script
#!/bin/bash
# Fail on errors, including a failing pg_dump inside the pipe
set -euo pipefail

DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_DIR="/backups"
mkdir -p "$BACKUP_DIR"

pg_dump -U mytelevision mytelevision | gzip > "$BACKUP_DIR/backup_$DATE.sql.gz"

# Upload to S3/R2
aws s3 cp "$BACKUP_DIR/backup_$DATE.sql.gz" s3://backups/
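To run the backup nightly, a crontab entry along these lines (the script and log paths are hypothetical):

# Nightly at 03:00; paths are illustrative
0 3 * * * /opt/mytv/backup.sh >> /var/log/mytv-backup.log 2>&1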
Restore
# Download backup
aws s3 cp s3://backups/backup_20241211.sql.gz .
gunzip backup_20241211.sql.gz
# Stop API, recreate the database (a plain-SQL dump fails against existing tables), restore, restart
docker-compose stop api
dropdb -U mytelevision mytelevision && createdb -U mytelevision mytelevision
psql -U mytelevision -d mytelevision < backup_20241211.sql
docker-compose start api
Migration Management
# Apply migrations (production)
npx prisma migrate deploy
# Check migration status
npx prisma migrate status
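In Kubernetes, one way to run a production migration is inside a running pod. This is a sketch; a dedicated Job or init container is the more robust pattern:

kubectl exec deployment/mytv-api -n mytv-production -- npx prisma migrate deploy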
Log Management
Structured Logging
The API outputs JSON-structured logs:
{
  "level": "info",
  "timestamp": "2025-01-15T10:30:00Z",
  "method": "GET",
  "path": "/api/v2/movies",
  "statusCode": 200,
  "duration": 45,
  "userId": "uuid"
}
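Because each line is a self-contained JSON object, logs can be filtered with jq; fromjson? skips any non-JSON lines. For example, all server errors:

kubectl logs deployment/mytv-api -n mytv-production | jq -cR 'fromjson? | select(.statusCode >= 500)'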
Log Aggregation (Loki)
NestJS API --> Promtail --> Loki --> Grafana
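Once logs land in Loki, they can be queried from Grafana's Explore view or with logcli. The job label here is an assumption about the Promtail scrape config:

# Error lines from the last hour
logcli query --since=1h '{job="mytv-api"} |= "error"'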
Security Checklist
- HTTPS/TLS enabled
- Docker secrets for credentials
- Non-root container user
- Network policies configured
- Rate limiting enabled
- CORS configured
- Helmet.js security headers
- Regular dependency updates (Dependabot)
- Secret rotation schedule
- Backup verification tested
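A quick spot-check for several of the items above:

# TLS and Helmet security headers on the public endpoint
curl -sI https://api.mytelevision.app/health | grep -iE 'strict-transport|x-content-type'
# Confirm the container runs as the non-root user from the Dockerfile
kubectl exec deployment/mytv-api -n mytv-production -- id
# Network policies present?
kubectl get networkpolicy -n mytv-production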
Troubleshooting
CrashLoopBackOff
kubectl logs <pod-name> --previous
# Usually: missing env vars, DB connection failure
High Memory Usage
# Check container stats
docker stats
# Node.js memory options
NODE_OPTIONS="--max-old-space-size=512"
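In Kubernetes the same heap cap can be applied without rebuilding the image; keeping it below the 512Mi container limit from the deployment leaves headroom for non-heap memory:

kubectl set env deployment/mytv-api NODE_OPTIONS="--max-old-space-size=400" -n mytv-production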
Connection Pool Issues
# Increase pool size
DATABASE_URL="...?connection_limit=20&pool_timeout=10"
Useful Commands
# kubectl shortcuts
kubectl get pods -n mytv-production
kubectl logs -f deployment/mytv-api -n mytv-production
kubectl exec -it <pod> -- sh
kubectl rollout restart deployment/mytv-api -n mytv-production
kubectl rollout undo deployment/mytv-api -n mytv-production
# Docker shortcuts
docker-compose ps
docker-compose logs -f api
docker stats
docker system prune -af