
DevOps Engineer Guide

Comprehensive guide for DevOps engineers managing the MyTV API infrastructure.

Infrastructure Overview

                 ┌───────────────────────────────────┐
                 │           Load Balancer           │
                 │        (Nginx / Cloud LB)         │
                 └─────────────────┬─────────────────┘
                                   │
                ┌──────────────────┼──────────────────┐
                │                  │                  │
          ┌─────▼─────┐      ┌─────▼─────┐      ┌─────▼─────┐
          │  API Pod  │      │  API Pod  │      │  API Pod  │
          │  (NestJS) │      │  (NestJS) │      │  (NestJS) │
          └─────┬─────┘      └─────┬─────┘      └─────┬─────┘
                │                  │                  │
                └──────────────────┼──────────────────┘
                                   │
             ┌─────────────────────┼─────────────────────┐
             │                     │                     │
       ┌─────▼─────┐         ┌─────▼─────┐         ┌─────▼─────┐
       │PostgreSQL │         │   Redis   │         │    R2     │
       │ (Primary) │         │ (Cluster) │         │  (Media)  │
       └───────────┘         └───────────┘         └───────────┘

Quick Start

Local Development Stack

# Clone repository
git clone https://github.com/mytv/mytelevision-api.git
cd mytelevision-api

# Start development environment
docker-compose up -d

# Start monitoring stack
docker-compose -f docker-compose.yml -f docker-compose.monitoring.yml up -d

Verify Services

# Check containers
docker ps

# Expected services:
# - mytv-api (port 3000)
# - mytv-postgres (port 5432)
# - mytv-redis (port 6379)
# - prometheus (port 9090)
# - grafana (port 3001)
# - alertmanager (port 9093)
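
Once the containers are up, a quick smoke test confirms the API and monitoring endpoints respond (the /health path matches the Kubernetes probes later in this guide):

# API health check
curl -f http://localhost:3000/health

# Prometheus and Grafana built-in health endpoints
curl -fs http://localhost:9090/-/healthy
curl -fs http://localhost:3001/api/health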

Docker Configuration

Production Dockerfile

# Build stage
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
# Install all dependencies; dev deps are required for the build
RUN npm ci --include=dev
COPY . .
RUN npm run build
# Drop dev dependencies so the production stage ships a lean node_modules
RUN npm prune --omit=dev

# Production stage
FROM node:20-alpine AS production
WORKDIR /app
ENV NODE_ENV=production

# Security: run as a non-root user
RUN addgroup -g 1001 -S nodejs && \
    adduser -S nestjs -u 1001

COPY --from=builder --chown=nestjs:nodejs /app/dist ./dist
COPY --from=builder --chown=nestjs:nodejs /app/node_modules ./node_modules
COPY --from=builder --chown=nestjs:nodejs /app/package.json ./

USER nestjs
EXPOSE 3000
CMD ["node", "dist/main.js"]

Docker Compose Files

File                            Purpose
docker-compose.yml              Base services (PostgreSQL, Redis)
docker-compose.monitoring.yml   Monitoring stack
docker-compose.production.yml   Production overrides with secrets
docker-compose.override.yml     Local development overrides

Using Docker Secrets (Production)

# docker-compose.production.yml
services:
  api:
    secrets:
      - db_url
      - jwt_secret
      - redis_password
    environment:
      DATABASE_URL_FILE: /run/secrets/db_url

secrets:
  db_url:
    external: true
  jwt_secret:
    external: true
  redis_password:
    external: true
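
The external: true entries expect the secrets to exist before deployment. A sketch of creating them on a Swarm manager (the values are placeholders, not real credentials):

# Docker secrets require Swarm mode (docker swarm init)
printf 'postgresql://user:pass@postgres:5432/mytelevision' | docker secret create db_url -
openssl rand -base64 48 | docker secret create jwt_secret -
openssl rand -base64 32 | docker secret create redis_password -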

Kubernetes Deployment

Namespace Setup

kubectl create namespace mytv-production
kubectl create namespace mytv-staging

Deployment Manifest

# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mytv-api
  namespace: mytv-production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: mytv-api
  template:
    metadata:
      labels:
        app: mytv-api
    spec:
      containers:
        - name: api
          image: ghcr.io/mytv/mytelevision-api:latest
          ports:
            - containerPort: 3000
          env:
            - name: NODE_ENV
              value: production
          envFrom:
            - secretRef:
                name: mytv-api-secrets
          resources:
            requests:
              memory: '256Mi'
              cpu: '200m'
            limits:
              memory: '512Mi'
              cpu: '500m'
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 5
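
The envFrom block expects a mytv-api-secrets Secret in the same namespace. One way to create it (the key names here are assumptions; match whatever environment variables the API actually reads):

kubectl create secret generic mytv-api-secrets \
  --namespace mytv-production \
  --from-literal=DATABASE_URL='postgresql://user:pass@host:5432/mytelevision' \
  --from-literal=JWT_SECRET='change-me' \
  --from-literal=REDIS_URL='redis://redis:6379'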

Service & Ingress

# k8s/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: mytv-api
  namespace: mytv-production
spec:
  selector:
    app: mytv-api
  ports:
    - port: 80
      targetPort: 3000
  type: ClusterIP
---
# k8s/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: mytv-api
  namespace: mytv-production
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  tls:
    - hosts:
        - api.mytelevision.app
      secretName: mytv-api-tls
  rules:
    - host: api.mytelevision.app
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: mytv-api
                port:
                  number: 80
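
Apply the manifests and confirm the cert-manager certificate was issued (this assumes the manifests live under k8s/ and cert-manager is installed in the cluster):

kubectl apply -f k8s/ -n mytv-production
kubectl get ingress mytv-api -n mytv-production
kubectl get certificate -n mytv-production   # requires the cert-manager CRDs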

Horizontal Pod Autoscaler

# k8s/hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mytv-api
  namespace: mytv-production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mytv-api
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
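
After applying, watch the autoscaler react to load (metrics-server must be installed for CPU-based scaling to work):

kubectl get hpa mytv-api -n mytv-production --watch

# Current pod CPU/memory, as reported by metrics-server
kubectl top pods -n mytv-production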

CI/CD Pipeline

GitHub Actions

# .github/workflows/ci.yml
name: CI/CD Pipeline
on:
  push:
    branches: [develop, main]
  pull_request:
    branches: [develop]

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20' }
      - run: npm ci
      - run: npm run lint:check

  test:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:16-alpine
        env: { POSTGRES_PASSWORD: test }
        ports: ['5432:5432']
      redis:
        image: redis:7-alpine
        ports: ['6379:6379']
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20' }
      - run: npm ci
      - run: npm run test:cov

  build:
    needs: [lint, test]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20' }
      - run: npm ci
      - run: npm run build

  deploy-staging:
    needs: build
    if: github.ref == 'refs/heads/develop'
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to staging
        run: echo "Deploy to staging environment"

  deploy-production:
    needs: build
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to production
        run: echo "Deploy to production environment"

Monitoring

Stack Components

Component        Port   Purpose
Prometheus       9090   Metrics collection
Grafana          3001   Dashboards (default login: admin/admin)
Alertmanager     9093   Alert routing
Node Exporter    9100   Host metrics
PG Exporter      9187   PostgreSQL metrics
Redis Exporter   9121   Redis metrics
cAdvisor         8080   Container metrics

Prometheus Configuration

# prometheus/prometheus.yml
scrape_configs:
  - job_name: 'mytv-api'
    scrape_interval: 15s
    metrics_path: '/metrics'
    static_configs:
      - targets: ['api:3000']

Alert Rules

# prometheus/alerts.yml
groups:
  - name: mytv-api
    rules:
      - alert: HighErrorRate
        expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.1
        for: 5m
        labels:
          severity: critical
      - alert: HighLatency
        expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 0.5
        for: 5m
        labels:
          severity: warning
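
Rule files are easy to break with a stray indent; promtool validates them before Prometheus loads them (promtool ships with the Prometheus distribution):

promtool check rules prometheus/alerts.yml
promtool check config prometheus/prometheus.yml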

Grafana Dashboards

Dashboard          UID                Metrics
API Overview       api-overview       Requests, latency, errors
Auth & Sessions    auth-sessions      Logins, sessions
Database           database-perf      Connections, queries
Business Metrics   business-metrics   Users, engagement
Infrastructure     infrastructure     CPU, memory, disk

Database Management

Backup Script

#!/bin/bash
set -euo pipefail

DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_DIR="/backups"
mkdir -p "$BACKUP_DIR"

pg_dump -U mytelevision mytelevision | gzip > "$BACKUP_DIR/backup_$DATE.sql.gz"

# Upload to S3/R2
aws s3 cp "$BACKUP_DIR/backup_$DATE.sql.gz" s3://backups/
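
To run this nightly, a crontab entry such as the following works (the script path and log file are assumptions):

# m h dom mon dow  command
0 3 * * * /opt/mytv/scripts/backup.sh >> /var/log/mytv-backup.log 2>&1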

Restore

# Download backup
aws s3 cp s3://backups/backup_20241211.sql.gz .
gunzip backup_20241211.sql.gz

# Stop API, restore, restart
docker-compose stop api
psql -U mytelevision -d mytelevision < backup_20241211.sql
docker-compose start api

Migration Management

# Apply migrations (production)
npx prisma migrate deploy

# Check migration status
npx prisma migrate status
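
In Kubernetes, one pragmatic option is to run the deploy command inside a running pod (a sketch; teams often prefer a dedicated Job or init container instead):

kubectl exec -n mytv-production deployment/mytv-api -- npx prisma migrate deploy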

Log Management

Structured Logging

The API outputs JSON-structured logs:

{
  "level": "info",
  "timestamp": "2025-01-15T10:30:00Z",
  "method": "GET",
  "path": "/api/v2/movies",
  "statusCode": 200,
  "duration": 45,
  "userId": "uuid"
}
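
JSON logs are grep-able, but jq makes ad-hoc queries much easier. For example, to surface failing or slow requests from a pod's output (field names match the sample above; fromjson? skips any non-JSON lines):

# 5xx responses
kubectl logs deployment/mytv-api -n mytv-production | jq -R 'fromjson? | select(.statusCode >= 500)'

# Requests slower than 200 ms
kubectl logs deployment/mytv-api -n mytv-production | jq -R 'fromjson? | select(.duration > 200)'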

Log Aggregation (Loki)

NestJS API --> Promtail --> Loki --> Grafana

Security Checklist

  • HTTPS/TLS enabled
  • Docker secrets for credentials
  • Non-root container user
  • Network policies configured
  • Rate limiting enabled
  • CORS configured
  • Helmet.js security headers
  • Regular dependency updates (Dependabot)
  • Secret rotation schedule
  • Backup verification tested

Troubleshooting

CrashLoopBackOff

kubectl logs <pod-name> --previous
# Usually: missing env vars, DB connection failure
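
If the previous logs are empty, the pod events usually explain the restart loop:

kubectl describe pod <pod-name> -n mytv-production | tail -n 20
kubectl get events -n mytv-production --sort-by=.lastTimestamp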

High Memory Usage

# Check container stats
docker stats

# Node.js memory options
NODE_OPTIONS="--max-old-space-size=512"

Connection Pool Issues

# Increase pool size
DATABASE_URL="...?connection_limit=20&pool_timeout=10"

Useful Commands

# kubectl shortcuts
kubectl get pods -n mytv-production
kubectl logs -f deployment/mytv-api -n mytv-production
kubectl exec -it <pod> -- sh
kubectl rollout restart deployment/mytv-api -n mytv-production
kubectl rollout undo deployment/mytv-api -n mytv-production

# Docker shortcuts
docker-compose ps
docker-compose logs -f api
docker stats
docker system prune -af