Security Runbook - MyTelevision API
Purpose: Guide for responding to security incidents, investigating threats, and maintaining security posture.
Table of Contents
- Incident Response Checklist
- Common Security Scenarios
- Token & Session Management
- Rate Limiting & DDoS
- Database Security
- Secret Management
- Audit & Logging
- Contact & Escalation
Incident Response Checklist
Immediate Actions (First 15 minutes)
- ASSESS: Identify the nature and scope of the incident
- CONTAIN: Isolate affected systems if necessary
- PRESERVE: Collect logs and evidence before any changes
- NOTIFY: Alert the security team and stakeholders
- DOCUMENT: Start incident timeline documentation
Investigation Phase
- Review application logs (/var/log/mytelevision/ or container logs)
- Check Redis for suspicious session patterns
- Review database audit logs
- Analyze rate limiting metrics
- Check for unusual API patterns
Recovery Phase
- Implement necessary fixes
- Rotate compromised credentials
- Clear affected caches/sessions
- Verify system integrity
- Update monitoring rules
Common Security Scenarios
1. Suspected Account Compromise
Symptoms:
- Unusual login locations
- Multiple failed login attempts
- Unexpected password changes
- Reports from users
Response:
# 1. Check recent login activity for user
# Query database for user sessions
SELECT * FROM "UserSession"
WHERE "userId" = '<USER_ID>'
ORDER BY "createdAt" DESC
LIMIT 50;
# For multi-tenant system
SELECT * FROM "AccountSession"
WHERE "accountId" = '<ACCOUNT_ID>'
ORDER BY "createdAt" DESC
LIMIT 50;
# 2. Revoke all sessions
DELETE FROM "UserSession" WHERE "userId" = '<USER_ID>';
DELETE FROM "AccountSession" WHERE "accountId" = '<ACCOUNT_ID>';
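# 3. Clear Redis session cache
# (--scan iterates incrementally; KEYS blocks Redis while it walks the whole keyspace)
redis-cli --scan --pattern "session:user:<USER_ID>:*" | xargs -r redis-cli DEL
redis-cli --scan --pattern "account:<ACCOUNT_ID>:*" | xargs -r redis-cli DEL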
# 4. Force password reset (mark in database)
UPDATE "User" SET "forcePasswordChange" = true WHERE id = '<USER_ID>';
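The placeholders in the commands above end up inside SQL string literals and Redis glob patterns, so a stray `*` or quote in a pasted ID could widen a delete well beyond one user. The following is a minimal sketch of a pre-flight check. `is_safe_id` is a hypothetical helper, and the allowed character set (letters, digits, `-`, `_`) is an assumption to adjust to the real ID format.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reject IDs that could alter a SQL literal or a Redis glob.
# Assumption: real IDs use only letters, digits, '-' and '_'.
is_safe_id() {
  case "$1" in
    '' | *[!A-Za-z0-9_-]*) return 1 ;;  # empty, or contains any other character
    *) return 0 ;;
  esac
}

# Usage (commented out so the block has no side effects):
# is_safe_id "$USER_ID" || { echo "refusing unsafe id" >&2; exit 1; }
```

Running the check before substituting `<USER_ID>` or `<ACCOUNT_ID>` keeps the revocation scoped to the intended account.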
2. Brute Force Attack Detection
Symptoms:
- High volume of 401 responses
- Same IP hitting login endpoint
- Rate limiter triggered frequently
Response:
# 1. Identify attacking IPs from logs
grep "POST /api/v2/auth/login" /var/log/nginx/access.log | \
awk '{print $1}' | sort | uniq -c | sort -rn | head -20
# 2. Check rate limit counters in Redis
redis-cli --scan --pattern "throttle:*" | head -50
# 3. Block IP at firewall level (temporary)
# For iptables (-I inserts at the top of the chain, ahead of any earlier ACCEPT rule):
iptables -I INPUT -s <ATTACKER_IP> -j DROP
# For nginx:
# Add to /etc/nginx/conf.d/blocked.conf:
# deny <ATTACKER_IP>;
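# 4. Monitor rate limit effectiveness
# (MONITOR streams every command and adds load to Redis; stop it with Ctrl-C as soon as possible)
redis-cli MONITOR | grep --line-buffered throttle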
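Step 1 above counts every login request per IP, including successful ones. As a sketch of a narrower view, the function below counts only failed (401) login POSTs per client. It assumes nginx writes its default "combined" log format; `top_failed_logins` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Count failed (401) login POSTs per client IP from nginx access-log lines on stdin.
# Assumption: nginx's default "combined" format, where whitespace-separated
# field 1 is the client IP, 6 the quoted method, 7 the path, and 9 the status.
top_failed_logins() {
  awk '$6 == "\"POST" && $7 == "/api/v2/auth/login" && $9 == 401 { hits[$1]++ }
       END { for (ip in hits) print hits[ip], ip }' |
    sort -k1,1nr -k2,2 | head -n "${1:-20}"
}

# Usage: top_failed_logins 20 < /var/log/nginx/access.log
```

Each output line is `<count> <ip>`, highest count first, which can feed directly into the blocking step.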
3. JWT Token Theft Suspected
Symptoms:
- Same token used from multiple IPs
- Token used after logout
- Impossible travel patterns
Response:
# 1. Blacklist the compromised token family
# Get token family from JWT claims
redis-cli SADD "blacklist:tokens" "<TOKEN_JTI>"
redis-cli SADD "blacklist:families" "<TOKEN_FAMILY_ID>"
# 2. Revoke all sessions for affected user
# See Account Compromise section above
# 3. If token signing key suspected compromised:
# CRITICAL: Rotate JWT_SECRET
# This will invalidate ALL active tokens
# Update .env and restart all instances
# 4. Check for token reuse
SELECT * FROM "AccountSession"
WHERE "tokenFamily" = '<FAMILY_ID>'
ORDER BY "createdAt" DESC;
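Step 1 needs the token's identifiers, which live in the JWT payload. The sketch below decodes the payload segment of a token so its claims can be read. It does not verify the signature, so treat the output as untrusted. `jwt_payload` is a hypothetical helper, and the exact claim names carrying the JTI and family ID are whatever the API sets; they are not confirmed here.

```shell
#!/usr/bin/env bash
# Print the decoded (unverified) payload of a JWT passed as the first argument.
jwt_payload() {
  local payload="${1#*.}"      # drop the header segment
  payload="${payload%%.*}"     # drop the signature segment
  # base64url -> base64: swap the URL-safe characters back
  payload="$(printf '%s' "$payload" | tr '_-' '/+')"
  # restore the '=' padding that JWTs strip
  while [ $(( ${#payload} % 4 )) -ne 0 ]; do payload="${payload}="; done
  printf '%s' "$payload" | base64 -d
}

# Usage: jwt_payload "$SUSPECT_TOKEN"
```

The decoded JSON can then be inspected for the values to substitute into `<TOKEN_JTI>` and `<TOKEN_FAMILY_ID>`.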
4. Data Exfiltration Attempt
Symptoms:
- Abnormally high data transfer
- Bulk API requests
- Scraping patterns
Response:
# 1. Identify high-volume requesters
# Check application logs for bulk requests
# (-h drops the filename prefix grep adds when searching several files)
grep -h "GET /api/v2/movies" /var/log/app/*.log | \
cut -d' ' -f1 | sort | uniq -c | sort -rn | head -20
# 2. Implement emergency rate limits
# Update rate limit configuration in environment:
THROTTLE_LIMIT=10 # Reduce from 100
# 3. Add user-agent blocking if scraper
# In nginx:
if ($http_user_agent ~* (scrapy|bot|crawler)) {
    return 403;
}
# 4. Enable request logging for investigation
# Set LOG_LEVEL=debug temporarily
Token & Session Management
Token Architecture
| Token Type | Storage | TTL | Hash |
|---|---|---|---|
| Access Token (Legacy) | DB + Redis | 1h | SHA256 |
| Refresh Token (Legacy) | DB | 7d | SHA256 |
| Access Token (Multi-tenant) | Redis | 1h | SHA256 |
| Refresh Token (Multi-tenant) | DB | 7d | SHA256 |
| Stream Token | None (signed) | 4h | HMAC-SHA256 |
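The Hash column implies that stored token values are digests, so matching a raw token captured during an incident against stored records would require hashing it first. A minimal sketch, assuming an unsalted SHA-256 digest in lowercase hex (the encoding is not stated in this runbook); `token_sha256` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Print the SHA-256 digest (lowercase hex) of the string given as the first argument.
# Assumption: tokens are stored as unsalted hex digests.
token_sha256() {
  printf '%s' "$1" | sha256sum | cut -d' ' -f1
}

# Usage: token_sha256 "$RAW_TOKEN"
```

`printf '%s'` is used instead of `echo` so that no trailing newline is included in the hashed input.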
Emergency Token Operations
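# Blacklist a specific token
# (confirm the key name the API actually checks: the token-theft scenario above uses "blacklist:tokens")
redis-cli SADD "token:blacklist" "<TOKEN_JTI>"
# EXPIRE applies to the whole set, so every blacklisted JTI is dropped when it fires
redis-cli EXPIRE "token:blacklist" 86400 # 24h
# Clear all sessions for an account
redis-cli --scan --pattern "session:account:<ACCOUNT_ID>:*" | xargs -r redis-cli DEL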
# Force re-authentication for all users (EXTREME)
redis-cli FLUSHDB # WARNING: Clears ALL Redis data
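# Rotate JWT secret (invalidates all tokens)
# 1. Generate new secrets (run once per secret; tr joins the wrapped output into one line)
openssl rand -base64 64 | tr -d '\n'
# 2. Update environment
# JWT_SECRET=<new_secret>
# JWT_REFRESH_SECRET=<new_refresh_secret>
# 3. Recreate all instances ("restart" alone does not pick up changed environment variables)
docker-compose up -d --force-recreate api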
Rate Limiting & DDoS
Rate Limit Tiers
| Tier | Limit | Window | Key Pattern |
|---|---|---|---|
| Short | 3 req | 1 sec | throttle:short:<ip> |
| Medium | 20 req | 10 sec | throttle:medium:<ip> |
| Long | 100 req | 60 sec | throttle:long:<ip> |
| Profile | varies | varies | throttle:<tenant>:<profile>:<tier> |
DDoS Response
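# 1. Check current rate limit status
redis-cli --scan --pattern "throttle:*" | wc -l
# 2. Identify top offenders (print each counter with its key so the client can be identified)
redis-cli --scan --pattern "throttle:*" | \
while read -r key; do printf '%s %s\n' "$(redis-cli GET "$key")" "$key"; done | \
sort -rn | head -20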
# 3. Emergency rate limit reduction
# Update environment variables:
THROTTLE_LIMIT=20 # Reduce from 100
THROTTLE_TTL=120000 # Increase window to 2 minutes
# 4. Enable Cloudflare Under Attack mode (if using)
# Via API or dashboard
# 5. Scale up API instances
docker-compose up -d --scale api=5
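To relate raw counters to the tiers in the table above, the sketch below reads lines of the form `<count> <key>` and prints the keys that have reached their tier's limit. It assumes counters are plain integers and that keys follow the `throttle:<tier>:<ip>` pattern; profile-scoped keys are skipped because their limits vary. `flag_at_limit` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Read "<count> <key>" lines on stdin; print keys at or above their tier limit.
# Limits are copied from the Rate Limit Tiers table (short/medium/long only).
flag_at_limit() {
  awk '
    BEGIN { limit["short"] = 3; limit["medium"] = 20; limit["long"] = 100 }
    {
      split($2, part, ":")   # throttle:<tier>:<ip>
      tier = part[2]
      if (tier in limit && $1 >= limit[tier]) print $2, $1 "/" limit[tier]
    }'
}

# Usage (assumes each key holds an integer counter):
# redis-cli --scan --pattern "throttle:*" |
#   while read -r key; do printf '%s %s\n' "$(redis-cli GET "$key")" "$key"; done |
#   flag_at_limit
```

Each output line has the form `<key> <count>/<limit>`, which makes it easier to tell saturated clients from ordinary traffic.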
Database Security
Suspicious Query Detection
-- Find users with excessive failed logins
SELECT u.email, COUNT(*) as failed_attempts
FROM "User" u
JOIN "AuditLog" a ON a."userId" = u.id
WHERE a.action = 'LOGIN_FAILED'
AND a."createdAt" > NOW() - INTERVAL '1 hour'
GROUP BY u.email
HAVING COUNT(*) > 5
ORDER BY failed_attempts DESC;
-- Find accounts with unusual session counts
SELECT a.email, COUNT(s.id) as session_count
FROM "Account" a
JOIN "AccountSession" s ON s."accountId" = a.id
WHERE s.status = 'ACTIVE'
GROUP BY a.email
HAVING COUNT(s.id) > 10
ORDER BY session_count DESC;
-- Find profiles with PIN lockout
SELECT p.*, a.email
FROM "Profile" p
JOIN "Account" a ON a.id = p."accountId"
WHERE p."pinLockedUntil" > NOW();
Emergency Database Operations
# Backup before any changes
pg_dump -h localhost -U mytelevision mytelevision > backup_$(date +%Y%m%d_%H%M%S).sql
# Lock user account
UPDATE "User" SET status = 'LOCKED' WHERE id = '<USER_ID>';
UPDATE "Account" SET status = 'LOCKED' WHERE id = '<ACCOUNT_ID>';
# Revoke admin privileges
UPDATE "User" SET role = 'USER' WHERE id = '<USER_ID>';
# Clear sensitive data (GDPR request)
UPDATE "User" SET
email = 'deleted_' || id || '@deleted.local',
"passwordHash" = 'DELETED',
"firstName" = 'Deleted',
"lastName" = 'User',
"deletedAt" = NOW()
WHERE id = '<USER_ID>';
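Because the backup step gates every destructive statement that follows it, it is worth confirming the dump finished before continuing. A minimal sketch, assuming the plain-text format produced by the `pg_dump` command above, which ends with a "PostgreSQL database dump complete" comment; `backup_looks_complete` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Succeed only if the dump file is non-empty and ends with pg_dump's completion marker.
# Assumption: plain-text (default) pg_dump output, not the custom or directory formats.
backup_looks_complete() {
  [ -s "$1" ] && tail -n 20 "$1" | grep -q 'PostgreSQL database dump complete'
}

# Usage:
# backup_looks_complete backup_20250101_120000.sql || { echo "backup incomplete" >&2; exit 1; }
```

This only detects truncated or empty dumps; it does not prove the backup can be restored.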
Secret Management
Secret Rotation Checklist
| Secret | Rotation Frequency | Impact |
|---|---|---|
| JWT_SECRET | On compromise | Invalidates all access tokens |
| JWT_REFRESH_SECRET | On compromise | Invalidates all refresh tokens |
| DATABASE_URL | Quarterly | Requires restart |
| REDIS_PASSWORD | Quarterly | Requires restart |
| TMDB_API_KEY | On compromise | Affects content metadata |
| FIREBASE_PRIVATE_KEY | On compromise | Affects social auth |
| STREAMING_SIGNING_SECRET | On compromise | Invalidates stream tokens |
| STREAMING_AES128_KEY | On compromise | Affects DRM |
Secret Rotation Procedure
# 1. Generate new secret
NEW_SECRET=$(openssl rand -base64 64 | tr -d '\n')
# 2. Update in secrets manager (or .env for dev)
# For Docker Swarm (printf avoids the trailing newline that echo would add to the secret):
printf '%s' "$NEW_SECRET" | docker secret create jwt_secret_v2 -
# 3. Update service to use new secret
# In docker-compose.production.yml:
# secrets:
# - jwt_secret_v2
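# 4. Rolling restart
# (if the app reads /run/secrets/jwt_secret, keep that path with: --secret-add source=jwt_secret_v2,target=jwt_secret)
docker service update --secret-rm jwt_secret --secret-add jwt_secret_v2 mytelevision_api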
# 5. Remove old secret after grace period
docker secret rm jwt_secret
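A secret that was truncated or picked up whitespace during copy and paste will cause confusing failures after the restart. The sketch below is a sanity check to run before step 2. It assumes secrets generated as in step 1, where 64 random bytes encode to 88 base64 characters; `secret_is_usable` is a hypothetical helper and the minimum length is adjustable.

```shell
#!/usr/bin/env bash
# Succeed if the candidate secret has no whitespace and meets a minimum length.
# Default minimum: 88 characters, the base64 length of 64 random bytes.
secret_is_usable() {
  local secret="$1" min_len="${2:-88}"
  case "$secret" in
    *[[:space:]]*) return 1 ;;   # embedded space, tab, or newline
  esac
  [ "${#secret}" -ge "$min_len" ]
}

# Usage:
# secret_is_usable "$NEW_SECRET" || { echo "secret rejected" >&2; exit 1; }
```

The check says nothing about randomness; it only catches mechanical mistakes in handling the value.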
Audit & Logging
Log Locations
| Log Type | Location | Retention |
|---|---|---|
| Application | stdout/stderr (container) | 30 days |
| Nginx Access | /var/log/nginx/access.log | 90 days |
| Nginx Error | /var/log/nginx/error.log | 90 days |
| Database | /var/log/postgresql/ | 30 days |
| Redis | /var/log/redis/ | 7 days |
Key Log Patterns to Monitor
# Failed authentications
grep -E "(401|Invalid credentials|Unauthorized)" /var/log/app/*.log
# Rate limit triggers
grep "Too Many Requests" /var/log/app/*.log
# Token errors
grep -E "(token|jwt|JWT)" /var/log/app/*.log | grep -i error
# Database errors
grep -E "(prisma|database|sql)" /var/log/app/*.log | grep -i error
# Security headers issues
# (-s hides the progress meter; -i on grep matches the lowercase header names sent over HTTP/2)
curl -sI https://api.mytelevision.app/api/v2/health/live | grep -iE "(X-|Content-Security|Strict-Transport)"
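The header check above shows which security headers are present, but a missing header is easy to overlook in that output. The sketch below reports the absent ones instead. The default list of expected headers is an assumption for illustration, not the API's confirmed policy, and `missing_security_headers` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Read HTTP response headers on stdin; print each expected header that is absent.
# Arguments override the default list, which is an illustrative assumption.
missing_security_headers() {
  local headers expected name
  headers="$(tr -d '\r')"   # read stdin and strip the CR from CRLF line endings
  expected="${*:-Strict-Transport-Security Content-Security-Policy X-Content-Type-Options X-Frame-Options}"
  for name in $expected; do
    # case-insensitive match, since HTTP/2 responses use lowercase header names
    printf '%s\n' "$headers" | grep -qi "^${name}:" || printf '%s\n' "$name"
  done
}

# Usage:
# curl -sI https://api.mytelevision.app/api/v2/health/live | missing_security_headers
```

Empty output means every header in the list was found.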
Setting Up Alerts
# Example Prometheus alert rules
groups:
  - name: security
    rules:
      - alert: HighFailedLogins
        expr: rate(auth_login_failures_total[5m]) > 10
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: 'High rate of failed logins'
      - alert: RateLimitExceeded
        expr: rate(http_requests_total{status="429"}[5m]) > 50
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: 'High rate of rate-limited requests'
Contact & Escalation
Escalation Matrix
| Severity | Response Time | Escalation |
|---|---|---|
| Critical (data breach) | 15 min | CTO + Legal immediately |
| High (active attack) | 30 min | Tech Lead + Security |
| Medium (suspicious activity) | 2 hours | On-call engineer |
| Low (policy violation) | 24 hours | Team lead |
Security Contacts
| Role | Contact | Availability |
|---|---|---|
| Security Lead | [email protected] | 24/7 |
| CTO | [email protected] | Business hours |
| On-Call | [email protected] | 24/7 |
| Legal | [email protected] | Business hours |
External Resources
- GitHub Security Advisories: Monitor for dependency vulnerabilities
- CVE Database: Check for new vulnerabilities
- OWASP Cheat Sheets: Reference for secure coding
- NestJS Security: https://docs.nestjs.com/security/overview
Post-Incident Actions
- Document: Complete incident report within 24 hours
- Review: Conduct post-mortem within 1 week
- Improve: Update runbooks with lessons learned
- Test: Verify fixes with security testing
- Train: Update team on new procedures if needed