
Monitoring Docker containers with Prometheus and Grafana: automatic alerts at home

Rogelio Guerra Riverón
Building my own web infrastructure from scratch. Here I document each step: servers, networks, containers and everything that comes along.

The Problem

After spending months running containers on my home server, I got tired of discovering issues when things were already broken. A container consuming all the memory. A volume full with no warning. I needed real visibility into what was happening in my infrastructure.

I decided to implement a monitoring stack with Prometheus and Grafana. Here I document exactly how I did it.

Chosen Architecture

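  • Prometheus: scrapes and stores the metrics, and evaluates the alert rules
  • cAdvisor: exposes per-container metrics (CPU, memory, filesystem, network)
  • Grafana: visualizes everything in dashboards
  • Alertmanager: routes the alerts that Prometheus fires to a notification channel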

Step 1: Docker Compose with the complete stack

I created a docker-compose.yml file that brings everything up together:

version: '3.8'

services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - ./alertas.yml:/etc/prometheus/alertas.yml
      - prometheus_data:/prometheus
    ports:
      - "9090:9090"
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
    networks:
      - monitoring

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    container_name: cadvisor
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    ports:
      - "8080:8080"
    networks:
      - monitoring

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
    environment:
      # Default credentials: change this before exposing Grafana beyond localhost
      - GF_SECURITY_ADMIN_PASSWORD=admin
    networks:
      - monitoring

  alertmanager:
    image: prom/alertmanager:latest
    container_name: alertmanager
    volumes:
      - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml
      - alertmanager_data:/alertmanager
    ports:
      - "9093:9093"
    networks:
      - monitoring

volumes:
  prometheus_data:
  grafana_data:
  alertmanager_data:

networks:
  monitoring:
    driver: bridge
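
For an always-on home server, a few settings on top of this file may be worth considering: a restart policy, an explicit retention period for Prometheus, and a Grafana password that is not hard-coded. A minimal sketch as a docker-compose.override.yml, which Compose merges automatically with the docker-compose.yml in the same directory. The 30d retention and the GRAFANA_ADMIN_PASSWORD variable name are assumptions, not part of the original setup:

```yaml
# docker-compose.override.yml (sketch; values marked "assumed" are illustrative)
services:
  prometheus:
    restart: unless-stopped
    # "command" is replaced rather than merged, so the original flags are repeated
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'   # assumed retention window

  cadvisor:
    restart: unless-stopped

  grafana:
    restart: unless-stopped
    environment:
      # Read from the shell or a .env file next to the compose files;
      # GRAFANA_ADMIN_PASSWORD is a hypothetical variable name
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_ADMIN_PASSWORD}

  alertmanager:
    restart: unless-stopped
```

Grafana applies the admin password only when it initializes its database, so changing the variable later has no effect on an existing grafana_data volume.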

Step 2: Configure Prometheus

File prometheus.yml:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

rule_files:
  - '/etc/prometheus/alertas.yml'

scrape_configs:
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']

  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
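
The disk alert defined in Step 3 queries node_filesystem_* metrics. Those come from node_exporter rather than cAdvisor, and the stack in Step 1 has no node_exporter service, so one option is to add it. A minimal sketch, assuming the service name node-exporter (an illustrative choice) and node_exporter's default port 9100:

```yaml
# Addition to docker-compose.yml, under "services:"
services:
  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter   # assumed name
    command:
      - '--path.rootfs=/host'       # read host filesystems through the bind mount
    volumes:
      - /:/host:ro,rslave
    networks:
      - monitoring
---
# Addition to prometheus.yml, under "scrape_configs:"
scrape_configs:
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']
```

On the bridge network this is enough for filesystem metrics. Network metrics would describe the container's own namespace unless the service runs with host networking.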

Step 3: Define the alerts

File alertas.yml:

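groups:
  - name: docker_alerts
    interval: 10s
    rules:
      # CPU is measured in cores, so 0.8 means 80% of one core.
      # name!="" skips the non-container cgroups that cAdvisor also reports.
      - alert: HighCPUUsage
        expr: 'sum by (name) (rate(container_cpu_usage_seconds_total{name!=""}[5m])) > 0.8'
        for: 2m
        annotations:
          summary: "High CPU usage in container {{ $labels.name }}"
          description: "{{ $labels.name }} is using {{ $value | humanizePercentage }} of a CPU core"

      # Containers without a memory limit report a limit of 0. The "> 0" filter
      # drops them so the division cannot yield +Inf and fire permanently.
      - alert: HighMemoryUsage
        expr: 'container_memory_usage_bytes{name!=""} / (container_spec_memory_limit_bytes{name!=""} > 0) > 0.85'
        for: 2m
        annotations:
          summary: "High memory usage in {{ $labels.name }}"
          description: "Memory usage: {{ $value | humanizePercentage }}"

      # node_filesystem_* metrics come from node_exporter, not cAdvisor. This rule
      # stays silent until a node_exporter target is scraped.
      - alert: DiskSpaceRunningOut
        expr: 'node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} < 0.1'
        for: 5m
        annotations:
          summary: "Free disk space below 10%"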
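
None of these rules fires when a scrape target itself disappears, for example if cAdvisor crashes. A possible extra rule, sketched here with a hypothetical group name and severity label, relies on the up metric that Prometheus records for every target it scrapes:

```yaml
# Could be appended to the "groups:" list in alertas.yml
groups:
  - name: availability_alerts        # hypothetical group name
    rules:
      - alert: TargetDown
        expr: 'up == 0'              # 0 means the last scrape of this target failed
        for: 1m
        labels:
          severity: critical         # assumed label; useful for routing in Alertmanager
        annotations:
          summary: "Scrape target {{ $labels.job }} ({{ $labels.instance }}) is down"
```

This covers the exporters Prometheus scrapes, not individual application containers.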

Step 4: Configure Alertmanager

File alertmanager.yml:

global:
  resolve_timeout: 5m

route:
  receiver: 'console'
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h

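receivers:
  - name: 'console'
    webhook_configs:
      # Placeholder endpoint. Inside the container, localhost is the Alertmanager
      # container itself, so point this at a service that accepts webhook POSTs.
      - url: 'http://localhost:5001/'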
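
A webhook only delivers notifications if something is listening at that URL. As an alternative, Alertmanager can send email directly. A minimal sketch of the same route with an email receiver, where every address, host and credential is a placeholder to replace with your own:

```yaml
# alertmanager.yml variant with an email receiver (all values are placeholders)
global:
  resolve_timeout: 5m

route:
  receiver: 'email'
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h

receivers:
  - name: 'email'
    email_configs:
      - to: 'you@example.com'
        from: 'alertmanager@example.com'
        smarthost: 'smtp.example.com:587'   # SMTP server as host:port
        auth_username: 'alertmanager@example.com'
        auth_password: 'change-me'
        send_resolved: true                 # also notify when the alert clears
```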

Step 5: Start and verify

docker-compose up -d

Access:

  • Prometheus: http://localhost:9090
  • Grafana: http://localhost:3000
  • cAdvisor: http://localhost:8080
  • Alertmanager: http://localhost:9093

Step 6: Create dashboards in Grafana

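In Grafana I added Prometheus as a data source (http://prometheus:9090, the service name on the monitoring network) and then imported the public dashboard 893 (Docker and Host Monitoring), which works directly with cAdvisor.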
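
The Prometheus data source can be added by hand in the UI or provisioned from a file, so that it survives a rebuilt volume. A sketch of the file-based route, assuming a hypothetical local directory ./grafana/provisioning mounted into the Grafana container at /etc/grafana/provisioning:

```yaml
# ./grafana/provisioning/datasources/prometheus.yml (hypothetical local path)
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy                  # Grafana's backend makes the requests
    url: http://prometheus:9090    # service name on the monitoring network
    isDefault: true
```

The matching volume entry for the grafana service would be ./grafana/provisioning:/etc/grafana/provisioning.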

Result

Now I have complete visibility. I receive alerts when:

  • A container uses more than 80% of a CPU core for 2 minutes
  • A container's memory usage stays above 85% of its limit for 2 minutes
  • Free disk space on / stays below 10% for 5 minutes

The complete setup takes up less than 500MB of RAM at rest and has already saved me several scares. It’s worth it.

