The Problem#
After spending months running containers on my home server, I got tired of discovering issues when things were already broken. A container consuming all the memory. A volume full with no warning. I needed real visibility into what was happening in my infrastructure.
I decided to implement a monitoring stack with Prometheus and Grafana. Here I document exactly how I did it.
Architecture Chosen#
- Prometheus: scrapes and stores the metrics
- cAdvisor: exposes container metrics
- Grafana: visualizes everything in dashboards
- Alertmanager: notifies when something fails
Step 1: Docker Compose with the complete stack#
I created a docker-compose.yml file that brings everything up together:
version: '3.8'

services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - ./alertas.yml:/etc/prometheus/alertas.yml
      - prometheus_data:/prometheus
    ports:
      - "9090:9090"
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
    networks:
      - monitoring

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    container_name: cadvisor
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    ports:
      - "8080:8080"
    networks:
      - monitoring

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    networks:
      - monitoring

  alertmanager:
    image: prom/alertmanager:latest
    container_name: alertmanager
    volumes:
      - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml
      - alertmanager_data:/alertmanager
    ports:
      - "9093:9093"
    networks:
      - monitoring

volumes:
  prometheus_data:
  grafana_data:
  alertmanager_data:

networks:
  monitoring:
    driver: bridge

Step 2: Configure Prometheus#
File prometheus.yml:
global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

rule_files:
  - '/etc/prometheus/alertas.yml'

scrape_configs:
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

Step 3: Define the alerts#
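The rules in this step all reduce to comparing a ratio against a threshold. As a rough sketch of that arithmetic (the function names are illustrative, and Prometheus's real `rate()` also extrapolates at the window edges, which this ignores):

```python
from typing import Optional


def cpu_cores_used(first_seconds: float, last_seconds: float, window_seconds: float) -> float:
    """Average CPU cores used over the window: counter increase divided by elapsed time."""
    return (last_seconds - first_seconds) / window_seconds


def memory_ratio(usage_bytes: float, limit_bytes: float) -> Optional[float]:
    """Memory usage as a fraction of the limit, or None when no limit is set.

    Assumption: a container without a memory limit shows up with a limit of 0,
    so dividing blindly would not give a meaningful ratio.
    """
    if limit_bytes <= 0:
        return None
    return usage_bytes / limit_bytes


def disk_free_ratio(avail_bytes: float, size_bytes: float) -> float:
    """Fraction of the filesystem that is still available."""
    return avail_bytes / size_bytes


# A counter that grew by 270 CPU-seconds over a 5-minute (300 s) window
# averages 0.9 cores, which is above the 0.8 threshold.
cpu_alert = cpu_cores_used(1200.0, 1470.0, 300.0) > 0.8

# 900 MiB used out of a 1 GiB limit is above the 85% threshold.
ratio = memory_ratio(900 * 1024**2, 1024**3)
memory_alert = ratio is not None and ratio > 0.85
```

Note that the CPU threshold is expressed in cores, so 0.8 means 80% of one core rather than 80% of the whole machine.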
File alertas.yml:
groups:
  - name: docker_alerts
    interval: 10s
    rules:
      - alert: HighCPUUsage
        # Fires above 0.8 CPU cores (80% of one core), averaged over 5 minutes
        expr: 'rate(container_cpu_usage_seconds_total[5m]) > 0.8'
        for: 2m
        annotations:
          summary: "High CPU in container {{ $labels.name }}"
          description: "{{ $labels.name }} is using {{ $value | humanizePercentage }} of CPU"
      - alert: HighMemoryUsage
        # The "> 0" filter skips containers without a memory limit (limit reported as 0),
        # which would otherwise divide by zero and fire permanently
        expr: 'container_memory_usage_bytes / (container_spec_memory_limit_bytes > 0) > 0.85'
        for: 2m
        annotations:
          summary: "High memory in {{ $labels.name }}"
          description: "Memory usage: {{ $value | humanizePercentage }}"
      - alert: DiskSpaceRunningOut
        # node_filesystem_* metrics come from node_exporter, which the compose file
        # above does not include; without it this rule has no data and never fires
        expr: 'node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} < 0.1'
        for: 5m
        annotations:
          summary: "Disk space below 10%"

Step 4: Configure Alertmanager#
File alertmanager.yml:
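global:
  resolve_timeout: 5m

route:
  receiver: 'console'
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h

receivers:
  - name: 'console'
    webhook_configs:
      # Inside the Alertmanager container, localhost is the container itself;
      # point this at a host or service where a receiver is actually listening
      - url: 'http://localhost:5001/'

Step 5: Start and verify#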
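Before starting the stack, it can be worth checking that the four host ports it publishes are free, since 8080 in particular is a popular choice for other services. A minimal sketch, assuming the port mappings from the compose file are unchanged (the helper names are illustrative):

```python
import socket

# Host ports published by the compose file.
STACK_PORTS = {"prometheus": 9090, "cadvisor": 8080, "grafana": 3000, "alertmanager": 9093}


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def busy_ports(ports: dict) -> dict:
    """Return the subset of ports that something else is already using."""
    return {name: port for name, port in ports.items() if not port_is_free(port)}


if __name__ == "__main__":
    for name, port in busy_ports(STACK_PORTS).items():
        print(f"port {port} (needed by {name}) is already in use")
```

If a port is taken, changing only the left-hand side of the mapping (for example "8081:8080") keeps the container-to-container addresses in prometheus.yml valid.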
docker-compose up -d

Access:
- Prometheus: http://localhost:9090
- Grafana: http://localhost:3000
- cAdvisor: http://localhost:8080
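Opening the pages confirms the containers are running, but not that Prometheus is actually scraping them. One way to check is the `/api/v1/targets` endpoint of the Prometheus HTTP API. The sketch below assumes Prometheus is reachable at http://localhost:9090; the function names are illustrative:

```python
import json
from urllib.request import urlopen


def fetch_targets(base_url: str = "http://localhost:9090") -> dict:
    """Ask the Prometheus HTTP API for its current scrape targets."""
    with urlopen(f"{base_url}/api/v1/targets", timeout=5) as response:
        return json.load(response)


def summarize_targets(payload: dict) -> dict:
    """Map each job to 'up' only if every one of its targets is healthy."""
    summary = {}
    for target in payload["data"]["activeTargets"]:
        job = target["labels"]["job"]
        # Once a job has a non-"up" target, keep that status.
        if summary.get(job, "up") == "up":
            summary[job] = target["health"]
    return summary


def report(base_url: str = "http://localhost:9090") -> None:
    """Print one line per job, e.g. for the 'cadvisor' and 'prometheus' jobs."""
    for job, health in sorted(summarize_targets(fetch_targets(base_url)).items()):
        print(f"{job}: {health}")
```

Calling `report()` once the stack is up should list both jobs from prometheus.yml; a job that is not "up" usually points to a typo in the target address or a container that failed to start.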
Step 6: Create dashboards in Grafana#
In Grafana, I imported public dashboard 893 (Docker and Host Monitoring), which works directly with the cAdvisor metrics.
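An imported dashboard needs a Prometheus data source to read from. It can be added in the Grafana UI, or through Grafana's HTTP API as sketched below. The sketch assumes the admin/admin credentials from the compose file and that Grafana reaches Prometheus at http://prometheus:9090 over the shared network. The data source name and function names are illustrative:

```python
import base64
import json
from urllib.request import Request, urlopen


def build_datasource_request(
    grafana_url: str = "http://localhost:3000",
    user: str = "admin",
    password: str = "admin",
    prometheus_url: str = "http://prometheus:9090",
) -> Request:
    """Build a POST /api/datasources request that registers Prometheus in Grafana."""
    body = {
        "name": "Prometheus",
        "type": "prometheus",
        "url": prometheus_url,  # resolved from inside the Grafana container
        "access": "proxy",      # Grafana's backend makes the queries
        "isDefault": True,
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return Request(
        f"{grafana_url}/api/datasources",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )


def create_datasource(request: Request) -> dict:
    """Send the request and return Grafana's JSON reply."""
    with urlopen(request, timeout=5) as response:
        return json.load(response)
```

With the stack running, `create_datasource(build_datasource_request())` performs the call. The Prometheus URL uses the container name rather than localhost because the request originates inside the Grafana container.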
Result#
Now I have complete visibility. I receive alerts when:
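- A container uses more than 0.8 CPU cores (80% of one core) for 2 minutes
- A container's memory stays above 85% of its limit for 2 minutes
- Free space on / stays below 10% for 5 minutes (this rule relies on node_exporter metrics)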
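For these notifications to land anywhere, something has to listen on the webhook URL configured in alertmanager.yml (port 5001). A minimal sketch of such a receiver, which prints each alert to the console, could look like this. The handler and function names are illustrative, and where it must run for that URL to resolve depends on your network setup:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def format_alerts(payload: dict) -> list:
    """Turn an Alertmanager webhook payload into one printable line per alert."""
    lines = []
    for alert in payload.get("alerts", []):
        name = alert.get("labels", {}).get("alertname", "?")
        summary = alert.get("annotations", {}).get("summary", "")
        lines.append(f"[{alert.get('status', 'unknown')}] {name}: {summary}")
    return lines


class AlertHandler(BaseHTTPRequestHandler):
    """Accept POSTs from Alertmanager and print the alerts they carry."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        for line in format_alerts(payload):
            print(line, flush=True)
        self.send_response(200)
        self.end_headers()


def serve(port: int = 5001) -> None:
    """Listen on all interfaces until interrupted."""
    HTTPServer(("0.0.0.0", port), AlertHandler).serve_forever()
```

Calling `serve()` starts the listener. Both firing and resolved alerts arrive in the same payload format, so the status prefix is what tells them apart.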
The complete setup takes up less than 500MB of RAM at rest and has already saved me several scares. It’s worth it.