A server without monitoring is a blind server: you don’t know when the disk is filling up, which container is eating too much RAM, or how many 404s your website is generating. This article documents how I set up the full stack: Prometheus + Node Exporter + Grafana + Loki + Promtail.
The architecture
```text
[Home server]
├── node-exporter          → system metrics (CPU, RAM, disk, network)
├── docker-stats-collector → container metrics (textfile collector)
├── prometheus             → scrapes and stores metrics
├── loki                   → aggregates and stores logs
├── promtail               → ships Nginx logs and syslog to Loki
└── grafana                → dashboards for all of the above
```
All services run in Docker and are defined in the same docker-compose.yml.
System metrics: Node Exporter
Node Exporter exposes hardware and OS metrics. The catch: it has to run with `network_mode: host` to see the server’s real network interfaces. On a Docker network it only sees the container’s own eth0 interface.
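```yaml
node-exporter:
  image: prom/node-exporter:v1.8.2
  network_mode: host
  pid: host
  volumes:
    - /proc:/host/proc:ro
    - /sys:/host/sys:ro
    - /:/rootfs:ro
    - ./textfile-collector:/textfile:ro
  command:
    - --path.procfs=/host/proc
    - --path.sysfs=/host/sys
    - --path.rootfs=/rootfs
    - --web.listen-address=127.0.0.1:9100
    - --collector.textfile.directory=/textfile
```
It listens on 127.0.0.1:9100, and Prometheus targets it at 172.17.0.1:9100 (the host’s IP as seen from the Docker network). Be aware that a socket bound only to 127.0.0.1 does not accept connections addressed to 172.17.0.1; if the scrape fails with “connection refused”, bind the exporter to the bridge address instead (`--web.listen-address=172.17.0.1:9100`), which still keeps it off the public interfaces.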
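To confirm that host networking is doing its job, you can list the interfaces the exporter actually reports. A minimal sketch, assuming the standard `node_network_receive_bytes_total` metric; the helper name `list_interfaces` is hypothetical:

```shell
#!/bin/sh
# list_interfaces (hypothetical helper): reads node_exporter metrics text on
# stdin and prints each network device that has a receive counter, one per line.
list_interfaces() {
  grep '^node_network_receive_bytes_total{' |
    sed 's/.*device="\([^"]*\)".*/\1/' |
    sort -u
}
```

Usage: `curl -s http://127.0.0.1:9100/metrics | list_interfaces`. With `network_mode: host` the list should include the host’s physical interface and `docker0`; on a Docker network you would typically see only `lo` and `eth0`.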
Container metrics: docker stats + textfile collector
Why not cAdvisor? It doesn’t work with Docker 29 and the overlayfs storage driver on cgroup v2: it fails with “failed to identify read-write layer ID”.
The solution: a lightweight container that runs docker stats every 30 seconds and writes the result in Prometheus format to a file that Node Exporter reads.
```shell
#!/bin/sh
# docker_stats.sh: plain POSIX sh, because it runs inside the Alpine-based
# docker:27-cli image, which ships without bash
OUTFILE="/textfile/docker_stats.prom"
TMPFILE="${OUTFILE}.tmp"

{
  echo "# HELP docker_container_cpu_percent CPU usage percentage per container"
  echo "# TYPE docker_container_cpu_percent gauge"
  # ... more definitions ...

  # One line per running container: name|CPU%|memory usage|network I/O
  docker stats --no-stream --format \
    '{{.Name}}|{{.CPUPerc}}|{{.MemUsage}}|{{.NetIO}}' 2>/dev/null | \
  while IFS='|' read -r name cpu mem net; do
    # "1,23%" or "1.23%" -> "1.23"
    cpu_val=$(echo "$cpu" | tr -d '%' | tr ',' '.')
    # ... unit conversion ...
    echo "docker_container_cpu_percent{name=\"${name}\"} ${cpu_val}"
    echo "docker_container_memory_bytes{name=\"${name}\"} ${mem_used_bytes}"
    echo "docker_container_running{name=\"${name}\"} 1"
  done

  # Stopped containers
  docker ps -a --filter "status=exited" --format '{{.Names}}' 2>/dev/null | \
  while read -r name; do
    echo "docker_container_running{name=\"${name}\"} 0"
  done
} > "$TMPFILE" && mv "$TMPFILE" "$OUTFILE"
```
Writing to a temporary file and then renaming it (an atomic `mv` within the same directory) keeps Node Exporter from serving a half-written file to Prometheus. The `.tmp` suffix matters too: the textfile collector only reads files ending in `.prom`, so it ignores the temporary file.
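The unit conversion is elided in the script above. As a rough idea of what it has to do: `docker stats` prints sizes with suffixes such as `MiB`/`GiB` (memory) or `kB`/`MB` (network I/O), while Prometheus wants plain bytes. A minimal sketch using `awk` rather than the `bc` the collector installs; `to_bytes` is a hypothetical helper, not the script’s actual code, and the suffix list is an assumption based on typical `docker stats` output:

```shell
#!/bin/sh
# to_bytes (hypothetical helper): converts a docker stats size such as
# "12.5MiB", "1.2kB" or "0B" into an integer number of bytes.
# Binary suffixes (KiB, MiB, ...) use powers of 1024, decimal ones (kB, MB, ...)
# use powers of 1000. Unknown suffixes are treated as plain bytes.
to_bytes() {
  printf '%s\n' "$1" | awk '{
    match($0, /^[0-9.]+/)            # numeric prefix
    num = substr($0, 1, RLENGTH)
    unit = substr($0, RLENGTH + 1)   # remaining suffix
    mult = 1
    if (unit == "KiB") mult = 1024
    else if (unit == "MiB") mult = 1024 ^ 2
    else if (unit == "GiB") mult = 1024 ^ 3
    else if (unit == "TiB") mult = 1024 ^ 4
    else if (unit == "kB" || unit == "KB") mult = 1000
    else if (unit == "MB") mult = 1000 ^ 2
    else if (unit == "GB") mult = 1000 ^ 3
    else if (unit == "TB") mult = 1000 ^ 4
    printf "%.0f\n", num * mult
  }'
}
```

Inside the `while` loop, the used memory is the part of `MemUsage` before ` / `, so something like `mem_used_bytes=$(to_bytes "${mem%% / *}")` would fill the variable the script emits.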
```yaml
docker-stats-collector:
  image: docker:27-cli
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock:ro
    - ./textfile-collector:/textfile
    - ./docker_stats.sh:/docker_stats.sh:ro
  entrypoint: sh -c "apk add --no-cache bc > /dev/null 2>&1; while true; do sh /docker_stats.sh; sleep 30; done"
```
Prometheus: collect and retain
```yaml
prometheus:
  image: prom/prometheus:v2.51.2
  networks:
    - monitoring
  volumes:
    - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
    - prometheus-data:/prometheus
  command:
    - --config.file=/etc/prometheus/prometheus.yml
    - --storage.tsdb.path=/prometheus
    - --storage.tsdb.retention.time=30d
    - --web.enable-lifecycle   # allows reloading the config with POST /-/reload
```
Scraping configuration:
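```yaml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']

  - job_name: node
    static_configs:
      - targets: ['172.17.0.1:9100']
    relabel_configs:
      # Attach a fixed host label to every series from this target
      - target_label: host
        replacement: servidor-casa
```
172.17.0.1 is the host’s address on the default Docker bridge (docker0), reachable from the containers. The relabel rule adds a `host="servidor-casa"` label to every Node Exporter series, and `--storage.tsdb.retention.time=30d` keeps 30 days of data.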
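How much disk will 30 days take? The Prometheus storage documentation gives a rule of thumb: needed disk ≈ retention time in seconds × ingested samples per second × bytes per sample, with roughly 1 to 2 bytes per sample after compression. A minimal sketch of that arithmetic; the helper name and the series count in the example are assumptions, so check `prometheus_tsdb_head_series` on your own instance for the real number:

```shell
#!/bin/sh
# estimate_tsdb_bytes (hypothetical helper): rough Prometheus disk estimate.
#   $1 retention in days
#   $2 number of active series
#   $3 scrape interval in seconds
#   $4 bytes per sample (the Prometheus docs suggest 1-2)
# samples/s = series / interval; bytes = retention_s * samples/s * bytes_per_sample
estimate_tsdb_bytes() {
  days=$1 series=$2 interval=$3 bps=$4
  # Multiply before dividing so integer arithmetic does not truncate early.
  echo $(( days * 86400 * series * bps / interval ))
}
```

For example, `estimate_tsdb_bytes 30 1000 15 2` gives 345600000 bytes, about 330 MiB, for an assumed 1000 series at the 15-second interval above. Treat it as an order of magnitude: the formula leaves out the write-ahead log and the temporary space used during compaction.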
Logs: Loki + Promtail
Loki indexes only the labels, not the full log content. Promtail collects the logs and ships them with labels such as job, host and filename.
```yaml
promtail:
  image: grafana/promtail:3.3.2
  user: root
  networks:
    - monitoring
  volumes:
    - ./promtail-config.yml:/etc/promtail/config.yml:ro
    - ./promtail-data:/tmp/promtail
    - ~/infra/web/logs:/logs/nginx:ro
    - /var/log:/logs/host:ro
```
It needs to run as root to read /var/log.
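To see what “shipping logs with labels” means in practice, here is the shape of a push to Loki’s HTTP API (`POST /loki/api/v1/push`), the same endpoint Promtail pushes to (Promtail uses the protobuf encoding, but the JSON form is easier to read): a set of stream labels, which Loki indexes, plus timestamped lines, which it stores without indexing. A minimal sketch; `loki_payload` is a hypothetical helper, and it only escapes backslashes and double quotes, so it assumes single-line input:

```shell
#!/bin/sh
# loki_payload (hypothetical helper): prints a JSON body for Loki's
# /loki/api/v1/push endpoint with one stream and one log line.
#   $1 value of the "job" label
#   $2 the log line (single line; only \ and " are escaped)
#   $3 timestamp in nanoseconds since the epoch, as a string
loki_payload() {
  line=$(printf '%s' "$2" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"streams":[{"stream":{"job":"%s"},"values":[["%s","%s"]]}]}\n' \
    "$1" "$3" "$line"
}
```

From a container on the `monitoring` network (assuming Loki answers as `loki` on its default port 3100), `loki_payload test "hello" "$(date +%s)000000000" | curl -fsS -H 'Content-Type: application/json' --data-binary @- http://loki:3100/loki/api/v1/push` should make the line show up in Grafana under `{job="test"}`.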
Grafana: dashboards
Grafana connects to Prometheus and Loki as data sources. The most useful dashboards:
System (Node Exporter):
- Total CPU and per-core
- RAM used / free / cache
- Disk: usage per partition, IOPS, throughput
- Network: inbound/outbound traffic per interface
Containers (docker stats):
- CPU % per container
- RAM per container vs limit
- State (running/stopped)
- Network traffic per container
Logs (Loki):
- Nginx logs in real-time
- Requests by status code (200, 301, 404, 500)
- Top IPs with most requests
- Top most-accessed routes
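When a Loki panel looks off, it helps to compute the same numbers straight from the access log. A minimal sketch, assuming Nginx’s default `combined` log format (so, split on whitespace, the client IP is field 1, the request path field 7 and the status code field 9); the function names are hypothetical:

```shell
#!/bin/sh
# Hypothetical helpers that read an Nginx access log in the default
# "combined" format on stdin. Splitting on whitespace gives:
#   $1 = client IP, $7 = request path, $9 = status code

# Requests per status code, printed as "<status> <count>", sorted by status.
status_counts() {
  awk '{ n[$9]++ } END { for (s in n) print s, n[s] }' | sort
}

# top_field FIELD [N]: the N (default 10) most frequent values of a field,
# printed as "<count> <value>", most frequent first.
top_field() {
  awk -v f="$1" '{ print $f }' | sort | uniq -c | sort -rn | head -n "${2:-10}" |
    awk '{ print $1, $2 }'
}
```

For example, `top_field 1 5 < ~/infra/web/logs/access.log` lists the five busiest client IPs and `top_field 7 < ~/infra/web/logs/access.log` the most-requested routes (the `access.log` file name is an assumption).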
Issue: [$__range] in Loki instant queries
When using “stat” or “piechart” panels with Loki, the `[$__range]` variable doesn’t resolve and Grafana returns “empty duration string”. The fix is to use a fixed duration:
```logql
# WRONG (in stat/piechart panels):
sum by(status) (count_over_time({job="nginx"} | pattern ... [$__range]))
# RIGHT:
sum by(status) (count_over_time({job="nginx"} | pattern ... [24h]))
```
“Time series” panels handle `[$__interval]` correctly. The trade-off of `[24h]` is that the window is always 24 hours long, whatever span the dashboard’s time picker shows.
Stack security
- Prometheus and Loki have no external access: they are reachable only on the internal `monitoring` network.
- Grafana is the only entry point, protected by Traefik and Let’s Encrypt.
- Grafana runs with `GF_AUTH_ANONYMOUS_ENABLED=false` and `GF_USERS_ALLOW_SIGN_UP=false`.
- Node Exporter listens only on `127.0.0.1`, not on all interfaces.
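These network-level claims are easy to check on the host: list the listening TCP sockets and flag any bound to all interfaces. A minimal sketch that parses `ss -ltnH` output (iproute2; `-H` suppresses the header line), where the local address is the fourth column; `wildcard_listeners` is a hypothetical helper. Ports published by Docker usually show up here as well, so Prometheus and Loki should be absent if they only live on the internal network:

```shell
#!/bin/sh
# wildcard_listeners (hypothetical helper): reads `ss -ltnH` output on stdin
# and prints local addresses bound to every interface (0.0.0.0, [::] or *).
wildcard_listeners() {
  awk '$4 ~ /^(0\.0\.0\.0|\[::\]|\*):/ { print $4 }'
}
```

Usage: `ss -ltnH | wildcard_listeners`. On this setup you would expect Traefik’s 80/443 in the output, and not 9090, 3100 or 9100.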
Result
With this stack you have full visibility into the server: which processes consume resources, which containers fail, which requests your website receives and which errors it generates. All of it is available in dashboards at monitor.serviciosrogeliowar.com.
Recommended Equipment
- Raspberry Pi 3 B+ — Lightweight, low-power server to start your homelab
- Raspberry Pi 4 (4GB) — The perfect foundation for homelab, Docker and monitoring
Affiliate links. No extra cost to you.