Raspberry Pi Node Scheduling Fix - Implementation Guide
Problem Summary
Your Raspberry Pi node (4GB RAM) keeps crashing because high-resource applications are being scheduled on it instead of on nodes with more memory.
Root Causes Identified
- High-memory applications without node targeting:
  - n8n PostgreSQL: 2-4Gi memory requirements
  - Minecraft server: 1-4Gi memory requirements
  - OpenWebUI: 1-2Gi memory requirements
  - Phoenix services: 512Mi-2Gi memory requirements
  - Jellyfin: 512Mi-2Gi memory requirements
- Missing node selectors: only the Gitea services target the ARM64 architecture, so every other workload can land on any node, including the Pi.
- No taints/tolerations: the Raspberry Pi node isn't protected from heavy workloads.
- Missing resource limits: some applications set no memory limit and can use as much memory as the node has.
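For the missing resource limits above, each container can declare requests (the memory the scheduler reserves when placing the pod) and limits (the cap enforced at runtime). Below is a minimal sketch of a container's resources block. It assumes the 2-4Gi range listed for n8n PostgreSQL maps to a 2Gi request and a 4Gi limit. The container name and image are placeholders, not taken from the actual manifests.

```yaml
# Fragment of a pod template's containers list (hypothetical names and values)
containers:
  - name: postgres            # placeholder container name
    image: postgres:16        # placeholder image tag
    resources:
      requests:
        memory: "2Gi"         # scheduler only picks a node with 2Gi of allocatable memory not already requested
      limits:
        memory: "4Gi"         # the container is OOM-killed if it tries to exceed 4Gi
```

Requests also protect the Pi on their own. A pod whose memory request exceeds what is still unreserved on the Pi will not be placed there, even without a node selector.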
Solution Applied
Modified Files with Node Selectors (Prevent RPi Scheduling)
✅ Updated these manifests to add a nodeSelector of hardware: high-memory to each pod template:
- n8n/postgres-deployment.yaml - n8n PostgreSQL (2-4Gi memory)
- minecraft-server/ss.yaml - Minecraft server (1-4Gi memory)
- openwebui/openwebui.yaml - OpenWebUI (1-2Gi memory)
- phoenix/phoenix-statefulset.yaml - Phoenix app (512Mi-2Gi memory)
- phoenix/postgres-statefulset.yaml - Phoenix PostgreSQL (256Mi-1Gi memory)
- jellyfin/jellyfin.yaml - Jellyfin media server (512Mi-2Gi memory)
- monitoring/prometheus-deployment.yaml - Prometheus (512Mi-1Gi memory)
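In each of those manifests, the selector sits in the pod template next to containers, not at the top level of the workload spec. Here is a trimmed sketch for a StatefulSet. The names example-app and example/app:latest are hypothetical placeholders, not values from the actual manifests.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: example-app                  # hypothetical name
spec:
  serviceName: example-app
  replicas: 1
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      nodeSelector:
        hardware: high-memory        # only nodes labeled hardware=high-memory are eligible
      containers:
        - name: example-app
          image: example/app:latest  # placeholder image
```

For a Deployment the nodeSelector placement is identical. Only kind changes, and serviceName is dropped.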
Implementation Steps
Step 1: Label and Taint Your Nodes
```bash
# 1. Identify your nodes
kubectl get nodes -o wide
# 2. Label your powerful nodes
kubectl label nodes <powerful-node-1> hardware=high-memory
kubectl label nodes <powerful-node-2> hardware=high-memory
# 3. Label your Raspberry Pi node
kubectl label nodes <raspberry-pi-node> hardware=low-memory
kubectl label nodes <raspberry-pi-node> node-type=raspberry-pi
# 4. Taint the Raspberry Pi so that only pods with a matching toleration can be scheduled on it
#    (NoSchedule only blocks new pods; pods already running on the Pi stay until they are deleted in Step 3)
kubectl taint nodes <raspberry-pi-node> node-type=raspberry-pi:NoSchedule
```
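After Step 1, the Raspberry Pi's Node object should carry the new labels and taint. Below is an excerpt of what kubectl get node <raspberry-pi-node> -o yaml is expected to contain. Other labels and fields are omitted, and the node name remains your own placeholder.

```yaml
# Excerpt of the Pi's Node object after Step 1 (other fields omitted)
metadata:
  labels:
    hardware: low-memory
    node-type: raspberry-pi
spec:
  taints:
    - effect: NoSchedule
      key: node-type
      value: raspberry-pi
```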
Step 2: Apply Updated Manifests
```bash
# Apply all updated manifests
kubectl apply -f n8n/postgres-deployment.yaml
kubectl apply -f minecraft-server/ss.yaml
kubectl apply -f openwebui/openwebui.yaml
kubectl apply -f phoenix/phoenix-statefulset.yaml
kubectl apply -f phoenix/postgres-statefulset.yaml
kubectl apply -f jellyfin/jellyfin.yaml
kubectl apply -f monitoring/prometheus-deployment.yaml
```
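If you re-apply these manifests often, a kustomization file can group them so that one command applies all seven. This is a sketch assuming a hypothetical kustomization.yaml placed in the same directory the paths above are relative to. The file is not part of the original setup.

```yaml
# kustomization.yaml (hypothetical; not part of the original repository)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - n8n/postgres-deployment.yaml
  - minecraft-server/ss.yaml
  - openwebui/openwebui.yaml
  - phoenix/phoenix-statefulset.yaml
  - phoenix/postgres-statefulset.yaml
  - jellyfin/jellyfin.yaml
  - monitoring/prometheus-deployment.yaml
```

With that file in place, running kubectl apply -k . from the same directory applies the same set as the seven commands above.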
Step 3: Force Reschedule Existing Pods
```bash
# Delete existing pods so they are recreated on nodes that satisfy the new constraints
kubectl delete pods -n n8n -l service=postgres-n8n
kubectl delete pods -n minecraft -l app=minecraft-server
# No -n flag: this targets the current namespace; add -n <namespace> if OpenWebUI runs elsewhere
kubectl delete pods -l app=open-webui
kubectl delete pods -n phoenix -l app=phoenix
kubectl delete pods -n phoenix -l app=postgres
kubectl delete pods -n jellyfin -l app=jellyfin
kubectl delete pods -n monitoring -l app=prometheus
```
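The deletes above matter most for StatefulSets whose update strategy is OnDelete. For Deployments, and for StatefulSets using the default RollingUpdate strategy, the pod-template change applied in Step 2 should already trigger a rollout onto the newly eligible nodes. Here is a sketch of the StatefulSet field to check in each manifest.

```yaml
# Fragment of a StatefulSet spec
spec:
  updateStrategy:
    type: OnDelete          # pods are replaced only when deleted manually, as in Step 3
    # type: RollingUpdate   # the default: pods are replaced automatically after a template change
```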
Step 4: Verify Pod Scheduling
```bash
# Check where pods are scheduled (open-?webui matches both "openwebui" and "open-webui")
kubectl get pods -o wide --all-namespaces | grep -E "n8n|minecraft|open-?webui|phoenix|jellyfin|prometheus"
# Verify node resource usage
kubectl top nodes
# Check recent events in all namespaces for scheduling issues
kubectl get events --all-namespaces --sort-by='.lastTimestamp' | tail -20
```
Optional: Add Tolerations for Lightweight Services
For services that can run on the Raspberry Pi, add a toleration for its taint. The toleration only permits scheduling on the Pi. The preferred node affinity then makes the scheduler favor the Pi without requiring it:
```yaml
# Example for Pi-hole (good candidate for RPi)
spec:
  template:
    spec:
      tolerations:
        - key: "node-type"
          operator: "Equal"
          value: "raspberry-pi"
          effect: "NoSchedule"
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: node-type
                    operator: In
                    values: ["raspberry-pi"]
```
Good candidates for Raspberry Pi:
- Pi-hole (DNS filtering)
- Home Assistant (IoT hub)
- Fava (lightweight accounting)
- Vaultwarden (password manager)
- Glance (dashboard)
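The Pi-hole example above only prefers the Pi. If a lightweight service should run nowhere else, a required node affinity enforces that instead. The sketch below shows only the affinity block, uses the node-type=raspberry-pi label from Step 1, and leaves the toleration unchanged.

```yaml
# Replaces the affinity block in the Pi-hole example; keep the tolerations as they are
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: node-type
              operator: In
              values: ["raspberry-pi"]
```

The trade-off: with the required rule, the pod stays Pending while the Pi is unavailable. The preferred rule lets it fall back to another node.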
Monitoring and Validation
Check Resource Usage
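```bash
# Monitor node resource consumption
kubectl top nodes
kubectl top pods --all-namespaces --sort-by=memory
# Count pods per node (reads spec.nodeName directly, so extra columns such as "2 (3h ago)" restarts cannot shift the output)
kubectl get pods --all-namespaces -o custom-columns=NODE:.spec.nodeName --no-headers | sort | uniq -c
```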
Verify Scheduling Constraints
```bash
# Check node labels and taints
kubectl get nodes --show-labels
kubectl describe nodes | grep -E "(Name:|Taints:|Labels:)"
# List every pod running on the Pi and confirm none of them are high-memory workloads
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=<raspberry-pi-node>
```
Troubleshooting
If Pods Stay Pending
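```bash
# Check why pods can't be scheduled
kubectl describe pod <pending-pod-name> -n <namespace>
# Common issues:
# - Node doesn't have the required labels
# - Resource requests are too high for the available nodes
# - The only matching nodes are tainted and the pod has no toleration for the taint
# - The pod's PersistentVolume is bound to the Pi (e.g. node-local storage), so the pod can't move
# - The pod is pinned to ARM64 (like Gitea), the Pi is the only ARM64 node, and the pod lacks the Pi toleration
```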
If You Need to Rollback
```bash
# Remove the node selectors from the manifests and reapply them
# Remove the taint from the Raspberry Pi (the trailing "-" deletes it)
kubectl taint nodes <raspberry-pi-node> node-type=raspberry-pi:NoSchedule-
# Remove the labels if needed
kubectl label nodes <node-name> hardware-
kubectl label nodes <node-name> node-type-
```
Expected Results
After implementation:
- High-resource applications will only schedule on nodes labeled hardware=high-memory
- The Raspberry Pi node will be protected from resource-heavy workloads
- Memory-heavy pods no longer compete for the Pi's 4GB, which should improve overall cluster stability
- Pi node crashes should stop occurring
- Lightweight services can still run on the Pi if they tolerate its taint
Architecture Summary
```
┌────────────────────┐  ┌────────────────────┐  ┌────────────────────┐
│ Powerful           │  │ Powerful           │  │ Raspberry Pi       │
│ Node 1             │  │ Node 2             │  │ Node (4GB)         │
│                    │  │                    │  │                    │
│ • n8n Postgres     │  │ • Minecraft        │  │ • Pi-hole          │
│ • Phoenix          │  │ • OpenWebUI        │  │ • Glance           │
│ • Jellyfin         │  │ • Prometheus       │  │ • Fava             │
│ • Grafana          │  │ • Other apps       │  │ • Vaultwarden      │
│                    │  │                    │  │ • Home Assistant   │
└────────────────────┘  └────────────────────┘  └────────────────────┘
 hardware=high-memory    hardware=high-memory    hardware=low-memory
                                                 TAINTED (protected)
```
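The Raspberry Pi is now protected from heavy workloads while remaining available to lightweight services that tolerate its taint. The split of applications between the two powerful nodes is illustrative. nodeSelector only requires hardware=high-memory, so the scheduler decides which of those nodes each pod lands on.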