k3s-cluster/RASPBERRY_PI_SCHEDULING_FIX.md

2026-02-02 20:47:09 +01:00
# Raspberry Pi Node Scheduling Fix - Implementation Guide
## Problem Summary
Your Raspberry Pi node (4GB RAM) keeps crashing because high-resource applications are scheduling on it instead of on nodes with more capacity.
## Root Causes Identified
1. **High-memory applications without node targeting:**
   - n8n PostgreSQL: 2-4Gi memory requirements
   - Minecraft server: 1-4Gi memory requirements
   - OpenWebUI: 1-2Gi memory requirements
   - Phoenix services: 512Mi-2Gi memory requirements
   - Jellyfin: 512Mi-2Gi memory requirements
2. **Missing node selectors:** Only Gitea services target ARM64 architecture
3. **No taints/tolerations:** Raspberry Pi node isn't protected from heavy workloads
4. **Resource limits missing:** Some applications can consume unlimited resources
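
Cause 4 could be addressed by giving each container explicit memory requests and limits. The fragment below is a minimal sketch, not one of the repository's manifests: the container name `jellyfin` is a placeholder, and it assumes the "512Mi-2Gi" range listed above means a 512Mi request with a 2Gi limit, which the list does not confirm.

```yaml
# Hypothetical pod template fragment; names and values are assumptions.
spec:
  template:
    spec:
      containers:
        - name: jellyfin          # placeholder container name
          resources:
            requests:
              memory: "512Mi"     # amount the scheduler reserves on the node
            limits:
              memory: "2Gi"       # container is OOM-killed if it exceeds this
```

The scheduler places pods based on requests, not limits, so a container without a memory request can be placed on a node that has very little memory to spare.
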
## Solution Applied
### Modified Files with Node Selectors (Prevent RPi Scheduling)
**Updated these manifests to include `nodeSelector: hardware: high-memory`:**
1. `/n8n/postgres-deployment.yaml` - PostgreSQL (2-4Gi memory)
2. `/minecraft-server/ss.yaml` - Minecraft server (1-4Gi memory)
3. `/openwebui/openwebui.yaml` - OpenWebUI (1-2Gi memory)
4. `/phoenix/phoenix-statefulset.yaml` - Phoenix app (512Mi-2Gi memory)
5. `/phoenix/postgres-statefulset.yaml` - Phoenix PostgreSQL (256Mi-1Gi memory)
6. `/jellyfin/jellyfin.yaml` - Jellyfin media server (512Mi-2Gi memory)
7. `/monitoring/prometheus-deployment.yaml` - Prometheus (512Mi-1Gi memory)
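
The selector belongs in the pod template, not at the top level of the Deployment or StatefulSet. A minimal sketch of the placement, assuming a standard `apps/v1` workload; the surrounding fields are elided here and are not copied from the manifests listed above:

```yaml
spec:
  template:
    spec:
      nodeSelector:
        hardware: high-memory   # must match the node label key and value exactly
      # containers: ...         # existing container definitions stay unchanged
```

A pod with this selector stays `Pending` until at least one node carries the `hardware=high-memory` label, so label the nodes before applying the manifests.
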
### Implementation Steps
#### Step 1: Label and Taint Your Nodes
```bash
# 1. Identify your nodes
kubectl get nodes -o wide
# 2. Label your powerful nodes
kubectl label nodes <powerful-node-1> hardware=high-memory
kubectl label nodes <powerful-node-2> hardware=high-memory
# 3. Label your Raspberry Pi node
kubectl label nodes <raspberry-pi-node> hardware=low-memory
kubectl label nodes <raspberry-pi-node> node-type=raspberry-pi
# 4. Taint the Raspberry Pi to prevent most workloads
kubectl taint nodes <raspberry-pi-node> node-type=raspberry-pi:NoSchedule
```
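After these commands, the Pi's node object should carry the labels and the taint roughly as sketched below. Only the relevant fields are shown, as they would appear in the output of `kubectl get node <raspberry-pi-node> -o yaml`; all other labels and fields are omitted.

```yaml
# Relevant fields of the Raspberry Pi node object (other fields omitted)
metadata:
  labels:
    hardware: low-memory
    node-type: raspberry-pi
spec:
  taints:
    - key: node-type
      value: raspberry-pi
      effect: NoSchedule   # blocks new pods without a matching toleration
```

`NoSchedule` only affects new scheduling decisions. Pods already running on the Pi are not evicted, which is why Step 3 deletes them explicitly.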
#### Step 2: Apply Updated Manifests
```bash
# Apply all updated manifests
kubectl apply -f n8n/postgres-deployment.yaml
kubectl apply -f minecraft-server/ss.yaml
kubectl apply -f openwebui/openwebui.yaml
kubectl apply -f phoenix/phoenix-statefulset.yaml
kubectl apply -f phoenix/postgres-statefulset.yaml
kubectl apply -f jellyfin/jellyfin.yaml
kubectl apply -f monitoring/prometheus-deployment.yaml
```
#### Step 3: Force Reschedule Existing Pods
```bash
# Delete existing pods to force rescheduling on correct nodes
kubectl delete pods -n n8n -l service=postgres-n8n
kubectl delete pods -n minecraft -l app=minecraft-server
kubectl delete pods -l app=open-webui
kubectl delete pods -n phoenix -l app=phoenix
kubectl delete pods -n phoenix -l app=postgres
kubectl delete pods -n jellyfin -l app=jellyfin
kubectl delete pods -n monitoring -l app=prometheus
```
#### Step 4: Verify Pod Scheduling
```bash
# Check where pods are scheduled
kubectl get pods -o wide --all-namespaces | grep -E "(n8n|minecraft|open-?webui|phoenix|jellyfin|prometheus)"
# Verify node resource usage
kubectl top nodes
# Check recent events in all namespaces for scheduling issues
kubectl get events --all-namespaces --sort-by='.lastTimestamp' | tail -20
```
### Optional: Add Tolerations for Lightweight Services
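For services that CAN run on the Raspberry Pi, add a toleration for the taint. The optional preferred node affinity makes the scheduler favor the Pi without requiring it: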
```yaml
# Example for Pi-hole (good candidate for RPi)
spec:
  template:
    spec:
      tolerations:
        - key: "node-type"
          operator: "Equal"
          value: "raspberry-pi"
          effect: "NoSchedule"
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: node-type
                    operator: In
                    values: ["raspberry-pi"]
```
**Good candidates for Raspberry Pi:**
- Pi-hole (DNS filtering)
- Home Assistant (IoT hub)
- Fava (lightweight accounting)
- Vaultwarden (password manager)
- Glance (dashboard)
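
With only a preferred affinity, a tolerating pod can still be placed on a powerful node. If a service should run on the Pi and nowhere else, one option is to swap the affinity for a `nodeSelector` on the `node-type` label from Step 1. This is a sketch of that variant, not a manifest from the repository:

```yaml
# Variant: pin a lightweight service to the Raspberry Pi
spec:
  template:
    spec:
      nodeSelector:
        node-type: raspberry-pi     # hard requirement instead of a preference
      tolerations:
        - key: "node-type"
          operator: "Equal"
          value: "raspberry-pi"
          effect: "NoSchedule"      # still needed to get past the taint
```

The trade-off is availability: if the Pi is down, a pinned pod stays `Pending` instead of moving to another node.
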
### Monitoring and Validation
#### Check Resource Usage
```bash
# Monitor node resource consumption
kubectl top nodes
kubectl top pods --all-namespaces --sort-by=memory
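# Check pod distribution across nodes (custom-columns avoids miscounting
# when the RESTARTS column contains spaces, e.g. "1 (5h ago)")
kubectl get pods --all-namespaces -o custom-columns=NODE:.spec.nodeName --no-headers | sort | uniq -c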
```
#### Verify Scheduling Constraints
```bash
# Check node labels and taints
kubectl get nodes --show-labels
kubectl describe nodes | grep -E "(Name:|Taints:|Labels:)"
# Verify no high-memory pods on RPi (lists every pod bound to that node)
kubectl get pods -o wide --all-namespaces --field-selector spec.nodeName=<raspberry-pi-node>
```
## Troubleshooting
### If Pods Stay Pending
```bash
# Check why pods can't be scheduled
kubectl describe pod <pending-pod-name> -n <namespace>
# Common issues:
# - Node doesn't have required labels
# - Resource requests too high for available nodes
# - Pod lacks a toleration for a taint on the node it targets
```
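If pods stay pending because a capable node was never labeled `hardware=high-memory`, one possible alternative to the `nodeSelector` is a required node affinity that only excludes nodes labeled `hardware=low-memory`. This is a sketch of that variant and not what the updated manifests use; `NotIn` also matches nodes that have no `hardware` label at all.

```yaml
# Alternative sketch: exclude low-memory nodes instead of requiring a label
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: hardware
                    operator: NotIn
                    values: ["low-memory"]
```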
### If You Need to Rollback
```bash
# Remove node selectors from manifests and reapply
# Remove taints from Raspberry Pi
kubectl taint nodes <raspberry-pi-node> node-type=raspberry-pi:NoSchedule-
# Remove labels if needed
kubectl label nodes <node-name> hardware-
kubectl label nodes <node-name> node-type-
```
## Expected Results
After implementation:
1. **High-resource applications** will only schedule on powerful nodes
2. **Raspberry Pi node** will be protected from resource-heavy workloads
3. **Cluster stability** will improve with proper resource distribution
4. **Pi node crashes** should stop occurring
5. **Lightweight services** can still run on Pi (with tolerations)
## Architecture Summary
```
┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
│    Powerful     │  │    Powerful     │  │  Raspberry Pi   │
│     Node 1      │  │     Node 2      │  │   Node (4GB)    │
│                 │  │                 │  │                 │
│ • n8n Postgres  │  │ • Minecraft     │  │ • Pi-hole       │
│ • Phoenix       │  │ • OpenWebUI     │  │ • Glance        │
│ • Jellyfin      │  │ • Prometheus    │  │ • Fava          │
│ • Grafana       │  │ • Other apps    │  │ • Vaultwarden   │
│                 │  │                 │  │ • Home Asst     │
└─────────────────┘  └─────────────────┘  └─────────────────┘
 hardware=high-mem    hardware=high-mem    hardware=low-mem
                                           TAINTED (protected)
```
The Raspberry Pi is now protected from heavy workloads while remaining available to lightweight services that carry the matching toleration.