# Raspberry Pi Node Scheduling Fix - Implementation Guide

## Problem Summary

Your Raspberry Pi node (4 GB RAM) keeps crashing because high-resource applications are being scheduled on it instead of on nodes with more capacity.

## Root Causes Identified

1. **High-memory applications without node targeting:**
   - n8n PostgreSQL: 2-4Gi memory requirements
   - Minecraft server: 1-4Gi memory requirements
   - OpenWebUI: 1-2Gi memory requirements
   - Phoenix services: 512Mi-2Gi memory requirements
   - Jellyfin: 512Mi-2Gi memory requirements

2. **Missing node selectors:** Only Gitea services target ARM64 architecture
3. **No taints/tolerations:** Raspberry Pi node isn't protected from heavy workloads
4. **Resource limits missing:** Some applications have no resource limits, so a single pod can exhaust the node
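
To make root cause 4 concrete, the fragment below is a minimal sketch of container-level memory requests and limits. It assumes the "1-2Gi" figure quoted for OpenWebUI means a 1Gi request and a 2Gi limit, and the container name is a hypothetical placeholder rather than a value taken from the actual manifest.

```yaml
# Pod template fragment (sketch, not the real manifest).
# Assumption: "1-2Gi" means request 1Gi, limit 2Gi.
spec:
  template:
    spec:
      containers:
        - name: open-webui        # hypothetical container name
          resources:
            requests:
              memory: "1Gi"       # amount the scheduler reserves on the node
            limits:
              memory: "2Gi"       # container is OOM-killed if it exceeds this
```

The scheduler places pods based on requests, not limits, so a container without a memory request can land on the Pi no matter how much memory it ends up using.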

## Solution Applied

### Modified Files with Node Selectors (Prevent RPi Scheduling)

✅ **Updated these manifests to include `nodeSelector: hardware: high-memory`:**

1. `/n8n/postgres-deployment.yaml` - PostgreSQL (2-4Gi memory)
2. `/minecraft-server/ss.yaml` - Minecraft server (1-4Gi memory)
3. `/openwebui/openwebui.yaml` - OpenWebUI (1-2Gi memory)
4. `/phoenix/phoenix-statefulset.yaml` - Phoenix app (512Mi-2Gi memory)
5. `/phoenix/postgres-statefulset.yaml` - Phoenix PostgreSQL (256Mi-1Gi memory)
6. `/jellyfin/jellyfin.yaml` - Jellyfin media server (512Mi-2Gi memory)
7. `/monitoring/prometheus-deployment.yaml` - Prometheus (512Mi-1Gi memory)
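
For orientation, this is roughly where the selector sits in each of those manifests. It is a sketch of the pod template only; everything other than the `nodeSelector` block is elided and assumed to stay as it was.

```yaml
# Deployment / StatefulSet fragment (sketch)
spec:
  template:
    spec:
      nodeSelector:
        hardware: high-memory   # only nodes carrying this label are eligible
      # containers, volumes, etc. remain unchanged
```

A pod with this selector stays `Pending` until at least one node carries the `hardware=high-memory` label, so label the nodes before applying the manifests.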

### Implementation Steps

#### Step 1: Label and Taint Your Nodes

```bash
# 1. Identify your nodes
kubectl get nodes -o wide

# 2. Label your powerful nodes
kubectl label nodes <powerful-node-1> hardware=high-memory
kubectl label nodes <powerful-node-2> hardware=high-memory

# 3. Label your Raspberry Pi node
kubectl label nodes <raspberry-pi-node> hardware=low-memory
kubectl label nodes <raspberry-pi-node> node-type=raspberry-pi

# 4. Taint the Raspberry Pi to prevent most workloads
kubectl taint nodes <raspberry-pi-node> node-type=raspberry-pi:NoSchedule
```
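
For reference, after these commands the Pi's Node object should contain roughly the following. This is an abridged sketch of what `kubectl get node <raspberry-pi-node> -o yaml` would show, not a manifest to apply; real output carries many more fields.

```yaml
# Abridged view of the Raspberry Pi Node object (sketch)
apiVersion: v1
kind: Node
metadata:
  name: <raspberry-pi-node>
  labels:
    hardware: low-memory
    node-type: raspberry-pi
    # ...plus the built-in kubernetes.io/* labels
spec:
  taints:
    - key: node-type
      value: raspberry-pi
      effect: NoSchedule
```

`NoSchedule` only affects new scheduling decisions. Pods already running on the Pi stay there until they are deleted or restarted, which is why Step 3 exists.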

#### Step 2: Apply Updated Manifests

```bash
# Apply all updated manifests
kubectl apply -f n8n/postgres-deployment.yaml
kubectl apply -f minecraft-server/ss.yaml
kubectl apply -f openwebui/openwebui.yaml
kubectl apply -f phoenix/phoenix-statefulset.yaml
kubectl apply -f phoenix/postgres-statefulset.yaml
kubectl apply -f jellyfin/jellyfin.yaml
kubectl apply -f monitoring/prometheus-deployment.yaml
```

#### Step 3: Force Reschedule Existing Pods

```bash
# Delete existing pods to force rescheduling on correct nodes.
# Applying a changed pod template normally triggers a rollout on its own,
# so this is only needed for pods that are still running on the Pi.
kubectl delete pods -n n8n -l service=postgres-n8n
kubectl delete pods -n minecraft -l app=minecraft-server
kubectl delete pods -l app=open-webui   # no -n flag: uses the current namespace
kubectl delete pods -n phoenix -l app=phoenix
kubectl delete pods -n phoenix -l app=postgres
kubectl delete pods -n jellyfin -l app=jellyfin
kubectl delete pods -n monitoring -l app=prometheus
```

#### Step 4: Verify Pod Scheduling

```bash
# Check where pods are scheduled (open-?webui matches both "openwebui" and "open-webui")
kubectl get pods -o wide --all-namespaces | grep -E "(n8n|minecraft|open-?webui|phoenix|jellyfin|prometheus)"

# Verify node resource usage
kubectl top nodes

# Check events for scheduling issues (all namespaces, most recent last)
kubectl get events --all-namespaces --sort-by='.lastTimestamp' | tail -20
```

### Optional: Add Tolerations for Lightweight Services

For services that CAN run on Raspberry Pi, add tolerations:

```yaml
# Example for Pi-hole (good candidate for RPi)
spec:
  template:
    spec:
      tolerations:
      - key: "node-type"
        operator: "Equal"
        value: "raspberry-pi"
        effect: "NoSchedule"
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
              - key: node-type
                operator: In
                values: ["raspberry-pi"]
```
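
The preferred affinity in that example is only a hint: if the Pi is full or offline, the scheduler may place the pod on another node. If a service should run on the Pi and nowhere else, one option is to swap the affinity for a `nodeSelector` on the same `node-type` label. This is a sketch of an alternative, assuming that pinning is actually what you want; it is not part of the fix applied above.

```yaml
# Sketch: hard-pin a lightweight service to the Pi instead of merely preferring it
spec:
  template:
    spec:
      nodeSelector:
        node-type: raspberry-pi   # hard requirement: only the Pi is eligible
      tolerations:
      - key: "node-type"
        operator: "Equal"
        value: "raspberry-pi"
        effect: "NoSchedule"      # still needed to get past the taint
```

The trade-off is availability: a pinned pod stays `Pending` whenever the Pi is down, whereas the preferred-affinity version can fall back to another node.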

**Good candidates for Raspberry Pi:**
- Pi-hole (DNS filtering)
- Home Assistant (IoT hub)
- Fava (lightweight accounting)
- Vaultwarden (password manager)
- Glance (dashboard)

### Monitoring and Validation

#### Check Resource Usage

```bash
# Monitor node resource consumption
kubectl top nodes
kubectl top pods --all-namespaces --sort-by=memory

# Check pod distribution across nodes (number of pods per node)
kubectl get pods --all-namespaces -o custom-columns=NODE:.spec.nodeName --no-headers | sort | uniq -c
```

#### Verify Scheduling Constraints

```bash
# Check node labels and taints
kubectl get nodes --show-labels
kubectl describe nodes | grep -E "(Name:|Taints:|Labels:)"

# Verify no high-memory pods on RPi
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=<raspberry-pi-node>
```

## Troubleshooting

### If Pods Stay Pending

```bash
# Check why pods can't be scheduled
kubectl describe pod <pending-pod-name> -n <namespace>

# Common issues:
# - No node carries the label required by the pod's nodeSelector
# - Resource requests too high for available nodes
# - Every matching node is tainted and the pod lacks a toleration
```

### If You Need to Rollback

```bash
# Remove node selectors from manifests and reapply

# Remove taints from Raspberry Pi
kubectl taint nodes <raspberry-pi-node> node-type=raspberry-pi:NoSchedule-

# Remove labels if needed
kubectl label nodes <node-name> hardware-
kubectl label nodes <node-name> node-type-
```

## Expected Results

After implementation:
1. **High-resource applications** will only schedule on powerful nodes
2. **Raspberry Pi node** will be protected from resource-heavy workloads
3. **Cluster stability** will improve with proper resource distribution
4. **Pi node crashes** should stop occurring
5. **Lightweight services** can still run on Pi (with tolerations)

## Architecture Summary

```
┌──────────────────────┐  ┌──────────────────────┐  ┌──────────────────────┐
│ Powerful             │  │ Powerful             │  │ Raspberry Pi         │
│ Node 1               │  │ Node 2               │  │ Node (4GB)           │
│                      │  │                      │  │                      │
│ • n8n Postgres       │  │ • Minecraft          │  │ • Pi-hole            │
│ • Phoenix            │  │ • OpenWebUI          │  │ • Glance             │
│ • Jellyfin           │  │ • Prometheus         │  │ • Fava               │
│ • Grafana            │  │ • Other apps         │  │ • Vaultwarden        │
│                      │  │                      │  │ • Home Asst          │
└──────────────────────┘  └──────────────────────┘  └──────────────────────┘
  hardware=high-memory      hardware=high-memory      hardware=low-memory
                                                      TAINTED (protected)
```

The Raspberry Pi is now protected from heavy workloads while remaining available for lightweight services that tolerate its taint.