diff --git a/RASPBERRY_PI_SCHEDULING_FIX.md b/RASPBERRY_PI_SCHEDULING_FIX.md
new file mode 100644
index 0000000..08b6513
--- /dev/null
+++ b/RASPBERRY_PI_SCHEDULING_FIX.md
@@ -0,0 +1,195 @@
+# Raspberry Pi Node Scheduling Fix - Implementation Guide
+
+## Problem Summary
+Your Raspberry Pi node (4GB RAM) keeps crashing because high-resource applications are being scheduled on it instead of on nodes with more capacity.
+
+## Root Causes Identified
+1. **High-memory applications without node targeting:**
+   - n8n PostgreSQL: 2-4Gi memory requirements
+   - Minecraft server: 1-4Gi memory requirements
+   - OpenWebUI: 1-2Gi memory requirements
+   - Phoenix services: 512Mi-2Gi memory requirements
+   - Jellyfin: 512Mi-2Gi memory requirements
+
+2. **Missing node selectors:** Only Gitea services target ARM64 architecture
+3. **No taints/tolerations:** Raspberry Pi node isn't protected from heavy workloads
+4. **Resource limits missing:** Some applications can consume unlimited resources
+
+## Solution Applied
+
+### Modified Files with Node Selectors (Prevent RPi Scheduling)
+
+✅ **Updated these manifests to include `nodeSelector: hardware: high-memory`:**
+
+1. `/n8n/postgres-deployment.yaml` - PostgreSQL (2-4Gi memory)
+2. `/minecraft-server/ss.yaml` - Minecraft server (1-4Gi memory)
+3. `/openwebui/openwebui.yaml` - OpenWebUI (1-2Gi memory)
+4. `/phoenix/phoenix-statefulset.yaml` - Phoenix app (512Mi-2Gi memory)
+5. `/phoenix/postgres-statefulset.yaml` - Phoenix PostgreSQL (256Mi-1Gi memory)
+6. `/jellyfin/jellyfin.yaml` - Jellyfin media server (512Mi-2Gi memory)
+7. `/monitoring/prometheus-deployment.yaml` - Prometheus (512Mi-1Gi memory)
+
+### Implementation Steps
+
+#### Step 1: Label and Taint Your Nodes
+
+```bash
+# 1. Identify your nodes
+kubectl get nodes -o wide
+
+# 2. Label your powerful nodes
+kubectl label nodes <powerful-node-1> hardware=high-memory
+kubectl label nodes <powerful-node-2> hardware=high-memory
+
+# 3. Label your Raspberry Pi node
+kubectl label nodes <raspberry-pi-node> hardware=low-memory
+kubectl label nodes <raspberry-pi-node> node-type=raspberry-pi
+
+# 4. Taint the Raspberry Pi to prevent most workloads
+kubectl taint nodes <raspberry-pi-node> node-type=raspberry-pi:NoSchedule
+```
+
+#### Step 2: Apply Updated Manifests
+
+```bash
+# Apply all updated manifests
+kubectl apply -f n8n/postgres-deployment.yaml
+kubectl apply -f minecraft-server/ss.yaml
+kubectl apply -f openwebui/openwebui.yaml
+kubectl apply -f phoenix/phoenix-statefulset.yaml
+kubectl apply -f phoenix/postgres-statefulset.yaml
+kubectl apply -f jellyfin/jellyfin.yaml
+kubectl apply -f monitoring/prometheus-deployment.yaml
+```
+
+#### Step 3: Force Reschedule Existing Pods
+
+```bash
+# Delete existing pods to force rescheduling on correct nodes
+kubectl delete pods -n n8n -l service=postgres-n8n
+kubectl delete pods -n minecraft -l app=minecraft-server
+kubectl delete pods -l app=open-webui
+kubectl delete pods -n phoenix -l app=phoenix
+kubectl delete pods -n phoenix -l app=postgres
+kubectl delete pods -n jellyfin -l app=jellyfin
+kubectl delete pods -n monitoring -l app=prometheus
+```
+
+#### Step 4: Verify Pod Scheduling
+
+```bash
+# Check where pods are scheduled
+kubectl get pods -o wide --all-namespaces | grep -E "(n8n|minecraft|open-?webui|phoenix|jellyfin|prometheus)"
+
+# Verify node resource usage
+kubectl top nodes
+
+# Check events for scheduling issues
+kubectl get events --all-namespaces --sort-by='.lastTimestamp' | tail -20
+```
+
+### Optional: Add Tolerations for Lightweight Services
+
+For services that CAN run on Raspberry Pi, add tolerations:
+
+```yaml
+# Example for Pi-hole (good candidate for RPi)
+spec:
+  template:
+    spec:
+      tolerations:
+      - key: "node-type"
+        operator: "Equal"
+        value: "raspberry-pi"
+        effect: "NoSchedule"
+      affinity:
+        nodeAffinity:
+          preferredDuringSchedulingIgnoredDuringExecution:
+          - weight: 100
+            preference:
+              matchExpressions:
+              - key: node-type
+                operator: In
+                values: ["raspberry-pi"]
+```
+
+**Good candidates for Raspberry Pi:**
+- Pi-hole (DNS filtering)
+- Home Assistant (IoT hub)
+- Fava (lightweight accounting)
+- Vaultwarden (password manager)
+- Glance (dashboard)
+
+### Monitoring and Validation
+
+#### Check Resource Usage
+```bash
+# Monitor node resource consumption
+kubectl top nodes
+kubectl top pods --all-namespaces --sort-by=memory
+
+# Check pod distribution across nodes
+kubectl get pods -o wide --all-namespaces | awk '{print $8}' | sort | uniq -c
+```
+
+#### Verify Scheduling Constraints
+```bash
+# Check node labels and taints
+kubectl get nodes --show-labels
+kubectl describe nodes | grep -E "(Name:|Taints:|Labels:)"
+
+# Verify no high-memory pods on RPi
+kubectl get pods -o wide --all-namespaces | grep <raspberry-pi-node>
+```
+
+## Troubleshooting
+
+### If Pods Stay Pending
+```bash
+# Check why pods can't be scheduled
+kubectl describe pod <pod-name> -n <namespace>
+
+# Common issues:
+# - Node doesn't have required labels
+# - Resource requests too high for available nodes
+# - No node satisfies the pod's nodeSelector and tolerations
+```
+
+### If You Need to Rollback
+```bash
+# Remove node selectors from manifests and reapply
+# Remove taints from Raspberry Pi
+kubectl taint nodes <raspberry-pi-node> node-type=raspberry-pi:NoSchedule-
+
+# Remove labels if needed
+kubectl label nodes <node-name> hardware-
+kubectl label nodes <node-name> node-type-
+```
+
+## Expected Results
+
+After implementation:
+1. **High-resource applications** will only schedule on powerful nodes
+2. **Raspberry Pi node** will be protected from resource-heavy workloads
+3. **Cluster stability** will improve with proper resource distribution
+4. **Pi node crashes** should stop occurring
+5. **Lightweight services** can still run on Pi (with tolerations)
+
+## Architecture Summary
+
+```
+┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
+│ Powerful        │    │ Powerful        │    │ Raspberry Pi    │
+│ Node 1          │    │ Node 2          │    │ Node (4GB)      │
+│                 │    │                 │    │                 │
+│ • n8n Postgres  │    │ • Minecraft     │    │ • Pi-hole       │
+│ • Phoenix       │    │ • OpenWebUI     │    │ • Glance        │
+│ • Jellyfin      │    │ • Prometheus    │    │ • Fava          │
+│ • Grafana       │    │ • Other apps    │    │ • Vaultwarden   │
+│                 │    │                 │    │ • Home Asst     │
+└─────────────────┘    └─────────────────┘    └─────────────────┘
+ hardware=high-mem      hardware=high-mem      hardware=low-mem
+                                              TAINTED (protected)
+```
+
+The Raspberry Pi is now protected while still being available for lightweight services that benefit from its unique characteristics.
\ No newline at end of file
diff --git a/jellyfin/jellyfin.yaml b/jellyfin/jellyfin.yaml
index ef6e38e..587394e 100644
--- a/jellyfin/jellyfin.yaml
+++ b/jellyfin/jellyfin.yaml
@@ -38,6 +38,9 @@ spec:
       labels:
         app: jellyfin
     spec:
+      # Prevent scheduling on Raspberry Pi due to high resource requirements (512Mi-2Gi memory, 500m-2000m CPU)
+      nodeSelector:
+        hardware: high-memory
       containers:
       - name: jellyfin
         image: jellyfin/jellyfin:latest
diff --git a/minecraft-server/ss.yaml b/minecraft-server/ss.yaml
index 625548e..ed9c295 100644
--- a/minecraft-server/ss.yaml
+++ b/minecraft-server/ss.yaml
@@ -13,6 +13,9 @@ spec:
       labels:
         app: minecraft-server
     spec:
+      # Prevent scheduling on Raspberry Pi due to high resource requirements (1Gi-4Gi memory, 1-2 CPU)
+      nodeSelector:
+        hardware: high-memory
       containers:
       - name: minecraft-server
         image: itzg/minecraft-server:latest # Or specific version if needed
diff --git a/monitoring/prometheus-deployment.yaml b/monitoring/prometheus-deployment.yaml
index 456ed03..825ecf1 100644
--- a/monitoring/prometheus-deployment.yaml
+++ b/monitoring/prometheus-deployment.yaml
@@ -15,6 +15,9 @@ spec:
       labels:
         app: prometheus
     spec:
+      # Prevent scheduling on Raspberry Pi due to resource requirements (512Mi-1Gi memory, 500m-1000m CPU)
+      nodeSelector:
+        hardware: high-memory
       serviceAccountName: prometheus
       containers:
       - name: prometheus
diff --git a/n8n/postgres-deployment.yaml b/n8n/postgres-deployment.yaml
index ceb65fe..d28d9a2 100644
--- a/n8n/postgres-deployment.yaml
+++ b/n8n/postgres-deployment.yaml
@@ -20,6 +20,9 @@ spec:
       labels:
         service: postgres-n8n
     spec:
+      # Prevent scheduling on Raspberry Pi due to high memory requirements (2-4Gi)
+      nodeSelector:
+        hardware: high-memory
       containers:
       - image: postgres:18
         name: postgres
diff --git a/node-management-commands.md b/node-management-commands.md
new file mode 100644
index 0000000..0e892e1
--- /dev/null
+++ b/node-management-commands.md
@@ -0,0 +1,45 @@
+# Node Management Commands for Raspberry Pi Scheduling Issues
+
+## 1. Taint the Raspberry Pi Node (Recommended Approach)
+
+```bash
+# Find your Raspberry Pi node name
+kubectl get nodes -o wide
+
+# Taint the Raspberry Pi node to prevent scheduling (except for tolerating pods)
+kubectl taint nodes <raspberry-pi-node> node-type=raspberry-pi:NoSchedule
+
+# Alternative: Use a more descriptive taint
+kubectl taint nodes <raspberry-pi-node> hardware=low-memory:NoSchedule
+```
+
+## 2. Label Nodes for Better Targeting
+
+```bash
+# Label your Raspberry Pi node
+kubectl label nodes <raspberry-pi-node> node-type=raspberry-pi
+kubectl label nodes <raspberry-pi-node> hardware=low-memory
+
+# Label your more powerful nodes
+kubectl label nodes <powerful-node-1> node-type=worker
+kubectl label nodes <powerful-node-1> hardware=high-memory
+kubectl label nodes <powerful-node-2> node-type=worker
+kubectl label nodes <powerful-node-2> hardware=high-memory
+```
+
+## 3. Verify Node Configuration
+
+```bash
+# Check node labels and taints
+kubectl describe nodes
+
+# See which nodes have what resources available
+kubectl describe nodes | grep -A 5 "Allocatable"
+```
+
+## 4. Remove Taint if Needed
+
+```bash
+# Remove the taint if you need to rollback
+kubectl taint nodes <raspberry-pi-node> node-type=raspberry-pi:NoSchedule-
+```
\ No newline at end of file
diff --git a/openwebui/openwebui.yaml b/openwebui/openwebui.yaml
index 4ed31f6..9ef2901 100644
--- a/openwebui/openwebui.yaml
+++ b/openwebui/openwebui.yaml
@@ -24,7 +24,10 @@ spec:
     metadata:
       labels:
         app: open-webui
-    spec:
+    spec:
+      # Prevent scheduling on Raspberry Pi due to high resource requirements (1Gi-2Gi memory, 1-2 CPU)
+      nodeSelector:
+        hardware: high-memory
       volumes:
       - name: webui-data
         persistentVolumeClaim:
diff --git a/phoenix/phoenix-statefulset.yaml b/phoenix/phoenix-statefulset.yaml
index 45e88f9..2415623 100644
--- a/phoenix/phoenix-statefulset.yaml
+++ b/phoenix/phoenix-statefulset.yaml
@@ -42,6 +42,9 @@ spec:
       labels:
         app: phoenix
     spec:
+      # Prevent scheduling on Raspberry Pi due to high resource requirements (512Mi-2Gi memory, 500m-2000m CPU)
+      nodeSelector:
+        hardware: high-memory
       initContainers:
       - name: wait-for-postgres
         image: busybox:1.36
diff --git a/phoenix/postgres-statefulset.yaml b/phoenix/postgres-statefulset.yaml
index f8b3c3a..15d49eb 100644
--- a/phoenix/postgres-statefulset.yaml
+++ b/phoenix/postgres-statefulset.yaml
@@ -33,6 +33,9 @@ spec:
       labels:
         app: postgres
     spec:
+      # Prevent scheduling on Raspberry Pi due to resource requirements (256Mi-1Gi memory, 250m-1000m CPU)
+      nodeSelector:
+        hardware: high-memory
       containers:
       - name: postgres
         image: postgres:16
diff --git a/raspberry-pi-tolerations-examples.yaml b/raspberry-pi-tolerations-examples.yaml
new file mode 100644
index 0000000..d58c978
--- /dev/null
+++ b/raspberry-pi-tolerations-examples.yaml
@@ -0,0 +1,81 @@
+# Examples of tolerations for services that SHOULD run on Raspberry Pi
+# These services have low resource requirements and can benefit from Pi-specific features
+
+# 1. Pi-hole - Perfect for Raspberry Pi (DNS filtering, network service)
+---
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: pihole
+spec:
+  template:
+    spec:
+      # Allow scheduling on Raspberry Pi
+      tolerations:
+      - key: "node-type"
+        operator: "Equal"
+        value: "raspberry-pi"
+        effect: "NoSchedule"
+      # Prefer Raspberry Pi for network services
+      affinity:
+        nodeAffinity:
+          preferredDuringSchedulingIgnoredDuringExecution:
+          - weight: 100
+            preference:
+              matchExpressions:
+              - key: node-type
+                operator: In
+                values: ["raspberry-pi"]
+
+# 2. Home Assistant - May benefit from running on Pi for local device access
+---
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: home-assistant
+  namespace: home-assistant
+spec:
+  template:
+    spec:
+      # Allow scheduling on Raspberry Pi (good for IoT hub role)
+      tolerations:
+      - key: "node-type"
+        operator: "Equal"
+        value: "raspberry-pi"
+        effect: "NoSchedule"
+      # Prefer Raspberry Pi for home automation
+      affinity:
+        nodeAffinity:
+          preferredDuringSchedulingIgnoredDuringExecution:
+          - weight: 80
+            preference:
+              matchExpressions:
+              - key: node-type
+                operator: In
+                values: ["raspberry-pi"]
+
+# 3. Lightweight services (Fava, Vaultwarden, Glance)
+---
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: lightweight-service-example
+spec:
+  template:
+    spec:
+      # Allow scheduling on Raspberry Pi for lightweight workloads
+      tolerations:
+      - key: "node-type"
+        operator: "Equal"
+        value: "raspberry-pi"
+        effect: "NoSchedule"
+      # No preference - let scheduler decide based on resource availability
+      containers:
+      - name: lightweight-service  # placeholder container name
+        resources:
+          requests:
+            memory: "64Mi"
+            cpu: "50m"
+          limits:
+            memory: "512Mi"
+            cpu: "500m"
\ No newline at end of file
diff --git a/validate-scheduling.sh b/validate-scheduling.sh
new file mode 100755
index 0000000..92f0945
--- /dev/null
+++ b/validate-scheduling.sh
@@ -0,0 +1,118 @@
+#!/bin/bash
+
+# Raspberry Pi K3s Scheduling Validation Script
+# Run this to check your cluster configuration and pod distribution
+
+echo "=== Kubernetes Node Analysis ==="
+echo
+
+echo "1. Node Overview:"
+kubectl get nodes -o wide
+echo
+
+echo "2. Node Resource Capacity:"
+kubectl describe nodes | grep -A 5 "Allocatable:"
+echo
+
+echo "3. Node Labels and Taints:"
+kubectl get nodes --show-labels
+echo
+kubectl describe nodes | grep -E "(Name:|Taints:)" | grep -A 1 "Name:"
+echo
+
+echo "=== Pod Distribution Analysis ==="
+echo
+
+echo "4. High-Resource Pods Location:"
+echo "Checking where memory-intensive applications are scheduled..."
+echo
+
+echo "n8n PostgreSQL pods:"
+kubectl get pods -n n8n -o wide | grep postgres || echo "No n8n postgres pods found"
+echo
+
+echo "Minecraft server pods:"
+kubectl get pods -n minecraft -o wide || echo "No minecraft pods found"
+echo
+
+echo "OpenWebUI pods:"
+kubectl get pods -o wide | grep open-webui || echo "No OpenWebUI pods found"
+echo
+
+echo "Phoenix pods:"
+kubectl get pods -n phoenix -o wide || echo "No Phoenix pods found"
+echo
+
+echo "Jellyfin pods:"
+kubectl get pods -n jellyfin -o wide || echo "No Jellyfin pods found"
+echo
+
+echo "Prometheus pods:"
+kubectl get pods -n monitoring -o wide | grep prometheus || echo "No Prometheus pods found"
+echo
+
+echo "=== Resource Usage ==="
+echo
+
+echo "5. Current Node Resource Usage:"
+kubectl top nodes 2>/dev/null || echo "Metrics server not available - install with: kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml"
+echo
+
+echo "6. Top Memory-Consuming Pods:"
+kubectl top pods --all-namespaces --sort-by=memory 2>/dev/null | head -10 || echo "Metrics server not available"
+echo
+
+echo "=== Pod Events (Recent Issues) ==="
+echo
+
+echo "7. Recent Pod Scheduling Events:"
+kubectl get events --all-namespaces --sort-by='.lastTimestamp' | grep -E "(Failed|Error|Warning)" | tail -10
+echo
+
+echo "=== Validation Summary ==="
+echo
+
+# Count pods per node
+echo "8. Pod Distribution Per Node:"
+echo "Node                    | Pod Count"
+echo "------------------------|---------"
+kubectl get pods --all-namespaces -o wide --no-headers | awk '{print $8}' | sort | uniq -c | awk '{printf "%-24s| %s\n", $2, $1}'
+echo
+
+echo "=== Recommendations ==="
+echo
+
+# Check if any high-resource pods are on wrong nodes
+echo "9. Checking for Potential Issues:"
+
+# Get Raspberry Pi node name (assumes it is the first ARM64 node in the cluster)
+RPI_NODE=$(kubectl get nodes -o jsonpath='{.items[?(@.status.nodeInfo.architecture=="arm64")].metadata.name}' | awk '{print $1}')
+
+if [ -n "$RPI_NODE" ]; then
+    echo "Detected Raspberry Pi node: $RPI_NODE"
+
+    # Check if high-resource pods are on RPi
+    HIGH_MEM_PODS=$(kubectl get pods --all-namespaces -o wide | grep "$RPI_NODE" | grep -E "(postgres|minecraft|phoenix|jellyfin|prometheus|open-?webui)")
+
+    if [ -n "$HIGH_MEM_PODS" ]; then
+        echo "⚠️ WARNING: High-resource pods found on Raspberry Pi node:"
+        echo "$HIGH_MEM_PODS"
+        echo
+        echo "These pods should be moved to more powerful nodes."
+    else
+        echo "✅ Good: No high-resource pods detected on Raspberry Pi node."
+    fi
+else
+    echo "ℹ️ Could not auto-detect Raspberry Pi node. Please check manually."
+fi
+
+echo
+echo "=== Next Steps ==="
+echo
+echo "If you see high-resource pods on your Raspberry Pi node:"
+echo "1. Apply the node labels: kubectl label nodes <powerful-node> hardware=high-memory"
+echo "2. Apply the taint: kubectl taint nodes <raspberry-pi-node> node-type=raspberry-pi:NoSchedule"
+echo "3. Apply updated manifests with nodeSelectors"
+echo "4. Delete problematic pods to force rescheduling"
+echo
+echo "See RASPBERRY_PI_SCHEDULING_FIX.md for detailed instructions."
\ No newline at end of file
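
The per-node pod counts in validate-scheduling.sh and in the guide read the NODE column by position (`awk '{print $8}'`). That position can shift when the RESTARTS column contains spaces, for example `3 (5h ago)`. A minimal sketch of an alternative follows, assuming the installed kubectl supports `-o custom-columns`; the function name `count_pods_per_node` is hypothetical and not part of this change.

```bash
#!/bin/bash
# Hypothetical helper (not part of this change): reads one node name per line
# on stdin and prints "node | count" rows in the layout the validation
# script already uses for its pod distribution table.
count_pods_per_node() {
  sort | uniq -c | awk '{printf "%-24s| %s\n", $2, $1}'
}

# Against a live cluster, feed it the node name of every pod. Selecting
# .spec.nodeName directly avoids depending on column positions:
#   kubectl get pods --all-namespaces --no-headers \
#     -o custom-columns=NODE:.spec.nodeName | count_pods_per_node
```

Because the helper only consumes node names from stdin, it can be checked with canned input before it is pointed at a cluster.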