# Platform Engineer Agent: Deployment Plan

An autonomous **Hermes Agent** that runs inside the k3s cluster, watches the
cluster's health on a schedule, tries to fix simple problems, and notifies me
(via Discord) when something needs my attention or a fix failed.

Docs: https://hermes-agent.nousresearch.com/docs/user-guide/docker

---

## 1. Goal & operating model

- **One Hermes container** in a new namespace `platform-engineer`, scheduled on
  the powerful amd64 node (`roger-nucbox-evo-x2`, 24 GiB RAM).
- Hermes runs in **gateway mode** under s6 supervision (`command: gateway run`),
  so the built-in **cron scheduler** is active and survives restarts.
- The agent talks to the cluster with `kubectl` from *inside* the container
  (terminal backend = `local`). We give the pod a **ServiceAccount + ClusterRole**
  scoped to read-mostly + restart/scale/delete-pod permissions.
- LLM calls are routed through the in-cluster **LiteLLM** proxy
  (`litellm.rogi.casa`), so no external API keys are needed in the cluster.
- Notifications go to **Discord** (reuse the pattern from `myorg-assistant`).
- A set of **cron jobs** (Hermes-native, not Kubernetes CronJobs) makes the agent
  run periodic checks. Watchdog checks use `[SILENT]` so the agent only pings me
  when something is wrong.

Why Hermes-native cron (not k8s CronJobs):
- Hermes cron ticks inside the gateway, runs in an isolated agent session, and
  supports `[SILENT]` suppression, `deliver="discord"`, `workdir`, and
  `context_from` chaining. That is far less plumbing than spawning a fresh pod
  per run.
- Cron jobs live in `~/.hermes/cron/jobs.json` (that is, under `HERMES_HOME`,
  which is `/opt/data` on the PVC in this deployment), so they survive pod
  restarts and can be edited live via `hermes cron edit` without redeploying.

---

## 2. Files to create (this directory)

```
platform-engineer/
├── namespace.yaml       # namespace platform-engineer
├── rbac.yaml            # ServiceAccount + ClusterRole (+binding)
├── configmap.yaml       # hermes config.yaml + SOUL.md + cron seed script
├── secret.yaml          # Discord bot token, LITELLM_API_KEY, dashboard auth (no SA token, see section 3)
├── pvc.yaml             # persistent /opt/data (HERMES_HOME)
├── dockerfile           # derived image: hermes-agent + kubectl + helm
├── build-and-push.sh    # builds & pushes the image to the Gitea registry
├── deployment.yaml      # Deployment, schedules on amd64, mounts kube SA token
├── ingress.yaml         # hermes.rogi.casa → dashboard (optional)
├── cron-seed.yaml       # one-shot Job that creates the Hermes cron schedule
└── README.md            # this file
```

Then add a line to the `APPS=(...)` array in `argocd/gen-apps.sh`:
```
"platform-engineer|platform-engineer|platform-engineer|true|true"
```
and re-run `./argocd/gen-apps.sh` to generate `argocd/apps/platform-engineer.yaml`
so ArgoCD reconciles it like every other app in the repo.
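
For orientation, the generated `argocd/apps/platform-engineer.yaml` might look roughly like the sketch below. This is an assumption about what `gen-apps.sh` emits, not a copy of its output: the reading of the `APPS` entry as name, path, namespace, prune, selfHeal is a guess, and `repoURL`, `targetRevision`, and `project` are placeholders.

```yaml
# Sketch only: assumes the APPS entry means name|path|namespace|prune|selfHeal.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: platform-engineer
  namespace: argocd
spec:
  project: default                     # assumed project
  source:
    repoURL: "<your-gitops-repo-url>"  # placeholder, not from this plan
    targetRevision: HEAD               # assumed
    path: platform-engineer
  destination:
    server: https://kubernetes.default.svc
    namespace: platform-engineer
  syncPolicy:
    automated:
      prune: true      # fourth field of the APPS entry (assumed meaning)
      selfHeal: true   # fifth field of the APPS entry (assumed meaning)
```

If the script's field order differs, the generated file is the source of truth; the sketch only shows which Application fields the entry most likely controls.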

---

## 3. RBAC: least privilege

ServiceAccount `platform-engineer` in ns `platform-engineer`, bound to a
**ClusterRole** scoped to *platform engineer* actions:

**Read (get/list/watch):** nodes, pods, pod logs (`pods/log`), services,
deployments, statefulsets, daemonsets, replicasets, jobs, cronjobs, events,
configmaps, secrets, PVCs, ingresses, namespaces. The checks in section 7
additionally need read access to persistentvolumes, node and pod metrics
(`metrics.k8s.io`), Argo CD `applications`, and cert-manager `certificates`.

**Act (on an allowlist):**
- `pods` → `delete` (force-restart a stuck pod), `patch` (annotations).
  Evictions are not a `patch` on the pod; they need `create` on the
  `pods/eviction` subresource.
- `deployments`, `statefulsets`, `daemonsets`, `replicasets` → `patch` (restart
  via `kubectl rollout restart`), `update`. Scaling with `kubectl scale` also
  needs `patch`/`update` on the `scale` subresource (e.g. `deployments/scale`).
- `jobs`, `cronjobs` → `delete`, `patch`
- `pods/exec` (subresource) → `create`, needed only if we want the agent to
  `kubectl exec` into pods for debugging. The original plan kept this off
  initially; section 10 records the final decision.
- `events` → `get/list/watch` only

**No writes to cluster-scoped resources** (no creating namespaces, no node
taints, no RBAC edits, no CRDs). The agent can *propose* those and tell me; it
cannot do them itself. Mutating calls show up in the Kubernetes audit log (when
audit logging is enabled on the k3s server), and the effective permissions can
be checked with
`kubectl auth can-i --list --as=system:serviceaccount:platform-engineer:platform-engineer`.

The pod uses the in-cluster ServiceAccount token
(`/var/run/secrets/.../serviceaccount/token`) plus the
`KUBERNETES_SERVICE_HOST`/`KUBERNETES_SERVICE_PORT` env vars that Kubernetes
injects into every pod: **no kubeconfig file, no long-lived token on disk**.
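
The allowlist above could translate into an `rbac.yaml` along these lines. Treat it as a minimal sketch: the ClusterRole and ClusterRoleBinding names are assumptions, `pods/log` is included on the assumption that the log-fetching checks need it, and the optional grants (`pods/exec`, `pods/eviction`, the `scale` subresources, and the CRD/metrics reads) are deliberately left out.

```yaml
# Sketch of rbac.yaml; object names other than the ServiceAccount are assumed.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: platform-engineer
  namespace: platform-engineer
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: platform-engineer
rules:
  # Read-only access to core resources.
  - apiGroups: [""]
    resources: ["nodes", "pods", "pods/log", "services", "events", "configmaps",
                "secrets", "persistentvolumeclaims", "namespaces"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["networking.k8s.io"]
    resources: ["ingresses"]
    verbs: ["get", "list", "watch"]
  # Workload controllers: read, plus patch/update for rollout restarts.
  - apiGroups: ["apps"]
    resources: ["deployments", "statefulsets", "daemonsets", "replicasets"]
    verbs: ["get", "list", "watch", "patch", "update"]
  - apiGroups: ["batch"]
    resources: ["jobs", "cronjobs"]
    verbs: ["get", "list", "watch", "delete", "patch"]
  # Pod mutations: delete a stuck pod, patch annotations.
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["delete", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: platform-engineer
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: platform-engineer
subjects:
  - kind: ServiceAccount
    name: platform-engineer
    namespace: platform-engineer
```

Because RBAC rules are purely additive, leaving a resource or verb out of every rule is what denies it; there is no explicit deny to write for namespaces, nodes, or RBAC objects.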

---

## 4. Image: thin derived Dockerfile

```dockerfile
FROM nousresearch/hermes-agent:latest
USER root
# Make the build fail if any stage of a pipeline fails (e.g. curl | gpg).
SHELL ["/bin/bash", "-o", "pipefail", "-c"]
# kubectl comes from the upstream apt repo, helm from the release tarball.
RUN apt-get update \
 && apt-get install -y --no-install-recommends ca-certificates curl gnupg \
 && curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.35/deb/Release.key \
      | gpg --dearmor -o /usr/share/keyrings/kubernetes-apt-keyring.gpg \
 && echo 'deb [signed-by=/usr/share/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.35/deb/ /' \
      > /etc/apt/sources.list.d/kubernetes.list \
 && apt-get update \
 && apt-get install -y --no-install-recommends kubectl \
 && curl -fsSL https://get.helm.sh/helm-v3.16.0-linux-amd64.tar.gz \
      | tar -xz -C /usr/local/bin --strip-components=1 linux-amd64/helm \
 && rm -rf /var/lib/apt/lists/*
USER hermes
```

> Note: the cluster is mixed arch (arm64/amd64/arm). The agent pod is pinned to
> the amd64 node, so the `linux-amd64` helm tarball is fine (apt installs the
> `kubectl` build that matches the image architecture). If you later want it
> portable, switch to a multi-arch build with `TARGETARCH` and download the
> matching helm archive.

Build & push to your Gitea registry (`git.rogi.casa/roger/...`), using the same
`imagePullSecrets: gitea-registry` pattern as `gym-tracker`. Tag with the
hermes version + a short git sha.

---

## 5. Hermes configuration (seeded from the ConfigMap into /opt/data/config.yaml)

```yaml
# config.yaml (seeded into the PVC on first boot)
model:
  provider: openai-api
  default: claude-4.5-haiku   # section 10 locks in qwen-3.6:27b; update this if that decision stands
  base_url: "https://litellm.rogi.casa/v1"
  api_mode: chat_completions

# Use a cheap, fast model for auxiliary tasks (titling, compression)
auxiliary:
  compression:
    provider: openai-api
    model: gemini-3-flash
  title_generation:
    provider: openai-api
    model: gemini-3-flash

terminal:
  backend: local
  cwd: /workspace        # a working dir for any kubectl output / scratch
  timeout: 180
  home_mode: profile     # isolate tool credentials under HERMES_HOME/home

# Unattended gateway → circuit-breaker on tool-call loops
tool_loop_guardrails:
  hard_stop_enabled: true
  hard_stop_after:
    exact_failure: 5
    idempotent_no_progress: 5

sessions:
  auto_prune: true
  retention_days: 90

cron:
  wrap_response: false   # cleaner Discord messages

memory:
  memory_enabled: true
  user_profile_enabled: true
```

`.env` (from Secret, mounted to `/opt/data/.env`):
```
OPENAI_API_KEY=<LITELLM_API_KEY value, i.e. sk-...>
OPENAI_BASE_URL=https://litellm.rogi.casa/v1
DISCORD_BOT_TOKEN=<new dedicated bot token>
DISCORD_HOME_CHANNEL=<your user/channel id for alerts>
# Dashboard auth (homelab, trusted LAN behind ingress)
HERMES_DASHBOARD_BASIC_AUTH_USERNAME=roger
HERMES_DASHBOARD_BASIC_AUTH_PASSWORD=<strong password>
```

> Why `OPENAI_API_KEY` + `OPENAI_BASE_URL`: the `openai-api` provider honours
> `OPENAI_BASE_URL`, so this is the simplest way to point Hermes at the
> in-cluster LiteLLM. `claude-4.5-haiku` / `gemini-3-flash` are the model names
> already exposed by your `litellm/litellm.yaml` ConfigMap.

`SOUL.md` (personality + guardrails): see `configmap.yaml`. Key points:
- Identity: "Platform Engineer for the rogi.casa k3s cluster."
- Knows the cluster layout (3 nodes, ArgoCD GitOps, Traefik+cert-manager,
  LiteLLM, services list).
- Operating rules: read-first; only act on the allowlisted verbs; never edit
  RBAC / taints / namespaces / CRDs; when in doubt, notify instead of acting;
  always cite the resource and the command used.
- How to reach me: `deliver="discord"`.
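
One way to carry these values in `secret.yaml` is a Secret that uses `stringData`, which accepts plain text and leaves the base64 encoding to the API server. The sketch below is an assumption about the layout, not the actual file: the Secret name `hermes-env` is hypothetical, and it stores one key per variable so the Deployment can load them as environment variables.

```yaml
# Sketch only: the Secret name and one-key-per-variable layout are assumptions.
apiVersion: v1
kind: Secret
metadata:
  name: hermes-env
  namespace: platform-engineer
type: Opaque
stringData:
  # Written as plain text; Kubernetes stores the values base64-encoded under .data.
  OPENAI_API_KEY: "<LITELLM_API_KEY value>"
  OPENAI_BASE_URL: "https://litellm.rogi.casa/v1"
  DISCORD_BOT_TOKEN: "<new dedicated bot token>"
  DISCORD_HOME_CHANNEL: "<your user/channel id for alerts>"
  HERMES_DASHBOARD_BASIC_AUTH_USERNAME: "roger"
  HERMES_DASHBOARD_BASIC_AUTH_PASSWORD: "<strong password>"
```

Replace every `<...>` placeholder before applying. If the file lives in git, the real values end up in the repository history, so a sealed or externally managed secret may be preferable.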

---

## 6. Deployment

- `replicas: 1` with the `Recreate` strategy (the Hermes data dir is
  single-writer, so never scale >1 and never let an old and a new pod overlap
  during a rollout).
- `nodeSelector: kubernetes.io/arch: amd64` + preferred `hardware: high-memory`
  affinity → lands on the NUC.
- `resources`: requests 512Mi/250m, limits 2Gi/1 core (Hermes recommends
  2–4 GiB; 1 GiB is fine without browser tools, which we keep off).
- Volume: PVC mounted at `/opt/data` (HERMES_HOME). `ReadWriteOnce` is enough
  because there is only a single pod.
- Ports: 8642 (gateway API, internal only) and 9119 (dashboard). Only 9119 is
  exposed, via Ingress `hermes.rogi.casa` with TLS + basic auth (enforced by
  Hermes itself through the `HERMES_DASHBOARD_BASIC_AUTH_*` env vars).
- `imagePullSecrets: gitea-registry`.
- env from Secret; `HERMES_DASHBOARD=1`.
- Init: on first boot the s6 `01-hermes-setup` hook seeds config/SOUL/.env from
  the ConfigMap if the volume is empty. We mount the ConfigMap as a read-only
  projection at `/opt/seed/` and run a tiny initContainer to copy it into
  `/opt/data` only when `/opt/data/config.yaml` doesn't exist (so ArgoCD
  self-heal never fights the agent's live-edited config).
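
Put together, the bullets above suggest a `deployment.yaml` shaped roughly like this. It is a sketch under stated assumptions: the `app: hermes` label, the initContainer image, and the Secret, PVC, and ConfigMap names are hypothetical, the image tag is a placeholder, and `gateway run` is passed as `args` on the assumption that the image entrypoint expects it there.

```yaml
# Sketch of deployment.yaml; anything marked "assumed" is not from this plan.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hermes
  namespace: platform-engineer
spec:
  replicas: 1
  strategy:
    type: Recreate                  # old pod stops before the new one starts
  selector:
    matchLabels:
      app: hermes                   # assumed label
  template:
    metadata:
      labels:
        app: hermes
    spec:
      serviceAccountName: platform-engineer
      imagePullSecrets:
        - name: gitea-registry
      nodeSelector:
        kubernetes.io/arch: amd64
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: hardware
                    operator: In
                    values: ["high-memory"]
      initContainers:
        - name: seed-config         # assumed name
          image: busybox:1.36       # assumed image
          command:
            - sh
            - -c
            - |
              # Copy the seed only on first boot so live edits are never overwritten.
              if [ ! -f /opt/data/config.yaml ]; then
                cp /opt/seed/* /opt/data/
              fi
          volumeMounts:
            - { name: seed, mountPath: /opt/seed, readOnly: true }
            - { name: data, mountPath: /opt/data }
      containers:
        - name: hermes
          image: "git.rogi.casa/roger/hermes-agent:<tag>"   # placeholder tag
          args: ["gateway", "run"]
          env:
            - name: HERMES_DASHBOARD
              value: "1"
          envFrom:
            - secretRef:
                name: hermes-env    # assumed Secret name
          ports:
            - { name: gateway, containerPort: 8642 }
            - { name: dashboard, containerPort: 9119 }
          resources:
            requests: { memory: 512Mi, cpu: 250m }
            limits: { memory: 2Gi, cpu: "1" }
          volumeMounts:
            - { name: data, mountPath: /opt/data }
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: hermes-data  # assumed PVC name
        - name: seed
          configMap:
            name: hermes-seed       # assumed ConfigMap name
```

The initContainer copies as root by default, so the seeded files may need their ownership adjusted (for example through a pod-level `fsGroup`) if the `hermes` user has to rewrite them later.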

---

## 7. Cron jobs to seed (Hermes-native)

These are written by an init script (one-shot Job `hermes-cron-seed`) that runs
`hermes cron create ...` against the gateway on first install, and is idempotent
(it checks existing job names). All deliver to Discord. Examples:

| Name | Schedule | Prompt (abbreviated) |
|------|----------|------------------------|
| `cluster-health-check` | `every 15m` | Run `kubectl get nodes`, `kubectl get pods -A --field-selector=status.phase!=Running,status.phase!=Succeeded`, and `kubectl get events -A --field-selector type=Warning --sort-by=.lastTimestamp` (consider only the last 20 minutes; `kubectl get` has no `--since` flag, and the phase selector does not apply to nodes). If everything is healthy, reply with only `[SILENT]`. Otherwise summarize failures and root-cause briefly. |
| `pod-restart-loop` | `every 10m` | Find pods in `CrashLoopBackOff`/`ImagePullBackOff` across all namespaces. For `CrashLoopBackOff`, fetch logs and, if a clear transient cause (OOM, config parse, missing secret) is visible, attempt `kubectl rollout restart <deploy>`; otherwise notify me with the log excerpt. Reply `[SILENT]` if none found. |
| `pvc-pressure` | `every 30m` | `kubectl get pv,pvc -A` for binding state, plus the node `DiskPressure` condition from `kubectl describe nodes`. Alert if any `Bound` PVC sits on a near-full volume or node disk is >85%. Usage percentages are not part of these objects, and `kubectl top nodes` reports only CPU and memory, so they have to come from the kubelet summary (`kubectl get --raw /api/v1/nodes/<node>/proxy/stats/summary`, which needs read access to `nodes/proxy`). `[SILENT]` otherwise. |
| `argocd-sync-health` | `every 1h` | `kubectl get applications -n argocd -o wide` (or `argocd app list` if the CLI is present). Report any `OutOfSync`/`Degraded` app. `[SILENT]` if all are `Synced`+`Healthy`. |
| `cert-expiry` | `every 1d at 09:00` | List cert-manager `Certificate` resources that expire in < 21 days. Notify only if there are any. `[SILENT]` otherwise. |
| `node-resource-drift` | `every 30m` | `kubectl top nodes` and `kubectl get nodes`. Alert if any node has CPU >90% or mem >90% sustained, or any node is `NotReady`. `[SILENT]` otherwise. |
| `daily-cluster-report` | `0 8 * * *` | Summarize: node count/status, top 5 pods by CPU/mem, # pods not Running, # ArgoCD apps OutOfSync, cert warnings. Always deliver (no `[SILENT]`). |

Design rules baked into SOUL.md:
- **Read-only checks** run frequently (10–30m) and stay silent unless something
  is wrong.
- **Mutating actions** are restricted to safe idempotent ones (rollout restart,
  delete stuck pod so the controller recreates it). Anything riskier → notify
  me with a proposed command and wait for me to run it (I can reply in Discord
  to the continuable thread).
- Cron sessions are isolated and **cannot create new cron jobs** (Hermes
  disables that inside cron runs) → no runaway loops.

---

## 8. Safety & guardrails

1. **RBAC is the real boundary.** Even if the agent goes rogue, the SA can't
   touch other namespaces' secrets beyond read, can't change RBAC, can't taint
   nodes, can't create namespaces.
2. **`tool_loop_guardrails.hard_stop_enabled: true`** circuit-breaks a stuck
   gateway (recommended in the Docker doc for unattended deployments).
3. **`skills.write_approval: false` but `memory.write_approval: true`**, so the
   agent can build skills/memories but I review memory writes lazily (flip
   this if it gets noisy). Neither key appears in the section 5 `config.yaml`
   yet.
4. **No `pods/exec` subresource** in the initial plan (keeps the agent from
   shelling into workloads), to be enabled later only for log-grep-style
   debugging. Section 10 supersedes this and grants it.
5. **Dashboard behind ingress TLS + basic auth** (the June-2026 hardening makes
   auth mandatory on non-loopback binds; we satisfy it with the bundled
   basic-auth provider).
6. **Single replica / single-writer PVC.** The Docker doc is explicit that two
   gateways on the same `/opt/data` corrupt session/memory stores. Use a
   `podAntiAffinity` so an accidental scale-up doesn't co-run.
7. **ArgoCD interaction:** keep `syncPolicy.automated.prune+selfHeal` but
   exclude the live-edited hermes state. Practically: Argo owns the *manifests*
   (deployment, configmap, secret, pvc), while `/opt/data` (config.yaml,
   cron/jobs.json, SOUL.md edits made via the dashboard) is runtime state on the
   PVC and is *not* reconciled by Argo. The ConfigMap only *seeds* it on first
   boot. Document this clearly in the README so future-you doesn't expect Argo
   to reset the agent's personality.
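
The `podAntiAffinity` from item 6 could be expressed with a fragment like the one below, placed under `spec.template.spec` of the Deployment. It is a sketch that assumes the pods carry an `app: hermes` label, which this plan does not specify.

```yaml
# Fragment for spec.template.spec; the app label is an assumption.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: hermes
        # At most one matching pod per node.
        topologyKey: kubernetes.io/hostname
```

Combined with the amd64 `nodeSelector`, a second replica would stay `Pending` instead of starting next to the first one, provided the cluster has only the one amd64 node.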

---

## 9. Rollout plan

1. Build & push the derived image to `git.rogi.casa/roger/hermes-agent` (tag
   `<hermes-version>-<sha>`).
2. Create the `platform-engineer` Discord bot, invite it, put its token + your
   channel id in `secret.yaml` (base64).
3. Create the namespace + RBAC + Secret + ConfigMap + PVC by applying those
   manifests from `platform-engineer/`. A plain
   `kubectl apply -f platform-engineer/` also applies the Deployment from the
   next step in the same run.
4. Apply the Deployment; wait for the pod to go Running.
5. `kubectl exec` in and run the one-shot cron seed:
   `hermes cron create ...` (or apply the `cron-seed` Job).
6. Trigger the first `cluster-health-check` manually: `hermes cron run cluster-health-check`.
7. Add the app to `argocd/gen-apps.sh`, regenerate, commit, push.

---

## 10. Decisions (locked in)

1. **Notifications:** dedicated `platform-engineer` Discord bot → its own token
   in `secret.yaml` (`DISCORD_BOT_TOKEN`, `DISCORD_HOME_CHANNEL`).
2. **Dashboard:** public at `hermes.rogi.casa` (Traefik TLS + cert-manager + the
   bundled Hermes basic-auth provider). Reach the dashboard on port 9119; the
   gateway API on 8642 is ClusterIP-only.
3. **Image:** derived image pushed to `git.rogi.casa/roger/hermes-agent`, pulled
   via the existing `gitea-registry` imagePullSecret (must also exist in the
   `platform-engineer` ns, see deploy steps).
4. **Model:** `qwen-3.6:27b` via the in-cluster Ollama box (`10.88.20.12:11434`),
   exposed through LiteLLM as `qwen-3.6:27b`. Added to `litellm/litellm.yaml`.
   Hermes reaches LiteLLM at `https://litellm.rogi.casa/v1` (never Ollama directly).
   The sample `config.yaml` in section 5 still shows `claude-4.5-haiku` as
   `model.default`; update it to match.
5. **pods/exec:** granted (`pods/exec` → `create` in the ClusterRole) so the
   agent can `kubectl exec` for debugging. `kubectl logs` does not depend on
   this grant; it needs `get` on `pods/log`. This supersedes the "keep off
   initially" stance in sections 3 and 8.
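
Decision 2 maps onto a Service plus Ingress pair along these lines. The sketch makes several assumptions that are not in this plan: the resource names, the `app: hermes` selector label, the TLS secret name, and the cert-manager ClusterIssuer (left as a placeholder).

```yaml
# Sketch: names, the app label, and the ClusterIssuer are assumptions.
apiVersion: v1
kind: Service
metadata:
  name: hermes
  namespace: platform-engineer
spec:
  type: ClusterIP
  selector:
    app: hermes
  ports:
    - { name: gateway, port: 8642, targetPort: 8642 }    # never routed by the Ingress
    - { name: dashboard, port: 9119, targetPort: 9119 }
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: hermes
  namespace: platform-engineer
  annotations:
    cert-manager.io/cluster-issuer: "<your-cluster-issuer>"   # placeholder
spec:
  ingressClassName: traefik
  tls:
    - hosts: ["hermes.rogi.casa"]
      secretName: hermes-tls          # assumed; cert-manager populates it
  rules:
    - host: hermes.rogi.casa
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: hermes
                port:
                  name: dashboard     # only 9119 is exposed
```

Keeping 8642 out of the Ingress rules is what makes the gateway API ClusterIP-only; the Service still lets in-cluster clients reach it.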

---

## 11. Deployment checklist (do in this order)

1. **Add the Ollama model to LiteLLM** (already done in `litellm/litellm.yaml`):
   the `qwen-3.6:27b` entry points at `http://10.88.20.12:11434`. Make sure
   the model is actually pulled on that Ollama host under its Ollama name,
   `qwen3.6:27b` (no hyphen, unlike the LiteLLM alias):
   `ollama pull qwen3.6:27b`. Apply: `kubectl apply -f litellm/` and restart
   the LiteLLM pod so the new config takes effect.
2. **Create the `gitea-registry` secret in the new namespace** (ArgoCD won't
   create it because it's not in the repo):
   ```
   kubectl create namespace platform-engineer
   kubectl create secret docker-registry gitea-registry \
     --docker-server=git.rogi.casa \
     --docker-username=<your-gitea-user> \
     --docker-password=<gitea-access-token> \
     --docker-email=<your-email> \
     -n platform-engineer
   ```
3. **Build & push the image:** `./platform-engineer/build-and-push.sh`
   (after `docker login git.rogi.casa`).
4. **Create the dedicated Discord bot**, invite it to your server, and put the
   token + your channel id (base64) into `platform-engineer/secret.yaml`. Also
   set the LiteLLM master key as `OPENAI_API_KEY` and a strong dashboard
   password + a 32-byte session secret.
5. **Commit & push** the whole change. ArgoCD will create the namespace
   resources, deploy the pod, and bring up the ingress at `hermes.rogi.casa`.
6. **Seed the cron jobs:**
   `kubectl apply -f platform-engineer/cron-seed.yaml` (one-shot Job). It waits
   for the hermes pod, then runs `hermes cron create ...` for each watchdog.
   Re-run it any time you want to re-seed after a wipe.
7. **Smoke test:** trigger the first health check manually with
   `kubectl exec -n platform-engineer deploy/hermes -- hermes cron run cluster-health-check`
   and confirm the message lands in Discord.
8. **ArgoCD:** the `Application` (`argocd/apps/platform-engineer.yaml`) is
   already generated. After commit, Argo will reconcile it like every other app.
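
For step 1, the model entry in the LiteLLM config would plausibly look like the sketch below. The `model_list` / `litellm_params` layout is LiteLLM's proxy config format; the `ollama_chat/` provider prefix is an assumption about how the entry was written and may be `ollama/` in the actual file.

```yaml
# Sketch of the entry inside litellm/litellm.yaml; the provider prefix is assumed.
model_list:
  - model_name: "qwen-3.6:27b"             # alias that Hermes requests
    litellm_params:
      model: "ollama_chat/qwen3.6:27b"     # provider prefix + Ollama model name
      api_base: "http://10.88.20.12:11434"
```

The alias and the Ollama name differ on purpose here: Hermes only ever sees `qwen-3.6:27b`, while LiteLLM forwards the request to whatever `model` names.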

---

## 12. What ArgoCD owns vs. what is runtime state

- **ArgoCD owns** (in git): namespace, RBAC, Secret, ConfigMap (seed), PVC,
  Deployment, Service, Ingress, cron-seed Job.
- **Runtime state (on the PVC, NOT reconciled):** `config.yaml`, `SOUL.md`,
  `.env`, `cron/jobs.json`, `sessions/`, `memories/`, `skills/`. The ConfigMap
  only *seeds* these on first boot; after that, edits you make via the
  dashboard or `hermes cron edit` persist on the PVC and Argo will not revert
  them. If you ever want a hard reset, delete the PVC and re-apply.

---

## Files in this directory

| File | Purpose |
|------|---------|
| `namespace.yaml` | namespace `platform-engineer` |
| `rbac.yaml` | ServiceAccount + ClusterRole (+binding), least-privilege |
| `configmap.yaml` | seed `config.yaml` + `SOUL.md` |
| `secret.yaml` | Discord token, LiteLLM key, dashboard auth (PLACEHOLDERS, fill in) |
| `pvc.yaml` | 5 Gi PVC for `/opt/data` |
| `dockerfile` | derived image: hermes-agent + kubectl + helm (linux/amd64) |
| `build-and-push.sh` | builds & pushes the image to the Gitea registry |
| `deployment.yaml` | Deployment (1 replica, Recreate, pinned to amd64 NUC) + Service |
| `ingress.yaml` | `hermes.rogi.casa` → dashboard (TLS + basic auth) |
| `cron-seed.yaml` | one-shot Job that creates the Hermes cron schedule |

Also changed outside this directory:
- `litellm/litellm.yaml`: added `qwen-3.6:27b` model entry.
- `argocd/gen-apps.sh` + `argocd/apps/platform-engineer.yaml`: ArgoCD
  Application for this folder.