feat(deploy): add Helm chart for Kubernetes deployment #256
Packages spurctld (HA StatefulSet), spurd (DaemonSet), spurrestd, spurdbd, and the operator into a single chart at deploy/helm/spur/. SpurJob CRD ships under crds/ so Helm installs it on first install and leaves it alone on upgrade/uninstall (avoids wiping user CRs).

Values cover the cases that previously required forking the raw examples/k8s/ manifests: replica counts, image overrides per component, controller PVC, ROCm device exposure, external Postgres vs dev-mode sidecar, REST API ingress, and free-form extra TOML appended to spur.conf.

CI workflow runs helm lint + helm template across four scenarios (defaults / single-node dev / external DB / ingress) and validates the rendered output with kubeconform --strict against k8s 1.29.

Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>
The post-install NOTES.txt warnings now include concrete `helm upgrade --reuse-values` commands for each fix path:
- embedded postgres: add a PVC, switch to external DB, or disable accounting
- controller emptyDir: enable persistence with a PVC

Cuts the round-trip from "warning printed" to "next command typed" for users hitting these on first install.

Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>
Pull request overview
Adds a first-party Helm chart (deploy/helm/spur/) to deploy Spur’s Kubernetes stack (controller, agents, REST API, accounting, operator) plus a GitHub Actions workflow to lint/template/validate the rendered manifests.
Changes:
- Introduces a new Helm chart with configurable components, generated `spur.conf` (including Raft peer list), and install notes.
- Adds the `SpurJob` CRD under `crds/` for Helm-managed initial install semantics.
- Adds CI workflow to run `helm lint`, render multiple value scenarios, and validate output with `kubeconform`.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| deploy/helm/spur/Chart.yaml | New chart metadata (versioning, kubeVersion, links). |
| deploy/helm/spur/values.yaml | New values contract for all components and config generation. |
| deploy/helm/spur/README.md | Chart usage/operations documentation and examples. |
| deploy/helm/spur/.helmignore | Packaging ignore rules. |
| deploy/helm/spur/crds/spurjob-crd.yaml | Installs the SpurJob CRD via Helm crds/. |
| deploy/helm/spur/templates/_helpers.tpl | Helper templates for naming/labels/images and Raft peer generation. |
| deploy/helm/spur/templates/namespace.yaml | Optional Namespace resource rendering. |
| deploy/helm/spur/templates/serviceaccount.yaml | Shared ServiceAccount for workloads. |
| deploy/helm/spur/templates/rbac.yaml | Operator ClusterRole/ClusterRoleBinding. |
| deploy/helm/spur/templates/configmap.yaml | Generates spur.conf ConfigMap from values (including Raft peers). |
| deploy/helm/spur/templates/controller.yaml | Controller headless Service + StatefulSet + optional PDB. |
| deploy/helm/spur/templates/agent.yaml | Agent DaemonSet with optional ROCm device mounts. |
| deploy/helm/spur/templates/restd.yaml | REST API Service + Deployment + optional Ingress. |
| deploy/helm/spur/templates/accounting.yaml | Accounting Service + Deployment with external DB or embedded Postgres sidecar. |
| deploy/helm/spur/templates/operator.yaml | Operator Service + Deployment with health endpoints. |
| deploy/helm/spur/templates/NOTES.txt | Post-install guidance and dev-default warnings. |
| .github/workflows/helm.yml | Adds Helm lint/template + kubeconform validation workflow. |
| # SpurJob CRD lifecycle. The CRD itself lives under crds/ so Helm installs it | ||
| # on first install. Set install=false if you manage CRDs externally (e.g. ArgoCD | ||
| # pre-sync hook). Helm never removes CRDs on uninstall — that's intentional to | ||
| # avoid wiping user CRs. | ||
| crds: | ||
| install: true | ||
|
|
| operator: | ||
| enabled: false | ||
| crds: | ||
| install: false |
| kubectl apply -f deploy/helm/spur/crds/spurjob-crd.yaml | ||
| ``` | ||
|
|
||
| If you manage CRDs externally (e.g. ArgoCD pre-sync hook), set `crds.install=false` and ship `crds/spurjob-crd.yaml` yourself. |
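For reference, the externally managed CRD case described in the excerpt above would correspond to a values override along these lines. This is a minimal sketch: the file name is hypothetical, and `crds.install` is the key shown in the chart's `values.yaml` excerpt. Note that Helm does not template files under a chart's `crds/` directory, so whether this value has any effect depends on how the chart wires it; `helm install --skip-crds` is the Helm-native way to skip that directory.

```yaml
# values-external-crds.yaml (hypothetical file name)
# Intended for setups where other tooling (e.g. an ArgoCD pre-sync hook)
# applies crds/spurjob-crd.yaml before the release is installed.
crds:
  install: false
```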
| kubectl -n {{ .Release.Namespace }} get pods | ||
| kubectl -n {{ .Release.Namespace }} rollout status statefulset/spurctld | ||
|
|
||
| Submit a job via the operator (requires the CRD, installed by this chart): |
| {{- with .Values.commonAnnotations }} | ||
| annotations: | ||
| {{- toYaml . | nindent 4 }} | ||
| {{- end }} |
| {{- if $useExternal }} | ||
| {{- if .Values.accounting.externalDatabase.url }} | ||
| - --database-url=$(SPUR_DB_URL) | ||
| {{- else }} | ||
| - --database-url=$(SPUR_DB_URL) | ||
| {{- end }} |
| metadata: | ||
| name: spurctld | ||
| namespace: {{ .Release.Namespace }} |
| # Common labels applied to every object the chart creates. | ||
| commonLabels: {} | ||
| commonAnnotations: {} |
| --set agent.gpu.rocm=false \ | ||
| --set agent.nodeSelector=null \ | ||
| --set operator.enabled=false \ | ||
| --set crds.install=false \ | ||
| > /tmp/render-single.yaml |
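The `--set` flags visible in this excerpt can also be written as a values file, which may be easier to review than a long command line. The sketch below covers only the flags shown above (the excerpt is truncated, so any earlier flags are not reproduced), and the file name is hypothetical. Setting a key to `null` is Helm's way of removing a default from the chart's `values.yaml`.

```yaml
# values-single-node.yaml (hypothetical file name)
# Mirrors only the --set flags visible in the workflow excerpt.
agent:
  gpu:
    rocm: false        # no /dev/kfd or /dev/dri mounts
  nodeSelector: null   # null removes the chart's default nodeSelector
operator:
  enabled: false
crds:
  install: false
```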
| - name: Install kubeconform | ||
| run: | | ||
| curl -sSL -o /tmp/kubeconform.tar.gz \ | ||
| https://github.com/yannh/kubeconform/releases/download/v0.6.7/kubeconform-linux-amd64.tar.gz | ||
| tar -xzf /tmp/kubeconform.tar.gz -C /usr/local/bin kubeconform | ||
|
|
||
| # Validate every rendered scenario against real Kubernetes API schemas. | ||
| # --strict catches unknown fields; --ignore-missing-schemas lets the | ||
| # SpurJob CRD instances through without us hosting a custom schema. | ||
| - name: kubeconform validate (all scenarios) |
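The body of this step is not shown in the excerpt. A validation step of this kind might look roughly like the sketch below, assuming each scenario is rendered to a `/tmp/render-*.yaml` file (only `/tmp/render-single.yaml` appears in the diff) and that `1.29.0` is the schema version matching the "k8s 1.29" target in the PR description. The step name is hypothetical.

```yaml
# Sketch only; the real step body is not visible in the diff excerpt.
- name: kubeconform validate (sketch)
  run: |
    # -strict rejects unknown fields; -ignore-missing-schemas skips kinds
    # without a published schema, such as SpurJob custom resources.
    for f in /tmp/render-*.yaml; do
      kubeconform -strict -ignore-missing-schemas \
        -kubernetes-version 1.29.0 -summary "$f"
    done
```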
shiv-tyagi
left a comment
Thanks @yansun1996.
We can do another round of review once you address the Copilot comments.
| with always-on Raft, GPU-first scheduling, and a Kubernetes operator | ||
| that reconciles SpurJob CRs into pods. | ||
| type: application | ||
| version: 0.1.0 |
| version: 0.1.0 | |
| version: 0.0.1-dev |
Let's keep the chart version as -dev till we are ready to release our first chart.
| # changing values is safe; Helm will diff and reconcile. | ||
|
|
||
| # -- Cluster name written into spur.conf | ||
| clusterName: spur-k8s |
Do we need it here when there is a section for config itself?
| repository: ghcr.io/rocm/spur | ||
| tag: "" # defaults to .Chart.AppVersion when empty | ||
| pullPolicy: IfNotPresent | ||
| pullSecrets: [] |
We are not releasing images publicly yet. We need to think about what to put here.
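Until a public image exists, users would presumably point the chart at their own build, as the PR description suggests with `--set image.repository=...`. A minimal sketch of the equivalent values override follows. It assumes the four keys in the excerpt sit under a top-level `image:` block and that `pullSecrets` takes Kubernetes-style `name` entries; the registry, tag, and Secret name are placeholders.

```yaml
# values-private-registry.yaml (hypothetical file name)
image:
  repository: registry.example.com/my-team/spur  # placeholder, not a real location
  tag: "0.3.0"               # assumed tag; empty falls back to .Chart.AppVersion
  pullPolicy: IfNotPresent
  pullSecrets:
    - name: my-registry-creds  # hypothetical Secret; entry shape is an assumption
```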
| that reconciles SpurJob CRs into pods. | ||
| type: application | ||
| version: 0.1.0 | ||
| appVersion: "0.1.0" |
| appVersion: "0.1.0" | |
| appVersion: "0.3.0" |
appVersion does not need to match the chart version.
There can be instances when we release the app but the chart doesn't need changing and vice-versa.
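Taken together, the two suggestions in this review would leave the relevant part of `Chart.yaml` looking roughly as follows. This is a sketch of the suggested values only, not the full file.

```yaml
# Chart.yaml (fragment) with both review suggestions applied
type: application
version: 0.0.1-dev    # chart version: stays pre-release until the first chart release
appVersion: "0.3.0"   # application version: moves independently of the chart version
```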
Convert to draft; wait for the CI image to be hosted.
Closes #255
Summary
- `deploy/helm/spur/` packaging spurctld (HA StatefulSet), spurd (DaemonSet), spurrestd, spurdbd, and the operator into a single `helm install`.
- `SpurJob` CRD lives under `crds/` so Helm installs it on first install and leaves it alone on upgrade/uninstall (avoids wiping user CRs).
- CI workflow (`.github/workflows/helm.yml`) runs `helm lint` + `helm template` across four scenarios (defaults / single-node dev / external DB / ingress) and validates the rendered output with `kubeconform --strict` against k8s 1.29.

What the chart covers
- `spurctld`: `spur.conf` is generated from `controller.replicaCount`
- `spurd`: `/dev/kfd` + `/dev/dri` mounts toggleable via `agent.gpu.rocm`
- `spurrestd`
- `spurdbd`
- `spur-k8s-operator`: `SpurJob` CRs
- `SpurJob` CRD: `crds/spurjob-crd.yaml`

Templates mirror `examples/k8s/` 1:1 so the raw manifests remain a useful customization reference. `config.extraToml` escape hatch lets users append arbitrary `spur.conf` fields without forking the chart. Config changes roll the controller via `checksum/config` annotation. Post-install `NOTES.txt` prints the connect commands plus actionable `helm upgrade --reuse-values` remediation snippets when dev-mode defaults are in use (embedded postgres without PVC, controller `emptyDir`).

Test plan
- `helm lint deploy/helm/spur` passes
- `helm template` renders cleanly for: defaults (3-replica Raft HA), single-node dev (1 replica, no PVC, no GPU, no operator), external Postgres via secret, REST API ingress
- `replicaCount=1` → 1 peer, `replicaCount=3` → 3 peers in `spur.conf`
- `ghcr.io/rocm/spur` image not being published yet (today the default `image.repository` is a placeholder; users override with `--set image.repository=...`)

Out of scope (follow-ups)
- `docs/deployment/kubernetes.rst` to add a Helm section alongside the existing raw-manifest walkthrough
- `helm package` + push to OCI as part of `release.yml`

🤖 Generated with Claude Code