feat(deploy): add Helm chart for Kubernetes deployment #256

Draft
yansun1996 wants to merge 2 commits into ROCm:main from yansun1996:helm-chart

@yansun1996

Member

Closes #255

Summary

  • New Helm chart at deploy/helm/spur/ packaging spurctld (HA StatefulSet), spurd (DaemonSet), spurrestd, spurdbd, and the operator into a single helm install.
  • SpurJob CRD lives under crds/ so Helm installs it on first install and leaves it alone on upgrade/uninstall (avoids wiping user CRs).
  • CI workflow (.github/workflows/helm.yml) runs helm lint + helm template across four scenarios (defaults / single-node dev / external DB / ingress) and validates the rendered output with kubeconform --strict against k8s 1.29.

What the chart covers

| Component | Kind | Notes |
|---|---|---|
| spurctld | StatefulSet + headless Service + PDB | Raft peer list in spur.conf is generated from controller.replicaCount |
| spurd | DaemonSet | ROCm /dev/kfd + /dev/dri mounts toggleable via agent.gpu.rocm |
| spurrestd | Deployment + Service + optional Ingress | |
| spurdbd | Deployment | External Postgres via secret or dev-mode embedded sidecar |
| spur-k8s-operator | Deployment + ClusterRole/Binding | Reconciles SpurJob CRs |
| SpurJob CRD | crds/spurjob-crd.yaml | Helm never removes (by design) |
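
The table notes that the Raft peer list in spur.conf is generated from controller.replicaCount. Because spurctld runs as a StatefulSet behind a headless Service, each replica has a stable DNS name of the form `<statefulset>-<ordinal>.<service>.<namespace>.svc.<cluster-domain>`. The Python sketch below illustrates that derivation. The service name `spurctld`, the port 7000, the `cluster.local` domain, and the TOML key `peers` are assumptions for illustration, not values read from the chart's `_helpers.tpl`.

```python
"""Sketch: derive a Raft peer list from a StatefulSet replica count.

Assumptions (hypothetical, not taken from the chart): the StatefulSet and its
headless Service are both named "spurctld", peers listen on port 7000, the
cluster DNS domain is "cluster.local", and spur.conf uses a `peers` key.
"""


def raft_peers(replica_count: int, namespace: str,
               name: str = "spurctld", port: int = 7000,
               cluster_domain: str = "cluster.local") -> list[str]:
    """Return one host:port entry per StatefulSet ordinal 0..replica_count-1."""
    if replica_count < 1:
        raise ValueError("replica_count must be at least 1")
    # StatefulSet pods are named <name>-<ordinal>; the headless Service gives
    # each one a stable DNS record <pod>.<service>.<namespace>.svc.<domain>.
    return [
        f"{name}-{i}.{name}.{namespace}.svc.{cluster_domain}:{port}"
        for i in range(replica_count)
    ]


def peers_toml(peers: list[str]) -> str:
    """Render the peers as a TOML array under the hypothetical `peers` key."""
    quoted = ", ".join(f'"{p}"' for p in peers)
    return f"peers = [{quoted}]"


if __name__ == "__main__":
    for n in (1, 3):
        print(n, peers_toml(raft_peers(n, "spur")))
```

With this scheme, replicaCount=1 yields one peer and replicaCount=3 yields three, which is the property the test plan checks.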

Templates mirror examples/k8s/ 1:1, so the raw manifests remain a useful customization reference. The config.extraToml escape hatch lets users append arbitrary spur.conf fields without forking the chart. Config changes roll the controller via a checksum/config pod annotation. The post-install NOTES.txt prints the connect commands, plus actionable helm upgrade --reuse-values remediation snippets when dev-mode defaults are in use (embedded Postgres without a PVC, controller on emptyDir).
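
Two of the mechanisms above fit in a few lines: appending config.extraToml to the generated spur.conf, and deriving a checksum/config annotation so that any config change alters the pod template and triggers a rolling restart. The usual Helm idiom for the latter is `checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}` under `spec.template.metadata.annotations`. The Python below is a sketch of the same idea; the base TOML key and the extra TOML content are hypothetical.

```python
"""Sketch: append config.extraToml to spur.conf and derive a checksum annotation."""
import hashlib


def render_spur_conf(base_toml: str, extra_toml: str = "") -> str:
    """Return the generated TOML with any user-supplied TOML appended at the end."""
    conf = base_toml.rstrip("\n") + "\n"
    if extra_toml.strip():
        conf += "\n" + extra_toml.rstrip("\n") + "\n"
    return conf


def checksum_annotation(rendered: str) -> dict[str, str]:
    """Hex SHA-256 of the rendered text, like Helm's `sha256sum` template function."""
    return {"checksum/config": hashlib.sha256(rendered.encode("utf-8")).hexdigest()}


if __name__ == "__main__":
    base = 'cluster_name = "spur-k8s"\n'  # hypothetical key name
    old = checksum_annotation(render_spur_conf(base))
    new = checksum_annotation(render_spur_conf(base, "[extra]\nfoo = 1"))  # hypothetical TOML
    # A changed digest changes the pod template, so the StatefulSet rolls.
    print(old != new)
```

One caveat with appending raw TOML: if the generated section ends inside a `[table]`, any bare keys at the top of extraToml become part of that table, so extra content is safest when it starts with its own table header.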

Test plan

  • helm lint deploy/helm/spur passes
  • helm template renders cleanly for: defaults (3-replica Raft HA), single-node dev (1 replica, no PVC, no GPU, no operator), external Postgres via secret, REST API ingress
  • CI job runs all four scenarios + kubeconform-validates against k8s 1.29
  • Verified Raft peer list generation: replicaCount=1 → 1 peer, replicaCount=3 → 3 peers in spur.conf
  • End-to-end install against a real cluster: blocked until the ghcr.io/rocm/spur image is published (the default image.repository is currently a placeholder; users override it with --set image.repository=...)
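
The four render scenarios in the test plan can be written as a small matrix of `helm template` invocations. The sketch below only builds the argument lists and does not call Helm. `controller.replicaCount`, `agent.gpu.rocm`, `agent.nodeSelector`, `operator.enabled`, `crds.install` and `accounting.externalDatabase.url` appear in this PR; `controller.persistence.enabled` and `restd.ingress.enabled` are hypothetical key names, and the release name and database URL are made up.

```python
"""Sketch: the CI render matrix as `helm template` argument lists (Helm is not invoked)."""
import shlex

CHART = "deploy/helm/spur"

# Scenario name -> --set overrides. Keys marked hypothetical are not confirmed by the PR.
SCENARIOS: dict[str, list[str]] = {
    "defaults": [],
    "single-node": [
        "controller.replicaCount=1",
        "controller.persistence.enabled=false",  # hypothetical key
        "agent.gpu.rocm=false",
        "agent.nodeSelector=null",
        "operator.enabled=false",
        "crds.install=false",
    ],
    "external-db": [
        "accounting.externalDatabase.url=postgres://db.example:5432/spur",  # example URL
    ],
    "ingress": [
        "restd.ingress.enabled=true",  # hypothetical key
    ],
}


def helm_template_args(scenario: str, release: str = "spur") -> list[str]:
    """Return the argv for rendering one scenario with `helm template`."""
    args = ["helm", "template", release, CHART]
    for override in SCENARIOS[scenario]:
        args += ["--set", override]
    return args


if __name__ == "__main__":
    for name in SCENARIOS:
        print(shlex.join(helm_template_args(name)), ">", f"/tmp/render-{name}.yaml")
```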

Out of scope (follow-ups)

  • Publishing the chart to a public registry (separate workflow change once the image is published)
  • Updating docs/deployment/kubernetes.rst to add a Helm section alongside the existing raw-manifest walkthrough
  • helm package + push to OCI as part of release.yml

🤖 Generated with Claude Code

yansun1996 and others added 2 commits June 5, 2026 22:34
Packages spurctld (HA StatefulSet), spurd (DaemonSet), spurrestd,
spurdbd, and the operator into a single chart at deploy/helm/spur/.
SpurJob CRD ships under crds/ so Helm installs it on first install
and leaves it alone on upgrade/uninstall (avoids wiping user CRs).

Values cover the cases that previously required forking the raw
examples/k8s/ manifests: replica counts, image overrides per
component, controller PVC, ROCm device exposure, external Postgres
vs dev-mode sidecar, REST API ingress, and free-form extra TOML
appended to spur.conf.

CI workflow runs helm lint + helm template across four scenarios
(defaults / single-node dev / external DB / ingress) and validates
the rendered output with kubeconform --strict against k8s 1.29.

Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>
The post-install NOTES.txt warnings now include concrete
`helm upgrade --reuse-values` commands for each fix path:
- embedded postgres: add a PVC, switch to external DB, or disable accounting
- controller emptyDir: enable persistence with a PVC

Cuts the round-trip from "warning printed" to "next command typed"
for users hitting these on first install.

Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>
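
The NOTES.txt logic in this commit is essentially a decision table over the rendered values. A minimal Python sketch follows. Every values key in it (for example `accounting.embeddedPostgres.persistence.enabled`, `accounting.enabled` and `controller.persistence.enabled`) is a hypothetical name chosen for illustration, and the release name is a placeholder; only `helm upgrade --reuse-values --set` is real Helm syntax.

```python
"""Sketch: NOTES.txt-style dev-default warnings with remediation commands.

Every values key used here is hypothetical; `helm upgrade --reuse-values --set`
is the only real Helm syntax involved.
"""

RELEASE = "spur"            # placeholder release name
CHART = "deploy/helm/spur"  # chart path from this PR


def upgrade(*sets: str) -> str:
    """Build a `helm upgrade --reuse-values` command for the given --set overrides."""
    flags = " ".join(f"--set {s}" for s in sets)
    return f"helm upgrade {RELEASE} {CHART} --reuse-values {flags}"


def dev_default_warnings(values: dict) -> list[str]:
    warnings: list[str] = []
    acct = values.get("accounting", {})
    ctrl = values.get("controller", {})

    external_db = acct.get("externalDatabase", {}).get("url")
    pg_pvc = acct.get("embeddedPostgres", {}).get("persistence", {}).get("enabled")
    if acct.get("enabled", True) and not external_db and not pg_pvc:
        # Embedded Postgres without a PVC loses accounting data with the pod.
        fixes = [
            upgrade("accounting.embeddedPostgres.persistence.enabled=true"),
            upgrade("accounting.externalDatabase.url=POSTGRES_URL"),
            upgrade("accounting.enabled=false"),
        ]
        warnings.append("embedded postgres has no PVC; pick one:\n  " + "\n  ".join(fixes))

    if not ctrl.get("persistence", {}).get("enabled"):
        # emptyDir is deleted with the pod, so controller state does not survive.
        warnings.append("controller uses emptyDir; fix:\n  "
                        + upgrade("controller.persistence.enabled=true"))
    return warnings
```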
Copilot AI review requested due to automatic review settings June 5, 2026 22:50

Copilot AI left a comment

Contributor

Pull request overview

Adds a first-party Helm chart (deploy/helm/spur/) to deploy Spur’s Kubernetes stack (controller, agents, REST API, accounting, operator) plus a GitHub Actions workflow to lint/template/validate the rendered manifests.

Changes:

  • Introduces a new Helm chart with configurable components, generated spur.conf (including Raft peer list), and install notes.
  • Adds the SpurJob CRD under crds/ for Helm-managed initial install semantics.
  • Adds CI workflow to run helm lint, render multiple value scenarios, and validate output with kubeconform.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 11 comments.

| File | Description |
|---|---|
| deploy/helm/spur/Chart.yaml | New chart metadata (versioning, kubeVersion, links). |
| deploy/helm/spur/values.yaml | New values contract for all components and config generation. |
| deploy/helm/spur/README.md | Chart usage/operations documentation and examples. |
| deploy/helm/spur/.helmignore | Packaging ignore rules. |
| deploy/helm/spur/crds/spurjob-crd.yaml | Installs the SpurJob CRD via Helm crds/. |
| deploy/helm/spur/templates/_helpers.tpl | Helper templates for naming/labels/images and Raft peer generation. |
| deploy/helm/spur/templates/namespace.yaml | Optional Namespace resource rendering. |
| deploy/helm/spur/templates/serviceaccount.yaml | Shared ServiceAccount for workloads. |
| deploy/helm/spur/templates/rbac.yaml | Operator ClusterRole/ClusterRoleBinding. |
| deploy/helm/spur/templates/configmap.yaml | Generates spur.conf ConfigMap from values (including Raft peers). |
| deploy/helm/spur/templates/controller.yaml | Controller headless Service + StatefulSet + optional PDB. |
| deploy/helm/spur/templates/agent.yaml | Agent DaemonSet with optional ROCm device mounts. |
| deploy/helm/spur/templates/restd.yaml | REST API Service + Deployment + optional Ingress. |
| deploy/helm/spur/templates/accounting.yaml | Accounting Service + Deployment with external DB or embedded Postgres sidecar. |
| deploy/helm/spur/templates/operator.yaml | Operator Service + Deployment with health endpoints. |
| deploy/helm/spur/templates/NOTES.txt | Post-install guidance and dev-default warnings. |
| .github/workflows/helm.yml | Adds Helm lint/template + kubeconform validation workflow. |

Comment on lines +38 to +44
# SpurJob CRD lifecycle. The CRD itself lives under crds/ so Helm installs it
# on first install. Set install=false if you manage CRDs externally (e.g. ArgoCD
# pre-sync hook). Helm never removes CRDs on uninstall — that's intentional to
# avoid wiping user CRs.
crds:
  install: true
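
Background on the crds/ directory that bears on this excerpt: Helm 3 does not template files under crds/, creates them only during `helm install` (skipping any CRD that already exists), never upgrades or deletes them, and offers `helm install --skip-crds` as the opt-out switch; `helm template` prints them only with `--include-crds`. Unless the chart also renders the CRD from templates/ behind a flag, a values key such as crds.install cannot gate the crds/ copy. The Python below models those documented rules; it is a simplified description, not Helm's code.

```python
"""Model of Helm 3's documented crds/ rules (a description, not Helm's code)."""
from typing import Optional


def crds_applied(command: str, *, crd_exists: bool = False, skip_crds: bool = False,
                 include_crds: bool = False, values: Optional[dict] = None) -> bool:
    """Return True if Helm would send the files under crds/ for `command`.

    `values` is accepted only to make a point: crds/ files are not templated,
    so no values key (for example a crds.install flag) changes the result.
    """
    del values  # deliberately unused
    if command == "install":
        # Skipped with --skip-crds, and never re-applied if the CRD already exists.
        return not (skip_crds or crd_exists)
    if command == "template":
        # `helm template` prints crds/ only when --include-crds is passed.
        return include_crds
    # upgrade, rollback and uninstall never create, update or delete CRDs.
    return False
```

In practice, users who manage the CRD externally would pass `--skip-crds` at install time, and CI renders made with `helm template` only contain the CRD when `--include-crds` is set.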

Comment on lines +64 to +67
operator:
  enabled: false
crds:
  install: false
kubectl apply -f deploy/helm/spur/crds/spurjob-crd.yaml
```

If you manage CRDs externally (e.g. ArgoCD pre-sync hook), set `crds.install=false` and ship `crds/spurjob-crd.yaml` yourself.
kubectl -n {{ .Release.Namespace }} get pods
kubectl -n {{ .Release.Namespace }} rollout status statefulset/spurctld

Submit a job via the operator (requires the CRD, installed by this chart):
Comment on lines +8 to +11
  {{- with .Values.commonAnnotations }}
  annotations:
    {{- toYaml . | nindent 4 }}
  {{- end }}
Comment on lines +61 to +66
{{- if $useExternal }}
{{- if .Values.accounting.externalDatabase.url }}
- --database-url=$(SPUR_DB_URL)
{{- else }}
- --database-url=$(SPUR_DB_URL)
{{- end }}
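
Both branches of the inner conditional in this excerpt emit the same `--database-url=$(SPUR_DB_URL)` argument. At runtime Kubernetes expands `$(SPUR_DB_URL)` from the container's env list (dependent-variable syntax in command/args): a defined variable is substituted, an unresolvable reference is left unchanged, and `$$` escapes to a literal `$`. The Python sketch below models that rule with a simplified variable-name pattern; the sample URL is made up, and the assumption that SPUR_DB_URL is populated from a Secret is not confirmed by the excerpt.

```python
"""Model of Kubernetes $(VAR) expansion in container command/args (simplified)."""
import re

# `$$` escapes a dollar sign; `$(NAME)` references an env var defined on the container.
# The name pattern is simplified; Kubernetes accepts a wider character set.
_REF = re.compile(r"\$\$|\$\(([A-Za-z_][A-Za-z0-9_]*)\)")


def expand_arg(arg: str, env: dict[str, str]) -> str:
    def replace(match: re.Match) -> str:
        if match.group(0) == "$$":
            return "$"
        # Unresolvable references are left exactly as written.
        return env.get(match.group(1), match.group(0))

    return _REF.sub(replace, arg)
```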
Comment on lines +4 to +6
metadata:
  name: spurctld
  namespace: {{ .Release.Namespace }}
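
This excerpt hardcodes the name spurctld, so two releases installed into the same namespace would collide on it. A common alternative is the `fullname` helper generated by `helm create`, which prefixes resources with the release name and truncates to 63 characters. Below is a Python transcription of that scaffold's logic; the `component_name` helper and the `spur` chart name are illustrative assumptions, and any rename would also need to flow into the Raft peer DNS names.

```python
"""Python transcription of the `fullname` helper from the `helm create` scaffold."""
from typing import Optional


def _trunc63(s: str) -> str:
    # Sprig: `trunc 63 | trimSuffix "-"` (trimSuffix removes at most one "-").
    s = s[:63]
    return s[:-1] if s.endswith("-") else s


def fullname(release: str, chart: str, name_override: Optional[str] = None,
             fullname_override: Optional[str] = None) -> str:
    if fullname_override:
        return _trunc63(fullname_override)
    name = name_override or chart
    # `contains $name .Release.Name`: reuse the release name if it already includes the name.
    if name in release:
        return _trunc63(release)
    return _trunc63(f"{release}-{name}")


def component_name(release: str, component: str) -> str:
    """Hypothetical per-component name such as '<fullname>-spurctld'."""
    return _trunc63(f"{fullname(release, 'spur')}-{component}")
```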
Comment on lines +18 to +20
# Common labels applied to every object the chart creates.
commonLabels: {}
commonAnnotations: {}
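
One common way to wire in commonLabels: merge them into metadata labels alongside the standard `app.kubernetes.io/*` labels, but keep them out of `spec.selector`, because a Deployment or StatefulSet selector is immutable after creation and adding a user label there later would make `helm upgrade` fail. The Python sketch below shows that split; the precedence rule (chart labels win over user labels) and the choice of selector keys are assumptions, not taken from the chart.

```python
"""Sketch: merge commonLabels into metadata labels while keeping selectors stable."""


def selector_labels(release: str, name: str = "spur", component: str = "") -> dict[str, str]:
    """Minimal, never-changing label set used for spec.selector (assumed keys)."""
    labels = {"app.kubernetes.io/name": name, "app.kubernetes.io/instance": release}
    if component:
        labels["app.kubernetes.io/component"] = component
    return labels


def metadata_labels(release: str, chart_version: str, common: dict[str, str],
                    component: str = "") -> dict[str, str]:
    standard = {
        **selector_labels(release, component=component),
        "app.kubernetes.io/managed-by": "Helm",
        "helm.sh/chart": f"spur-{chart_version}",
    }
    # User labels go first so they cannot override the chart's own keys (an assumption).
    return {**common, **standard}
```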
Comment on lines +42 to +46
--set agent.gpu.rocm=false \
--set agent.nodeSelector=null \
--set operator.enabled=false \
--set crds.install=false \
> /tmp/render-single.yaml
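
`--set agent.nodeSelector=null` relies on a Helm rule: setting a key to null removes it from the merged values instead of keeping the chart default, which is how this scenario drops the default node selector. Nested maps are merged key by key, and other user-supplied values replace the defaults. The Python sketch below is a simplified model of that coalescing; the default nodeSelector label in the example is hypothetical.

```python
"""Simplified model of Helm values coalescing where an explicit null deletes a default."""
import copy


def coalesce(defaults: dict, overrides: dict) -> dict:
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if value is None:
            merged.pop(key, None)  # null removes the default entirely
        elif isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = coalesce(merged[key], value)  # maps merge recursively
        else:
            merged[key] = copy.deepcopy(value)  # scalars and lists replace
    return merged
```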
Comment on lines +65 to +74
- name: Install kubeconform
  run: |
    curl -sSL -o /tmp/kubeconform.tar.gz \
      https://github.com/yannh/kubeconform/releases/download/v0.6.7/kubeconform-linux-amd64.tar.gz
    tar -xzf /tmp/kubeconform.tar.gz -C /usr/local/bin kubeconform

# Validate every rendered scenario against real Kubernetes API schemas.
# --strict catches unknown fields; --ignore-missing-schemas lets the
# SpurJob CRD instances through without us hosting a custom schema.
- name: kubeconform validate (all scenarios)

@shiv-tyagi left a comment

Member

Thanks @yansun1996.

We can do another round of review once you address the Copilot comments.

with always-on Raft, GPU-first scheduling, and a Kubernetes operator
that reconciles SpurJob CRs into pods.
type: application
version: 0.1.0

Member

Suggested change
- version: 0.1.0
+ version: 0.0.1-dev

Let's keep a -dev chart version until we are ready to release our first chart.
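
Helm requires chart versions to follow SemVer 2, and under SemVer precedence a prerelease such as 0.0.1-dev sorts below both 0.0.1 and 0.1.0. Helm also skips prerelease versions when resolving a chart from a repository unless `--devel` (or an explicit `--version`) is given, which suits a not-yet-released chart. The Python below is a compact sketch of SemVer 2.0.0 precedence (build metadata ignored) for checking such orderings; it is not Helm's own parser.

```python
"""SemVer 2.0.0 precedence (build metadata ignored), e.g. to check 0.0.1-dev < 0.1.0."""
import re
from functools import cmp_to_key

_SEMVER = re.compile(
    r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-([0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))?"
    r"(?:\+[0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*)?$"
)


def _parse(version: str):
    m = _SEMVER.match(version)
    if not m:
        raise ValueError(f"not SemVer 2: {version!r}")
    core = tuple(int(x) for x in m.group(1, 2, 3))
    pre = m.group(4).split(".") if m.group(4) else []
    return core, pre


def _cmp_ident(a: str, b: str) -> int:
    a_num, b_num = a.isdigit(), b.isdigit()
    if a_num and b_num:
        return (int(a) > int(b)) - (int(a) < int(b))
    if a_num != b_num:
        return -1 if a_num else 1  # numeric identifiers sort before alphanumeric ones
    return (a > b) - (a < b)  # ASCII order for alphanumeric identifiers


def compare(v1: str, v2: str) -> int:
    """Return -1, 0 or 1 following SemVer 2.0.0 precedence."""
    (c1, p1), (c2, p2) = _parse(v1), _parse(v2)
    if c1 != c2:
        return -1 if c1 < c2 else 1
    if not p1 or not p2:
        # A release outranks any prerelease of the same core version.
        return int(not p1) - int(not p2)
    for a, b in zip(p1, p2):
        result = _cmp_ident(a, b)
        if result:
            return result
    # A longer identifier list wins when all shared identifiers are equal.
    return (len(p1) > len(p2)) - (len(p1) < len(p2))
```

Sorting with `sorted(versions, key=cmp_to_key(compare))` places 0.0.1-dev first.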

# changing values is safe; Helm will diff and reconcile.

# -- Cluster name written into spur.conf
clusterName: spur-k8s

@shiv-tyagi Jun 6, 2026

Member

Do we need this here when there is already a dedicated config section?

Comment on lines +13 to +16
repository: ghcr.io/rocm/spur
tag: "" # defaults to .Chart.AppVersion when empty
pullPolicy: IfNotPresent
pullSecrets: []

Member

We are not releasing images publicly yet, so we need to decide what to put here.

that reconciles SpurJob CRs into pods.
type: application
version: 0.1.0
appVersion: "0.1.0"

Member

Suggested change
- appVersion: "0.1.0"
+ appVersion: "0.3.0"

appVersion does not need to match the chart version.

There will be times when we release a new app version without changing the chart, and vice versa.
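
The values comment on image.tag ("defaults to .Chart.AppVersion when empty") is where this decoupling shows up: bumping appVersion changes the default image every component pulls, while bumping only the chart version does not. A Python sketch of that resolution follows, with optional per-component overrides; the override dictionary shape is a hypothetical stand-in, not the chart's actual values schema.

```python
"""Sketch: resolve an image reference, falling back to appVersion for an empty tag."""
from typing import Optional


def image_ref(repository: str, tag: str, app_version: str,
              component_override: Optional[dict] = None) -> str:
    override = component_override or {}  # hypothetical per-component override shape
    repo = override.get("repository") or repository
    # An empty tag falls back to appVersion, like the usual `default .Chart.AppVersion` idiom.
    resolved_tag = override.get("tag") or tag or app_version
    return f"{repo}:{resolved_tag}"
```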

yansun1996 marked this pull request as draft June 10, 2026 02:23
@yansun1996

Member Author

Converting to draft while waiting for the CI image to be hosted.

Labels: None yet

Projects: None yet

Development: successfully merging this pull request may close Package Spur as a Helm chart for K8s distribution

3 participants