Senior Cloud Platform Engineer building GPU/AI infrastructure at scale.
CNCF Golden Kubestronaut. Oracle ACE Associate. Dragonfly Community Member.
31+ PRs across 17 open-source projects in CNCF, ASWF, and beyond.
If GPUs need scheduling, scaling, or observability on Kubernetes — that's what I build.
| Area | What I build |
|---|---|
| 🎮 GPU Autoscaling | KEDA External Scaler with native NVML metrics, DaemonSet architecture, scaling profiles for vLLM, Triton, and training workloads. Referenced in KEDA #7538 and published on CNCF Blog. |
| 🔬 GPU NUMA Topology | Volcano scheduler plugin for NUMA-aware GPU placement — topology discovery via sysfs, CRD extensions, and cross-socket affinity optimization. |
| 📡 GPU Observability | OpenTelemetry Collector receiver for GPU metrics (NVML-native) and Docker Desktop Extension for real-time GPU monitoring dashboards. |
| 🧠 Topology-Aware AIOps | Knowledge graph of Kubernetes resources with graph-based root-cause traversal, AlertManager webhook integration, and blast-radius analysis. |
| ☁️ Platform Engineering | Kubernetes, ArgoCD, Crossplane, Docker, KEDA — production platforms serving enterprise workloads at scale. |
| 📝 Technical Writing | 19 published articles across CNCF Blog, IEEE ComSoc, Platform Engineering, VKTR, Cloud Native Now, and Medium. |
Golden Kubestronaut, including all five Kubernetes certifications: KCNA, CKA, CKAD, CKS, KCSA
KEDA External gRPC Scaler for GPU/AI workloads
Tech: Go · gRPC · NVIDIA NVML · Kubernetes · Helm
Referenced in KEDA #7538 · CNCF Blog
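
A KEDA External Scaler implements gRPC calls such as `IsActive`, `GetMetricSpec`, and `GetMetrics`. KEDA uses the activation check to scale between zero and one replica, and the HPA turns the reported metric into a replica count from there. The sketch below shows only that arithmetic for an `AverageValue` target. The profile names and thresholds are illustrative assumptions, and `IsActive`/`DesiredReplicas` here are hypothetical local helpers, not the scaler's gRPC handlers or shipped configuration.

```go
package main

import (
	"fmt"
	"math"
)

// ScalingProfile is a hypothetical per-workload tuning; the values below are
// illustrative assumptions, not defaults shipped by any scaler.
type ScalingProfile struct {
	TargetPerReplica float64 // target GPU utilization (%) per replica
	Activation       float64 // total utilization at or below this counts as idle
}

var profiles = map[string]ScalingProfile{
	"vllm":     {TargetPerReplica: 70, Activation: 5},
	"triton":   {TargetPerReplica: 60, Activation: 5},
	"training": {TargetPerReplica: 90, Activation: 10},
}

// IsActive mirrors the intent of the External Scaler IsActive RPC:
// should the workload run at all (the zero <-> one decision)?
func IsActive(totalUtil float64, p ScalingProfile) bool {
	return totalUtil > p.Activation
}

// DesiredReplicas applies the HPA rule for an AverageValue target,
// ceil(total metric / target per replica), clamped to [minR, maxR].
// The HPA's default 10% tolerance band is ignored for simplicity.
func DesiredReplicas(totalUtil float64, p ScalingProfile, minR, maxR int) int {
	n := int(math.Ceil(totalUtil / p.TargetPerReplica))
	if n < minR {
		n = minR
	}
	if n > maxR {
		n = maxR
	}
	return n
}

func main() {
	p := profiles["vllm"]
	// Three replicas reporting 90%, 85%, and 75% GPU utilization.
	total := 90.0 + 85.0 + 75.0
	fmt.Println(IsActive(total, p), DesiredReplicas(total, p, 1, 10)) // true 4
}
```

With a 70% per-replica target, a total of 250 gives ceil(250 / 70) = 4 replicas; a training profile with a higher target would pack more load onto each GPU before scaling out.
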
OpenTelemetry Collector receiver for GPU metrics
Tech: Go · OpenTelemetry Collector SDK · NVML
Real-time NVIDIA GPU metrics in Docker Desktop
Tech: Go · React · Recharts · Docker Extension SDK · NVML
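
A real-time dashboard only needs the most recent stretch of samples. One way a Go backend could hold that history, assuming a fixed number of points per chart, is a mutex-guarded ring buffer. `Window`, `Sample`, and their methods are hypothetical names for illustration, not the extension's code.

```go
package main

import (
	"fmt"
	"sync"
)

// Sample is one GPU utilization reading at a Unix-millisecond timestamp.
type Sample struct {
	TimeMs int64
	Util   float64
}

// Window is a fixed-capacity ring buffer of the most recent samples, so a live
// chart's history uses constant memory. The mutex lets a polling goroutine
// push while a request handler takes snapshots.
type Window struct {
	mu    sync.Mutex
	buf   []Sample
	start int // index of the oldest sample
	size  int // number of valid samples
}

func NewWindow(capacity int) *Window {
	return &Window{buf: make([]Sample, capacity)}
}

// Push appends s, overwriting the oldest sample once the buffer is full.
func (w *Window) Push(s Sample) {
	w.mu.Lock()
	defer w.mu.Unlock()
	if len(w.buf) == 0 {
		return
	}
	if w.size < len(w.buf) {
		w.buf[(w.start+w.size)%len(w.buf)] = s
		w.size++
		return
	}
	w.buf[w.start] = s
	w.start = (w.start + 1) % len(w.buf)
}

// Snapshot returns a copy of the samples, oldest first, ready to send to a chart.
func (w *Window) Snapshot() []Sample {
	w.mu.Lock()
	defer w.mu.Unlock()
	out := make([]Sample, w.size)
	for i := range out {
		out[i] = w.buf[(w.start+i)%len(w.buf)]
	}
	return out
}

func main() {
	w := NewWindow(3)
	for t := int64(1); t <= 5; t++ {
		w.Push(Sample{TimeMs: t * 1000, Util: float64(t * 10)})
	}
	fmt.Println(w.Snapshot()) // [{3000 30} {4000 40} {5000 50}]
}
```

Returning a copy from `Snapshot` lets the caller serialize the series without holding the lock.
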
K8s knowledge graph & automated root-cause analysis
Tech: Go · Kubernetes API · Gorilla Mux · Helm
More projects: KubeAI Autoscaler · Ingress2Gateway · Golden Kubestronaut Learning · LLMOps
31+ PRs across 17 projects in CNCF, ASWF, and other open-source foundations.
| Project | Description | Contributions |
|---|---|---|
| Dragonfly | P2P-based file distribution and image acceleration | client#1665 - Add Hugging Face backend support with hf:// protocol, client#1673 - Add ModelScope backend support with modelscope:// protocol, d7y.io#386 - Add hf:// protocol documentation, d7y.io#398 - Add P2P-accelerated AI model downloads blog post, helm-charts#455 - Add injector support to helm chart, helm-charts#480 - Replace deprecated bitnamilegacy/mysql with bitnami/mysql |
| Kubernetes | Production-Grade Container Orchestration | #53891 - Document deployment.kubernetes.io/* annotations, #53892 - Add kubectl apply view-last-applied documentation |
| TiKV | Distributed transactional key-value database | #19225 - Add AGENTS.md for AI agent guidance |
| Volcano | Cloud-native batch scheduling for AI/HPC | #5095 - GPU NUMA topology awareness in scheduler, apis#229 - Add GPUInfo type to NumatopoSpec CRD, resource-exporter#12 - GPU NUMA topology discovery via sysfs |
| HAMi | Heterogeneous AI Computing Virtualization Middleware | #1893 - Add unit tests for nvinternal info, mig, and watch packages |
| KEDA | Kubernetes Event-driven Autoscaling | keda-docs#1658 - Remove metricName from the KEDA docs, keda-docs#1769 - Fix datadog scaler typos across all versions, #7538 - GPU/AI inference scaler architectural analysis |
| Metal³ | Bare metal host provisioning for Kubernetes | #624 - Fix redirect links in tryit.md |
| OpenTelemetry | Observability framework | #8632 - Add .NET troubleshooting page |
| kpt | Kubernetes-native packaging and resource management | #4278 - Fix kpt fn doc command for KRM functions expecting input |
| traceAI | Open-source LLM observability SDK | #165 - Fix exporter shutdown and thread safety in Python SDK, #166 - Add Go SDK with OpenAI instrumentor |
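
The NUMA-aware GPU placement idea behind Volcano #5095 can be illustrated with a small allocation function. This is a hypothetical best-fit heuristic, not Volcano's plugin logic. It tries to satisfy a GPU request from a single NUMA node and only spans sockets when it must, flagging that case so a scheduler could score the host lower.

```go
package main

import (
	"fmt"
	"sort"
)

// PickNUMA (hypothetical) chooses how many GPUs to take from each NUMA node of
// one host for a request of k devices. free maps NUMA node ID -> free GPUs.
// It prefers the single node with the fewest free GPUs that still fits
// (best fit, leaving larger pools for bigger jobs). If no node fits alone it
// spans nodes, largest pools first, and reports crossSocket = true.
// ok is false when the host cannot fit k GPUs at all.
func PickNUMA(free map[int]int, k int) (alloc map[int]int, crossSocket, ok bool) {
	if k <= 0 {
		return nil, false, false
	}
	nodes := make([]int, 0, len(free))
	total := 0
	for n, c := range free {
		nodes = append(nodes, n)
		total += c
	}
	sort.Ints(nodes) // deterministic tie-breaking by node ID

	best, found := 0, false
	for _, n := range nodes {
		if free[n] >= k && (!found || free[n] < free[best]) {
			best, found = n, true
		}
	}
	if found {
		return map[int]int{best: k}, false, true
	}
	if total < k {
		return nil, false, false
	}

	// Span sockets, taking from the largest pools first to touch as few nodes as possible.
	sort.SliceStable(nodes, func(i, j int) bool { return free[nodes[i]] > free[nodes[j]] })
	alloc = make(map[int]int)
	for _, n := range nodes {
		if k == 0 {
			break
		}
		take := free[n]
		if take > k {
			take = k
		}
		if take > 0 {
			alloc[n] = take
			k -= take
		}
	}
	return alloc, true, true
}

func main() {
	free := map[int]int{0: 2, 1: 4} // free GPUs per NUMA node on one host
	for _, k := range []int{2, 3, 5, 7} {
		alloc, cross, ok := PickNUMA(free, k)
		fmt.Printf("request %d: alloc=%v crossSocket=%v ok=%v\n", k, alloc, cross, ok)
	}
}
```

Best fit is one reasonable choice here; first fit or a score blended with CPU locality would be equally valid sketches.
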

ASWF (Academy Software Foundation) projects:

| Project | Description | Contributions |
|---|---|---|
| OpenColorIO | Color management library | #2229 - Add release signing workflow, #2230 - Add Dependabot configuration, #2243 - Add Vulkan unit test framework |
| OpenCue | Cloud rendering management system | #2134 - Add scheduled subscription recalculation task |
| OpenImageIO | Image processing library | #4976 - Fix IBA::compare_Yee() channel access |
| RAWtoACES | RAW to ACES image conversion | #222 - Add build developer documentation |
| xSTUDIO | Playback and review application | #186 - Fix broken build guide links |
19 articles published across CNCF Blog, IEEE ComSoc, Platform Engineering, VKTR, Cloud Native Now, and Medium.
Stats updated on 2026-05-31 13:04 UTC
Building GPU infrastructure for Kubernetes? Working on CNCF projects? Let's collaborate.





