


Senior Cloud Platform Engineer building GPU/AI infrastructure at scale.
CNCF Golden Kubestronaut. Oracle ACE Associate. Dragonfly Community Member.
31+ PRs across 17 open-source projects in CNCF, ASWF, and beyond.
If GPUs need scheduling, scaling, or observability on Kubernetes — that's what I build.


⚡ What I'm Building

🎮 GPU Autoscaling: KEDA external scaler with native NVML metrics, a DaemonSet architecture, and scaling profiles for vLLM, Triton, and training workloads. Referenced in KEDA #7538 and published on the CNCF Blog.
🔬 GPU NUMA Topology: Volcano scheduler plugin for NUMA-aware GPU placement, with topology discovery via sysfs, CRD extensions, and cross-socket affinity optimization.
📡 GPU Observability: OpenTelemetry Collector receiver for GPU metrics (NVML-native) and a Docker Desktop extension for real-time GPU monitoring dashboards.
🧠 Topology-Aware AIOps: knowledge graph of Kubernetes resources with graph-based root-cause traversal, Alertmanager webhook integration, and blast-radius analysis.
☁️ Platform Engineering: Kubernetes, Argo CD, Crossplane, Docker, and KEDA, running production platforms for enterprise workloads at scale.
📝 Technical Writing: 19 published articles across the CNCF Blog, IEEE ComSoc, Platform Engineering, VKTR, Cloud Native Now, and Medium.

🏆 Certifications & Recognition

Golden Kubestronaut, which includes all five Kubernetes certifications: KCNA, CKA, CKAD, CKS, KCSA


🚀 Featured Projects


keda-gpu-scaler: KEDA external gRPC scaler for GPU/AI workloads

  • 🎮 Native NVML — Direct GPU metrics via go-nvml
  • 🚀 Scaling Profiles — vLLM, Triton, training presets
  • 📦 DaemonSet — Per-node GPU metric collection
  • 🔄 Scale-to-Zero — GPU-aware idle detection
  • 📈 Prometheus — Optional /metrics endpoint

Tech: Go · gRPC · NVIDIA NVML · Kubernetes · Helm

Referenced in KEDA #7538 | CNCF Blog
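A KEDA external scaler implements KEDA's gRPC `ExternalScaler` service (`IsActive`, `StreamIsActive`, `GetMetricSpec`, `GetMetrics`). `IsActive` decides when KEDA scales to or from zero, and the reported metric drives the HPA between one and N replicas. The sketch below models only that decision logic in plain Go, with no gRPC or NVML dependency. The profile values, the idle threshold, the one-GPU-per-replica assumption, and the `isActive`/`plan` helpers are all hypothetical; they are not keda-gpu-scaler's actual presets or code. `plan` also ignores the HPA's default 10% tolerance and min/max replica bounds.

```go
package main

import (
	"fmt"
	"math"
)

// profile is a hypothetical scaling preset. The numbers are illustrative
// assumptions, not keda-gpu-scaler's real defaults.
type profile struct {
	TargetUtil    float64 // desired GPU utilization per replica, in percent (one GPU per replica assumed)
	IdleThreshold float64 // at or below this utilization a GPU counts as idle
}

var profiles = map[string]profile{
	"vllm":     {TargetUtil: 70, IdleThreshold: 5},
	"triton":   {TargetUtil: 60, IdleThreshold: 5},
	"training": {TargetUtil: 90, IdleThreshold: 1},
}

// isActive plays the role of the IsActive RPC: it reports false only when
// every GPU is idle, which lets KEDA scale the workload to zero.
func isActive(p profile, utils []float64) bool {
	for _, u := range utils {
		if u > p.IdleThreshold {
			return true
		}
	}
	return false
}

// plan is a simplified replica estimate: zero when inactive, otherwise the
// HPA AverageValue rule ceil(total metric / per-replica target).
func plan(p profile, utils []float64) int {
	if !isActive(p, utils) {
		return 0
	}
	total := 0.0
	for _, u := range utils {
		total += u
	}
	return int(math.Ceil(total / p.TargetUtil))
}

func main() {
	vllm := profiles["vllm"]
	fmt.Println(plan(vllm, []float64{90, 90, 90})) // 4
	fmt.Println(plan(vllm, []float64{0, 3}))       // 0
}
```

Separating `isActive` from the replica math mirrors how KEDA itself splits the 0↔1 transition from the 1↔N range handled by the HPA.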


OpenTelemetry Collector receiver for GPU metrics

  • 🔋 NVIDIA NVML — GPU utilization, memory, temperature
  • 📊 OTel Native — Standard OTLP export pipeline
  • 🖥️ Multi-GPU — All devices on the node
  • 📈 Prometheus — Built-in Prometheus exporter

Tech: Go · OpenTelemetry Collector SDK · NVML


Real-time NVIDIA GPU metrics in Docker Desktop

  • 📊 Live Dashboard — Utilization, memory, temperature, power
  • 📈 History Charts — 2-minute rolling Recharts graphs
  • 🚦 Alert Thresholds — Color-coded green/yellow/red
  • 🎭 Mock Mode — Develop without GPU hardware

Tech: Go · React · Recharts · Docker Extension SDK · NVML


K8s knowledge graph & automated root-cause analysis

  • 🗺️ Knowledge Graph — Real-time resource topology
  • 🔍 Root-Cause Traversal — Graph-based incident investigation
  • 🎮 GPU Aware — Training/inference/batch classification
  • 🔔 Alertmanager — Webhook integration for auto-investigation

Tech: Go · Kubernetes API · Gorilla Mux · Helm
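Root-cause traversal and blast-radius analysis over a resource graph can be sketched as two traversals. The first is a depth-first walk along dependencies from the alerting resource that keeps the deepest unhealthy resources. The second is a breadth-first walk over reversed edges that finds everything depending on a failed resource. The `graph` type, the resource keys, and the "deepest unhealthy dependency" heuristic are assumptions for illustration; the project's actual traversal may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// graph maps a resource to the resources it depends on
// (e.g. Pod -> Node, Pod -> PVC). Keys such as "pod/api-1" are hypothetical.
type graph map[string][]string

// blastRadius returns every resource that transitively depends on failed,
// found by BFS over reversed dependency edges.
func (g graph) blastRadius(failed string) []string {
	rev := map[string][]string{}
	for from, deps := range g {
		for _, to := range deps {
			rev[to] = append(rev[to], from)
		}
	}
	seen := map[string]bool{failed: true}
	queue := []string{failed}
	var out []string
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, up := range rev[cur] {
			if !seen[up] {
				seen[up] = true
				out = append(out, up)
				queue = append(queue, up)
			}
		}
	}
	sort.Strings(out)
	return out
}

// rootCauses walks unhealthy dependencies depth-first from the alerting
// resource and returns the unhealthy resources with no unhealthy dependency.
func (g graph) rootCauses(alerting string, unhealthy map[string]bool) []string {
	seen := map[string]bool{}
	var out []string
	var visit func(n string)
	visit = func(n string) {
		if seen[n] {
			return
		}
		seen[n] = true
		deeper := false
		for _, d := range g[n] {
			if unhealthy[d] {
				deeper = true
				visit(d)
			}
		}
		if unhealthy[n] && !deeper {
			out = append(out, n)
		}
	}
	visit(alerting)
	sort.Strings(out)
	return out
}

func main() {
	deps := graph{
		"svc/api":     {"pod/api-1", "pod/api-2"},
		"pod/api-1":   {"node/gpu-a"},
		"pod/api-2":   {"node/gpu-b"},
		"pod/train-1": {"node/gpu-a", "pvc/data"},
	}
	unhealthy := map[string]bool{"svc/api": true, "pod/api-1": true, "node/gpu-a": true}
	fmt.Println(deps.rootCauses("svc/api", unhealthy)) // [node/gpu-a]
	fmt.Println(deps.blastRadius("node/gpu-a"))        // [pod/api-1 pod/train-1 svc/api]
}
```

This heuristic never reports a resource as a root cause if it sits in a cycle of unhealthy dependencies, so a production implementation would need explicit cycle handling.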

More projects: KubeAI Autoscaler · Ingress2Gateway · Golden Kubestronaut Learning · LLMOps


🌱 Open Source Contributions

31+ PRs across 17 projects in CNCF, ASWF, and other open-source communities.

CNCF (Cloud Native Computing Foundation)

| Project | Description | Contributions |
|---|---|---|
| Dragonfly | P2P-based file distribution and image acceleration | client#1665 - Add Hugging Face backend support with `hf://` protocol; client#1673 - Add ModelScope backend support with `modelscope://` protocol; d7y.io#386 - Add `hf://` protocol documentation; d7y.io#398 - Add P2P-accelerated AI model downloads blog post; helm-charts#455 - Add injector support to helm chart; helm-charts#480 - Replace deprecated `bitnamilegacy/mysql` with `bitnami/mysql` |
| Kubernetes | Production-Grade Container Orchestration | #53891 - Document `deployment.kubernetes.io/*` annotations; #53892 - Add `kubectl apply view-last-applied` documentation |
| TiKV | Distributed transactional key-value database | #19225 - Add `AGENTS.md` for AI agent guidance |
| Volcano | Cloud-native batch scheduling for AI/HPC | #5095 - GPU NUMA topology awareness in scheduler; apis#229 - Add `GPUInfo` type to `NumatopoSpec` CRD; resource-exporter#12 - GPU NUMA topology discovery via sysfs |
| HAMi | Heterogeneous AI Computing Virtualization Middleware | #1893 - Add unit tests for nvinternal info, mig, and watch packages |
| KEDA | Kubernetes Event-driven Autoscaling | keda-docs#1658 - Remove `metricName` from the KEDA docs; keda-docs#1769 - Fix Datadog scaler typos across all versions; #7538 - GPU/AI inference scaler architectural analysis |
| Metal³ | Bare metal host provisioning for Kubernetes | #624 - Fix redirect links in `tryit.md` |
| OpenTelemetry | Observability framework | #8632 - Add .NET troubleshooting page |
| kpt | Kubernetes-native packaging and resource management | #4278 - Fix `kpt fn doc` command for KRM functions expecting input |
| traceAI | Open-source LLM observability SDK | #165 - Fix exporter shutdown and thread safety in Python SDK; #166 - Add Go SDK with OpenAI instrumentor |

ASWF (Academy Software Foundation)

| Project | Description | Contributions |
|---|---|---|
| OpenColorIO | Color management library | #2229 - Add release signing workflow; #2230 - Add Dependabot configuration; #2243 - Add Vulkan unit test framework |
| OpenCue | Cloud rendering management system | #2134 - Add scheduled subscription recalculation task |
| OpenImageIO | Image processing library | #4976 - Fix `IBA::compare_Yee()` channel access |
| RAWtoACES | RAW to ACES image conversion | #222 - Add build developer documentation |
| xSTUDIO | Playback and review application | #186 - Fix broken build guide links |

🧰 Tech Stack


📝 Publications

19 articles published across CNCF Blog, IEEE ComSoc, Platform Engineering, VKTR, Cloud Native Now, and Medium.

| Title | Publication | Date |
|---|---|---|
| GPU Autoscaling on Kubernetes with KEDA: Building an External Scaler | CNCF Blog | May 2026 |
| Shattering the Kubernetes Registry Bottleneck: Scaling Enterprise CI/CD with P2P Mesh Architecture | Cloud Native Now | May 2026 |
| The Inference Bottleneck: Architecting Kubernetes Autoscaling for Production LLMs | Cloud Native Now | May 2026 |
| Agentic AIOps: Building the Guardrails for Autonomous Infrastructure | VKTR | May 2026 |
| Architecting Enterprise GitOps: Scaling Argo CD on OKE | Cloud Native Now | May 2026 |
| Deploying Docker AI Agents on OCI and OKE | Cloud Native Now | May 2026 |
| Abstracting AI Infrastructure: Native GPU Scaling for Internal Developer Platforms | Platform Engineering | May 2026 |
| Why Enterprise AI Fails: The 4 Infrastructure Bottlenecks Nobody Wants to Talk About | VKTR | Apr 2026 |
| From public static void main to Golden Kubestronaut: The Art of Unlearning | CNCF Blog | Apr 2026 |
| Peer-to-Peer Acceleration for AI Model Distribution with Dragonfly | CNCF Blog | Apr 2026 |
| The IDP Paradox: Why Your Internal Developer Platform Needs a "Java-First" Strategy | Platform Engineering | Apr 2026 |
| The Financial Trap of Autonomous Networks: Scaling Agentic AI in the Telecom Core | IEEE ComSoc | Mar 2026 |
| Zero-Trust on OKE: How to Actually Secure Your Clusters With Terraform | Cloud Native Now | Mar 2026 |
| Beyond the Green Checkmark: Using Formal Verification to Stop ArgoCD Drift | Cloud Native Now | Mar 2026 |
| The Efficiency Era: How Kubernetes v1.35 Finally Solves the "Restart" Headache | Cloud Native Now | Mar 2026 |
| Beyond Basic Sync: Why ArgoCD v3 is the Backbone of Modern Platform Engineering | Platform Engineering | Feb 2026 |
| From PagerDuty to 'Agentic Ops': The Rise of Self-Healing Kubernetes | Cloud Native Now | Feb 2026 |
| I Replaced a $3/hr GPU Dev Workflow with Docker Model Runner | Medium | May 2026 |
| GPU-Aware Autoscaling for Docker Containers | Medium | May 2026 |

📊 GitHub Stats


Stats updated on 2026-05-31 13:04 UTC

🐍 Contribution Activity


🤝 Let's Connect

Building GPU infrastructure for Kubernetes? Working on CNCF projects? Let's collaborate.

Pinned

  1. llmops

    🚀 The Ultimate Curated List of LLMOps Tools, Frameworks, and Resources - A comprehensive collection of the best tools for Large Language Model Operations

    Shell · 10 stars · 4 forks

  2. pmady

    11 stars · 1 fork

  3. golden-kubestronaut-learning

    A comprehensive learning resource for achieving Kubestronaut and Golden Kubestronaut status through CNCF certifications

    Markdown · 17 stars · 7 forks

  4. kubeai-autoscaler

    Go · 12 stars · 7 forks

  5. ingress2gateway

    Convert Kubernetes Ingress objects to Gateway API resources - Web GUI and REST API

    Python · 8 stars · 3 forks

  6. keda-gpu-scaler

    KEDA External gRPC Scaler for GPU workloads: native NVML metrics via DaemonSet, no Prometheus required

    Go · 44 stars · 15 forks