Case Studies

Projects & outcomes

Real work. Real challenges. The details that matter — not marketing copy.

Kubernetes, Cloud Migration, Video Infrastructure

High-Scale Video Platform Migration to Kubernetes

Designed and executed a complete infrastructure migration for a high-scale video platform, moving it from a legacy on-prem environment to a production Kubernetes cluster in the cloud. Built the entire video processing pipeline (ingest, transcode, storage, CDN delivery) as cloud-native workloads.

  • Infrastructure cost reduction: ~40%
  • Deployment time: 10× faster
  • Downtime during migration: zero
  • Autoscaling response time: < 90s

The Challenge

The client was running a high-traffic video processing platform on aging on-prem hardware. The system was brittle, hard to scale during traffic spikes, and required manual intervention for deployments. Processing queues would back up under load, and there was no reliable failover. They needed a path to cloud-native infrastructure without disrupting live video delivery.

What I Did

  • Assessed existing infrastructure and mapped all workloads, dependencies, and data flows
  • Designed target architecture on managed Kubernetes (AWS EKS) with autoscaling worker pools
  • Built full IaC with Terraform: VPC, node groups, storage, networking
  • Containerized all services and built Helm charts for each workload
  • Implemented ArgoCD for GitOps-based deployments across environments
  • Built video processing pipeline with autoscaling job workers (ffmpeg-based)
  • Set up CDN integration for video delivery and origin failover
  • Executed zero-downtime cutover with DNS-based traffic shifting
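As a rough illustration of how queue-driven autoscaling for the job workers can be reasoned about, here is a minimal Python sketch of the common "queue depth divided by target load per worker" rule. It is not the client's configuration: the function name, the `jobs_per_worker` target, and the min/max bounds are all hypothetical values chosen for the example.

```python
import math


def desired_workers(queue_depth: int, jobs_per_worker: int,
                    min_workers: int = 0, max_workers: int = 50) -> int:
    """Return how many workers are needed to keep each one at the target load.

    All parameters are illustrative assumptions, not values from the project.
    """
    if jobs_per_worker <= 0:
        raise ValueError("jobs_per_worker must be positive")
    if queue_depth < 0:
        raise ValueError("queue_depth cannot be negative")
    # Round up so a partially filled "slot" still gets a worker.
    wanted = math.ceil(queue_depth / jobs_per_worker)
    # Clamp to the configured floor and ceiling.
    return max(min_workers, min(max_workers, wanted))


if __name__ == "__main__":
    for depth in (0, 12, 1000):
        print(depth, "queued jobs ->", desired_workers(depth, jobs_per_worker=5))
```

Allowing the floor to be zero lets an idle pipeline scale to nothing, while the ceiling caps spend during a traffic spike.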

Stack & Tools

AWS EKS, Terraform, ArgoCD, Helm, KEDA, ffmpeg, S3 + CloudFront, GitHub Actions

GPU Infrastructure, Kubernetes, AI / ML

NVIDIA A100 GPU Integration on Kubernetes with MIG Partitioning

Designed and deployed a production multi-tenant GPU cluster on Kubernetes using NVIDIA A100s with full MIG (Multi-Instance GPU) partitioning — matching MIG profile sizes to model sizes so every GPU cycle counts. Small models get small slices; large models get the full card.

  • GPU utilization improvement: +65%
  • Models served per A100: up to 7×
  • Inference cost per request: −55%
  • Tenant isolation: full MIG

The Challenge

The client was building a multi-tenant AI inference platform and needed to serve dozens of models simultaneously — from lightweight 7B models to large 70B+ models — on a fixed pool of NVIDIA A100 80GB GPUs. Giving each model a full GPU was wasteful and expensive. Running everything on shared GPUs without isolation caused memory conflicts and unstable latency. They needed fine-grained, isolated GPU partitioning with Kubernetes-native scheduling.

What I Did

  • Deployed NVIDIA GPU Operator on Kubernetes to manage drivers, container runtime, and device plugins automatically
  • Enabled MIG mode on all A100 nodes and planned profile allocation based on model size tiers
  • Configured 1g.10gb MIG instances for small models (≤7B params), up to 7 instances per GPU
  • Configured 2g.20gb MIG instances for mid-size models (>7B to 13B params)
  • Configured 4g.40gb MIG instances for large models (30B–40B params)
  • Reserved full 7g.80gb instances for 70B+ models needing the entire card
  • Applied custom Kubernetes node labels per MIG profile for precise pod scheduling
  • Built a dynamic MIG reconfiguration pipeline using mig-parted to reshape profiles on demand without node reboots
  • Set up resource quotas and LimitRanges per namespace to enforce fair GPU allocation across teams
  • Integrated vLLM inference server as the serving layer, pinned to specific MIG instances via device plugin
  • Built Prometheus + Grafana dashboards for per-MIG GPU utilization, memory, and inference throughput
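The size-tier mapping described above can be sketched in a few lines of Python. This is an illustration, not the production scheduler. Two assumptions are baked in: models that fall in the gaps between the listed tiers (for example 20B or 50B) are sent to the next larger profile, and the cluster exposes MIG devices under the device plugin's "mixed" strategy, where resources are named `nvidia.com/mig-<profile>`. The `MigTier` class and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MigTier:
    profile: str         # MIG profile name on an A100 80GB
    max_params_b: float  # largest model (billions of params) routed to this tier
    per_gpu: int         # how many instances of this profile fit on one GPU


# Tiers in ascending size. Upper bounds follow the list above; sizes that fall
# between the listed ranges go to the next larger tier (an assumption).
TIERS = (
    MigTier("1g.10gb", 7, 7),
    MigTier("2g.20gb", 13, 3),
    MigTier("4g.40gb", 40, 1),
    MigTier("7g.80gb", float("inf"), 1),
)


def pick_tier(params_b: float) -> MigTier:
    """Return the smallest tier whose upper bound covers the model size."""
    if params_b <= 0:
        raise ValueError("params_b must be positive")
    for tier in TIERS:
        if params_b <= tier.max_params_b:
            return tier
    return TIERS[-1]


def gpu_resource_request(params_b: float) -> dict:
    """Build the pod resource limit, assuming the 'mixed' MIG strategy naming."""
    tier = pick_tier(params_b)
    return {f"nvidia.com/mig-{tier.profile}": 1}


if __name__ == "__main__":
    for size in (3, 13, 34, 70):
        print(f"{size}B ->", gpu_resource_request(size))
```

Parameter count is only a proxy for memory footprint. Precision, quantization, and KV-cache size all affect whether a model actually fits in a slice, so a real mapping would likely key on measured memory instead.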

Stack & Tools

NVIDIA A100 80GB, NVIDIA GPU Operator, MIG / mig-parted, Kubernetes, vLLM, KEDA, Prometheus, Grafana, Helm, Terraform

Automation, Bare-Metal, Kubernetes, Ansible

Zero-Touch Bare-Metal Provisioning: Rack to K8s Node in Under 2 Hours

Designed and built a fully automated provisioning pipeline for HPE ProLiant DL servers. From the moment a server is connected to the Cisco Nexus network, the pipeline takes over — running hardware diagnostics, applying server-specific BIOS and iLO settings via the Redfish API, installing Ubuntu, and joining the node to a production Kubernetes cluster. No manual steps. No SSH sessions. Just rack, cable, wait.

  • Provisioning time per server: < 2 hrs
  • Reduction in manual steps: ~95%
  • Config drift incidents: zero
  • Re-provisioning support: full

The Challenge

The client was expanding their on-premises Kubernetes cluster with batches of new HPE ProLiant servers. Each server required manual BIOS configuration, OS installation, and node onboarding — a process taking 6–8 hours per server, prone to configuration drift and human error. With 20–100 servers to provision in rolling waves, the team needed a repeatable, auditable pipeline that could scale without adding headcount.

What I Did

  • Designed a network-triggered provisioning flow: a DHCP/PXE boot detected via Cisco Nexus switch events kicks off the pipeline automatically
  • Integrated the HPE Redfish API (iLO) to run pre-provisioning hardware diagnostics (memory, storage, NIC validation) and halt on failure before any OS install
  • Built Ansible playbooks to apply server-profile-specific BIOS settings (power profiles, boot order, hyperthreading, SR-IOV) based on server model detected from Redfish inventory
  • Used NetBox as the source of truth for IP allocation, rack position, server role, and cluster assignment — all pulled dynamically at provisioning time
  • Set up PXE + cloud-init for unattended Ubuntu Server installation with role-specific partitioning schemes per server type
  • Triggered GitLab CI / GitHub Actions pipelines from NetBox webhooks to drive the full provisioning sequence as code
  • Automated kubeadm-based Kubernetes node join using cluster join tokens generated and stored securely per provisioning run
  • Built idempotent re-provisioning support: re-racking or replacing a server re-runs the full pipeline cleanly from scratch
  • Implemented Slack + pipeline notifications at each stage (hardware pass/fail, OS install, K8s join) for full observability without SSH access
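To make the "halt on failure before any OS install" gate concrete, here is a small Python sketch that evaluates a Redfish ComputerSystem payload (the JSON returned for a resource such as `/redfish/v1/Systems/1`). It covers only overall health and memory. The storage and NIC checks mentioned above would need further Redfish resources and are left out. The function name and the `min_memory_gib` threshold are hypothetical, and the sample payload is made up for illustration.

```python
from typing import Any


def hardware_failures(system: dict[str, Any], min_memory_gib: int) -> list[str]:
    """Return a list of human-readable failures; an empty list means 'proceed'.

    `system` is a parsed Redfish ComputerSystem resource. The field names
    (Status.Health, MemorySummary.*) come from the DMTF schema; the threshold
    is an illustrative assumption.
    """
    failures: list[str] = []

    health = system.get("Status", {}).get("Health")
    if health != "OK":
        failures.append(f"system health is {health!r}, expected 'OK'")

    memory = system.get("MemorySummary", {})
    mem_health = memory.get("Status", {}).get("HealthRollup")
    if mem_health != "OK":
        failures.append(f"memory health is {mem_health!r}, expected 'OK'")

    total_gib = memory.get("TotalSystemMemoryGiB") or 0
    if total_gib < min_memory_gib:
        failures.append(
            f"only {total_gib} GiB memory detected, need {min_memory_gib} GiB"
        )

    return failures


if __name__ == "__main__":
    # Made-up payload standing in for a real iLO response.
    sample = {
        "Status": {"Health": "OK"},
        "MemorySummary": {
            "Status": {"HealthRollup": "Warning"},
            "TotalSystemMemoryGiB": 256,
        },
    }
    problems = hardware_failures(sample, min_memory_gib=512)
    print("halt provisioning" if problems else "continue", problems)
```

Returning every failure at once, instead of stopping at the first, gives the pipeline notification a complete picture of what needs fixing on the server.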

Stack & Tools

HPE ProLiant DL, HPE Redfish API (iLO), Cisco Nexus, Ansible, NetBox, GitLab CI / GitHub Actions, PXE + cloud-init, Ubuntu Server, Kubernetes (kubeadm)

Working on something similar?

Let's talk. Book a free discovery call and we'll figure out if I'm the right fit for your project.
