Kubernetes Orchestration Engineer – GPU Hypercomputing & AI Workloads

Dubai, DU, AE, United Arab Emirates

Job Description

Introduction
We are seeking a highly skilled Kubernetes Orchestration Engineer to lead the deployment and management of GPU-optimized Kubernetes environments that power AI/ML and hypercomputing workloads. This role is critical to ensuring scalable, reliable, and high-performance infrastructure across on-premises and hybrid cloud environments. As a core member of our infrastructure engineering team, you will work at the intersection of container orchestration, GPU resource management, and AI application scaling, enabling large-scale distributed training and inference across GPU clusters.


Must Have

  • Strong experience with Kubernetes (K8s) and container orchestration in production environments.
  • Expertise in managing GPU workloads in Kubernetes using NVIDIA GPU Operator, vGPU, and device plugin configurations.
  • Proficiency with container runtimes such as Docker and CRI-O, and orchestration tools like Helm and Kubernetes Operators.
  • Solid understanding of networking within Kubernetes and service mesh integration (e.g., Istio, Linkerd).
  • Familiarity with hybrid/multi-cloud Kubernetes platforms (e.g., GKE, EKS, AKS).
  • Strong scripting and automation skills (e.g., YAML, Helm templating, Bash, Python).

Responsibilities include:

  • AI Infrastructure Design & Deployment with multi-GPU clusters using NVIDIA or AMD platforms.
  • Configure GPU environments using CUDA, DGX Systems, and NVIDIA Kubernetes Device Plugin.
  • Deploy and manage containerized environments with Docker, Kubernetes, and Slurm.
  • AI Model Support & Optimization for training, fine-tuning, and inference pipelines for LLMs and deep learning models.
  • Enable distributed training using DDP, FSDP, and ZeRO, with support for mixed precision.
  • Tune infrastructure to optimize model performance, throughput, and GPU utilization.
  • Design and operate high-bandwidth, low-latency networks using InfiniBand and RoCE v2.
  • Integrate GPUDirect Storage and optimize data flow across Lustre, BeeGFS, and Ceph/S3.
  • Support fast data ingestion, ETL pipelines, and large-scale data staging.
  • Leverage NVIDIA's AI stack including cuDNN, NCCL, TensorRT, and Triton Inference Server.
  • Conduct performance benchmarking with MLPerf and custom test suites.

Certifications:

  • Certified Kubernetes Administrator (CKA) - Must
  • Certified Kubernetes Application Developer (CKAD)
  • NVIDIA Certified Kubernetes Specialist


Educational Qualifications:

Bachelor's in Computer Science/Applications/BTech Computer Science/MCA

Primary Skills:

  • Kubernetes Cluster Management for AI/ML Workloads
  • NVIDIA GPU Operator & Device Plugin Configuration in K8s
  • Container Orchestration using Docker, CRI-O, and Helm
  • Kubernetes Operators for Lifecycle Automation & Scaling
  • Pod Networking with CNI Plugins - Calico, Flannel, Cilium
  • Monitoring & Observability with Prometheus, Grafana, Kibana
  • GPU Workload Scheduling & Optimization in Kubernetes
  • Deployment of Distributed AI Frameworks (PyTorch, TensorFlow, Hugging Face)
  • Service Mesh Integration - Istio or Linkerd
  • Hybrid/Multi-Cloud Kubernetes Deployments (EKS, GKE, AKS)

Secondary Skills:

  • Helm Templating & YAML Scripting for Deployment Automation
  • Infrastructure Scripting using Bash, Python, or Ansible
  • Kubernetes Custom Resource Definitions (CRDs) & API Extensions
  • GPU Virtualization (vGPU) and Multi-Tenant GPU Allocation
  • Kubeflow or MLflow Integration for MLOps Pipelines
  • K8s Security (RBAC, Network Policies, Pod Security Standards)
  • CI/CD Integration with GitOps Tools (ArgoCD, Flux)
  • GPU Monitoring via NVIDIA DCGM or NVIDIA Cloud Native Stack
  • Advanced Troubleshooting in Kubernetes (control plane, etcd, kubelet)
  • Cloud-Native Storage for AI - CSI Drivers, NFS, Ceph

Job Details

Role: Kubernetes Orchestration Engineer - GPU Hypercomputing & AI Workloads

Location: Dubai

Close Date: 18-07-2025

Interested candidates may forward their detailed resumes to careers@reflectionsinfos.com along with their notice period and current and expected CTC details.

This is to notify jobseekers that some fraudsters are promising jobs with Reflections Info Systems for a fee. Please note that no payment is ever sought for jobs at Reflections. We contact our candidates only through our official website or LinkedIn, and all employment-related emails are sent through the official HR email ID. Please contact careers@reflectionsinfos.com for any clarification or alerts on this subject.



Job Detail

  • Job Id
    JD1928531
  • Industry
    Not mentioned
  • Total Positions
    1
  • Job Type
    Full Time
  • Salary
    Not mentioned
  • Employment Status
    Permanent
  • Job Location
    Dubai, DU, AE, United Arab Emirates
  • Education
    Not mentioned