Senior AI Infrastructure & Platform Engineer, Riyadh, KSA

Cairo, C, EG, Egypt

Job Description

Role Overview:



We are seeking a highly skilled Senior AI Infrastructure & Platform Engineer to join our client's team in Riyadh. In this role, you'll be responsible for building, managing, and optimizing scalable AI infrastructure and compute environments that support high-performance workloads, including GPU-accelerated AI/ML pipelines, cluster scheduling, and orchestration.



Key Responsibilities:



  • Deploy, maintain, and optimize GPU-based compute clusters and infrastructure.
  • Manage and operate GPU orchestration tools and platforms such as Nvidia Base Command Manager (critical), Nvidia AI Enterprise Suite, Nvidia GPU and Network Operators, and Nvidia NIMs and Blueprints.
  • Configure, deploy, and maintain compute workloads using scheduling and orchestration tools, including Slurm (critical) and vanilla Kubernetes.
  • Install, configure, and maintain the underlying OS (e.g. Canonical Ubuntu) and supporting system software.
  • Monitor and troubleshoot infrastructure performance, availability, and reliability; ensure high uptime for AI/ML workloads.
  • Work with data scientists, ML engineers, and dev teams to define infrastructure requirements, resource allocation, and deployment workflows.
  • Develop automation scripts, CI/CD pipelines, and best practices for infrastructure provisioning and management.
  • Document architecture, configurations, and operational procedures; enforce security, compliance, and backup policies.

Requirements:



Required Skills & Experience:



  • Proven experience managing GPU-based AI/ML infrastructure and compute clusters.
  • Hands-on experience with Nvidia Base Command Manager, Nvidia AI Enterprise Suite, and Nvidia GPU/Network Operators, NIMs, and Blueprints.
  • Strong experience with Slurm and/or Kubernetes orchestration.
  • Solid Linux system administration skills, preferably on Ubuntu or similar distributions.
  • Strong scripting/automation ability (e.g. Bash, Python, or relevant tooling) for provisioning, deployment, and maintenance.
  • Excellent troubleshooting and performance-tuning skills.
  • Experience collaborating with ML/data science teams and integrating infrastructure with their workflows.
  • Strong understanding of networking, security, resource allocation, and cluster management best practices.

Preferred Qualifications:



  • Previous experience working in a high-performance computing (HPC) or AI-focused infrastructure team.
  • Knowledge of containerization, container orchestration, and GPUs in cloud or on-prem environments.
  • Experience with CI/CD, infrastructure-as-code (e.g. Terraform, Ansible), monitoring tools, and logging setups.
  • Familiarity with workload scheduling, job queuing, resource quotas, and GPU-shared environments.



Job Detail

  • Job Id
    JD2179403
  • Industry
    Not mentioned
  • Total Positions
    1
  • Job Type
    Full Time
  • Salary
    Not mentioned
  • Employment Status
    Permanent
  • Job Location
    Cairo, C, EG, Egypt
  • Education
    Not mentioned