Senior AI Infrastructure & Platform Engineer - Riyadh, KSA, at DeepSource Technologies

Senior AI Infrastructure & Platform Engineer, Riyadh, KSA

Cairo, C, EG, Egypt

Job Description

Role Overview:

We are seeking a highly skilled Senior AI Infrastructure & Platform Engineer to join our client's team in Riyadh. In this role, you will build, manage, and optimize scalable AI infrastructure and compute environments that support high-performance workloads, including GPU-accelerated AI/ML pipelines, cluster scheduling, and orchestration.

Key Responsibilities:

* Deploy, maintain, and optimize GPU-based compute clusters and infrastructure.
* Manage and operate GPU orchestration tools and platforms such as:
  * Nvidia Base Command Manager (critical)
  * Nvidia AI Enterprise Suite
  * Nvidia GPU and Network Operators
  * Nvidia NIMs and Blueprints
* Configure, deploy, and maintain compute workloads using scheduling and orchestration tools including:
  * Slurm (critical)
  * Vanilla Kubernetes
* Install, configure, and maintain the underlying OS (e.g., Canonical Ubuntu) and supporting system software.
* Monitor and troubleshoot infrastructure performance, availability, and reliability; ensure high uptime for AI/ML workloads.
* Work with data scientists, ML engineers, and dev teams to define infrastructure requirements, resource allocation, and deployment workflows.
* Develop automation scripts, CI/CD pipelines, and best practices for infrastructure provisioning and management.
* Document architecture, configurations, and operational procedures; enforce security, compliance, and backup policies.

Requirements:

Required Skills & Experience:

* Proven experience managing GPU-based AI/ML infrastructure and compute clusters.
* Hands-on experience with:
  * Nvidia Base Command Manager
  * Nvidia AI Enterprise Suite
  * Nvidia GPU/Network Operators, NIMs, Blueprints
* Strong experience with Slurm and/or Kubernetes orchestration.
* Solid Linux system administration skills, preferably on Ubuntu or similar distributions.
* Strong scripting/automation ability (e.g., Bash, Python, or relevant tooling) for provisioning, deployment, and maintenance.
* Excellent troubleshooting and performance-tuning skills.
* Experience collaborating with ML/data science teams and integrating infrastructure with their workflows.
* Strong understanding of networking, security, resource allocation, and cluster management best practices.

Preferred Qualifications:

* Previous experience working in a high-performance computing (HPC) or AI-focused infrastructure team.
* Knowledge of containerization, container orchestration, and GPUs in cloud or on-prem environments.
* Experience with CI/CD, infrastructure-as-code (e.g., Terraform, Ansible), monitoring tools, and logging setups.
* Familiarity with workload scheduling, job queuing, resource quotas, and GPU-shared environments.

Job Detail

Job Id: JD2179403
Industry: Not mentioned
Total Positions: 1
Job Type: Full Time
Salary: Not mentioned
Employment Status: Permanent
Job Location: Cairo, C, EG, Egypt
Education: Not mentioned
