This is a 6-month contract role as an AI Infrastructure & MLOps Engineer at Muller's Solutions. The role is primarily operations-focused (90%), with hands-on involvement in implementation, configuration, and setup of AI infrastructure and MLOps workflows.
You will play a key role in managing, operating, and guiding the deployment of a strategic AI environment, working closely with the customer as a technical advisor and hands-on engineer.
What are the role's responsibilities?
Operate and maintain AI infrastructure and MLOps platforms in a production environment.
Monitor, manage, and troubleshoot Kubernetes-based AI workloads.
Plan and execute acceptance testing for AI infrastructure and platforms.
Ensure stability, performance, and availability of AI systems.
Support day-to-day operational tasks across compute, storage, and networking layers.
Install and configure the NVIDIA Enterprise AI Stack (NVAI).
Configure and manage MLOps platforms such as Kubeflow and MLflow.
Assist in setting up end-to-end AI workflows, including data pipelines.
Support the initial implementation phase of the AI environment.
Act as a technical guide and advisor to the customer during the early stages of their AI adoption.
Requirements
What do you need to fit this role?
Technical Requirements
AI / MLOps Stack
Proficiency with the NVIDIA Enterprise AI Stack
Familiarity with Ubuntu Linux
Experience with Kubernetes
Knowledge of Kubeflow / MLflow
Experience with QFLOW (an open-source AI data pipeline management tool)
Programming & Automation
4-6 years of practical experience in:
+ Python
+ Jupyter Notebook / JupyterLab
Competence in writing, testing, and maintaining operational scripts and AI workflows.
Infrastructure Experience
Practical experience with enterprise infrastructure, encompassing:
Dell PowerScale (5 nodes)
XE Server (1 node)
Dell R570 Servers (5 nodes)
Dell Network Switches (2 switches)
GPU-based AI servers (in a small-scale environment)
Environment Overview
Initial implementation of AI
Compact configuration:
+ 1 GPU server
+ 1 PowerScale
+ 5 control plane servers
Opportunity to shape best practices from the ground up
To succeed in this role, it's nice to have:
Familiarity with data frameworks like Apache Spark or Hadoop for data processing.
Understanding of ML model monitoring and logging practices to ensure system reliability.
Experience with security best practices in AI systems.