Site Reliability Engineer (SRE) Opening in Abu Dhabi at Happiestminds

Abu Dhabi, United Arab Emirates

Job Description

Warm greetings from Happiestminds Technologies,
Please find the job description below. Kindly go through the company profile link in the signature, and apply if you are interested.
Site Reliability Engineer (SRE)
From designing fault-tolerant architectures to leading incident responses, you'll have the freedom to shape how we deliver stable, secure, and high-performance banking services.
About the Role
We're looking for a talented Site Reliability Engineer (SRE) to keep our systems running smoothly, reliably, and at scale. Through smart automation, deep observability, and a calm head in a crisis, you'll help us balance speed, compliance, and stability, working alongside DevOps, Cloud, Quality Engineering, and Product teams to drive continuous improvements in performance, security, and resilience.
You'll play a key role in enhancing reliability, accelerating delivery, and ensuring seamless digital experiences for ADCB customers.
This role reports directly to our Lead SRE / Tribe Executive Manager.
What You Will Be Doing

  • Define and implement SLIs / SLOs and error budgets for business-critical digital banking services.
  • Build actionable observability (metrics, logs, traces, dashboards, and alerts) using Dynatrace, Prometheus, Grafana, and ELK, while reducing alert fatigue.
  • Leverage AI-driven insights and anomaly detection (Dynatrace Davis AI or equivalent AIOps platform) to proactively predict and resolve reliability issues before impact.
  • Lead incident management from on-call triage and root-cause analysis to blameless postmortems with actionable follow-ups.
  • Improve deployment safety with robust rollout / rollback strategies, canary and blue-green deployments, and production readiness reviews.
  • Support and optimize microservices-based architectures, ensuring service reliability, scalability, and inter-service resilience.
  • Conduct capacity planning, performance tuning, and resilience testing, optimizing for both reliability and cost efficiency.
  • Automate operational toil -- from runbooks and remediation scripts to proactive health checks and self-healing workflows.
  • Collaborate with DevOps to embed reliability gates and validations into CI / CD pipelines (GitHub Actions, Jenkins, GitLab CI / CD, or Azure DevOps).
  • Own and evolve the observability and AIOps stack, driving intelligent automation and predictive alerting capabilities.
  • Maintain high-quality documentation, playbooks, and operational standards across environments.
  • Ensure operational compliance and security alignment with internal controls and regulatory standards.
  • Analyze system performance, availability, and cost data to continually optimize operations.
  • Provide reliability support and escalation guidance for critical production systems during major incidents.

Skills Required

Job Detail

  • Job Id
    JD2165995
  • Industry
    Not mentioned
  • Total Positions
    1
  • Job Type
    Full Time
  • Salary
    Not mentioned
  • Employment Status
    Permanent
  • Job Location
    Abu Dhabi, United Arab Emirates
  • Education
    Not mentioned