Site Reliability Engineer

Abu Dhabi, United Arab Emirates

bankfab

Job Description

Full-time

Division: GCOO

Sub Division: Group Technology

Company Description

Now it’s your time to join the #1 bank in the Middle East and one of the most prestigious financial companies in the region. Shaking up the world of banking requires a lot of smarts and skill. We’re looking for the brightest and best to help us reach our goals and we’ll also help you reach yours. Your success is our success as you grow stronger in your career. Join us and leave a legacy of your own, as a pioneer in both the company and the industry.

Job Purpose

The Principal Engineer is responsible for creating scalable and highly reliable software systems by leveraging tools and automation. The Principal Engineer will focus on SRE-related responsibilities and on improving the performance and operational efficiency of the bank's Business Applications.

Key Accountabilities

Carry out Capacity Management best practices for Business Applications in scope.
Monitor and Report on the coverage of Business Applications in scope.
Identify and automate opportunities to improve the Reliability and Availability of applications in scope by leveraging the bank's tools and knowledge of scripting.
Implement Chaos Engineering practices to be better prepared for the recovery of business-critical services and to drive down the MTTD and MTTR. Demonstrate progress through monthly/quarterly reports.
Identify and implement CSI initiatives with a focus on reducing technical debt and improving reliability/scalability/availability.
Participate actively in incident/problem management calls, BPM and RC.

Qualifications

Academics:

Bachelor’s degree or equivalent.

Job knowledge, skills & experience:

10+ years of demonstrable hands-on experience in improving the reliability of Critical Business Applications through SRE Best Practices.
Exceptional knowledge of systems monitoring, alerting and analytics (AppDynamics, Dynatrace, Splunk, etc.).
Experience in troubleshooting highly available, secure and reliable services with automatic failover using containers and container-orchestration tools like Kubernetes/OpenShift, while leveraging the monitoring solutions of the bank.
Extensive experience with cloud technologies (Amazon Web Services and/or Azure).
Ability to define and report on the KPIs to be tracked and improved using SRE best practices.
Experience in automating routine tasks: knowledge of Python, Bash, Ansible, Terraform.
Experience in working closely with Performance and Load test teams to define, track and analyse performance and availability targets for the Business Applications.
Ability to define comprehensive coverage requirements for monitored Business Applications and define the goals and outcomes to increase reliability and improve/maintain SLAs.
Understanding of the Architecture of Business Applications, with the ability to recommend changes that improve reliability and uptime.
Experience using Chaos Engineering practices to build resiliency through the development lifecycle and Production.
Hands-on knowledge of the build automation and continuous integration/delivery ecosystem: GitLab, Docker, Nexus, Selenium, Jenkins, Kubernetes.
Experience in working on a Linux-based infrastructure.
Critical thinking and problem-solving skills.

Must have knowledge

APM and log aggregation solution knowledge
Expertise in at least one monitoring tool such as Splunk, ELK, AppDynamics, Dynatrace or New Relic
Proficient in scripting: Python, Bash or Java
Experience working on Linux-based infrastructure
ITIL Certified

Bonus knowledge

Experience in developing Continuous Integration/Continuous Delivery (CI/CD) pipelines: GitLab / Azure DevOps / Jenkins
Good hands-on knowledge of Configuration Management, Orchestration and Deployment tools like Ansible and Terraform
Cloud environment knowledge: Kubernetes, AWS EKS, Azure AKS
Working knowledge of various tools, open-source technologies, and cloud services

Behavioural Skills:

Independent, self-driven and able to bring ideas to the table.
Ability to make decisions and drive changes.
Excellent communication skills, with the ability to communicate with senior stakeholders as well as technical teams.
Knowledgeable and a quick learner.
Fosters innovation.

Job Detail

Job Id: JD1508208
Industry: Not mentioned
Total Positions: 1
Job Type: Full Time
Salary: Not mentioned
Employment Status: Permanent
Job Location: Abu Dhabi, United Arab Emirates
Education: Not mentioned
