Site Reliability Engineer

Abu Dhabi, United Arab Emirates

Job Description


Full-time
Division: GCOO
Sub Division: Group Technology

Company Description

Now it's your time to join the #1 bank in the Middle East and one of the most prestigious financial companies in the region. Shaking up the world of banking requires a lot of smarts and skill. We're looking for the brightest and best to help us reach our goals and we'll also help you reach yours. Your success is our success as you grow stronger in your career. Join us and leave a legacy of your own, as a pioneer in both the company and the industry.

Job Purpose

The Principal Engineer is responsible for creating scalable and highly reliable software systems by leveraging tools and automation. The Principal Engineer will focus on SRE-related responsibilities and on improving the performance and operational efficiency of the bank's Business Applications.

Key Accountabilities
  • Carry out Capacity Management best practices for Business Applications in scope.
  • Monitor and Report on the coverage of Business Applications in scope.
  • Identify and automate improvements to the Reliability and Availability of applications in scope by leveraging the bank's tools and knowledge of scripting.
  • Implement Chaos Engineering practices to improve readiness for recovering business-critical services and to drive down MTTD and MTTR. Demonstrate progress through monthly/quarterly reports.
  • Identify and implement CSI initiatives with a focus on reducing technical debt and improving reliability/scalability/availability.
  • Participate actively in incident/problem management calls, BPM and RC.

Qualifications

Academics:
  • Bachelor's degree or equivalent.
Job knowledge, skills & experience:
  • 10+ Years of demonstrable hands-on experience in improving the reliability of Critical Business Applications through SRE Best Practices.
  • Exceptional knowledge of systems monitoring, alerting and analytics (AppDynamics, Dynatrace, Splunk, etc.).
  • Experience in troubleshooting highly available, secure and reliable services with automatic failover using containers and container-orchestration tools like Kubernetes/OpenShift, while leveraging the monitoring solutions of the bank.
  • Extensive experience with cloud technologies: Amazon Web Services and/or Azure.
  • Ability to define and report on the KPIs to be tracked and improved using SRE best practices.
  • Experience in automating routine tasks: knowledge of Python, Bash, Ansible, Terraform.
  • Experienced in working closely with Performance and Load test teams to define, track and analyse performance and availability targets for the Business Applications.
  • Ability to define comprehensive coverage requirements for monitored Business Applications and define the goals and outcomes to increase reliability and improve/maintain SLAs.
  • Demonstrates understanding of the Architecture of Business Applications with the ability to recommend improvements to improve reliability and uptime.
  • Experience using Chaos Engineering practices to build resiliency through the development lifecycle and Production.
  • Hands-on knowledge of the build automation and continuous integration/delivery ecosystem: GitLab, Docker, Nexus, Selenium, Jenkins, Kubernetes.
  • Experience working on Linux-based infrastructure.
  • Critical thinking and problem-solving skills.
Must have knowledge
  • APM and log aggregation solution knowledge
  • Expertise in at least one monitoring tool such as Splunk / ELK / AppDynamics / Dynatrace / New Relic
  • Proficient in scripting: Python, Bash or Java
  • Experience working on Linux-based infrastructure
  • ITIL Certified
Bonus knowledge
  • Experience in developing Continuous Integration/Continuous Delivery (CI/CD) pipelines: GitLab / Azure DevOps / Jenkins
  • Good hands-on knowledge of Configuration Management, Orchestration and Deployment tools like Ansible and Terraform.
  • Cloud environment knowledge: Kubernetes, AWS EKS, Azure AKS
  • Working knowledge of various tools, open-source technologies, and cloud services
Behavioural Skills:
  • Independent, Self-Driven and able to bring ideas to the table
  • Ability to make decisions and drive changes.
  • Excellent Communication skills and able to communicate with senior stakeholders as well as with the technical teams.
  • Knowledgeable and a quick learner.
  • Fosters Innovation


Job Detail

  • Job Id
    JD1508208
  • Industry
    Not mentioned
  • Total Positions
    1
  • Job Type:
    Full Time
  • Salary:
    Not mentioned
  • Employment Status
    Permanent
  • Job Location
    Abu Dhabi, United Arab Emirates
  • Education
    Not mentioned