Now it's your time to join the #1 bank in the Middle East and one of the most prestigious financial companies in the region. Shaking up the world of banking requires a lot of smarts and skill. We're looking for the brightest and best to help us reach our goals and we'll also help you reach yours. Your success is our success as you grow stronger in your career. Join us and leave a legacy of your own, as a pioneer in both the company and the industry.
Job Purpose
The Principal Engineer is responsible for creating scalable and highly reliable software systems by leveraging tools and automation. The Principal Engineer will focus on SRE-related roles and on improving the performance and operational efficiency of the bank's Business Applications.
Key Accountabilities
Carry out Capacity Management best practices for Business Applications in scope.
Monitor and Report on the coverage of Business Applications in scope.
Automate and identify scope for improving the reliability and availability of applications in scope by leveraging the bank's tools and knowledge of scripting.
Implement Chaos Engineering practices to be better prepared to recover business-critical services and to drive down the MTTD and MTTR. Demonstrate progress through monthly/quarterly reports.
Identify and implement CSI initiatives with a focus on reducing technical debt and improving reliability/scalability/availability.
Participate actively in incident/problem management calls, BPM and RC.
Qualifications
Academics:
Bachelor's degree or equivalent.
Job knowledge, skills & experience:
10+ years of demonstrable hands-on experience in improving the reliability of Critical Business Applications through SRE best practices.
Exceptional knowledge of systems monitoring, alerting and analytics (AppDynamics, Dynatrace, Splunk, etc.).
Experience in troubleshooting highly available, secure and reliable services with automatic failover using containers and container-orchestration tools like Kubernetes/OpenShift, while leveraging the monitoring solutions of the bank.
Extensive experience with cloud technologies: Amazon Web Services and/or Azure.
Ability to define and report on the KPIs to be tracked and improved using SRE best practices.
Experience in automating routine tasks – knowledge of Python, Bash, Ansible, Terraform.
Experienced in working closely with Performance and Load test teams to define, track and analyse performance and availability targets for the Business Applications.
Ability to define comprehensive coverage requirements for monitored Business Applications and define the goals and outcomes to increase reliability and improve/maintain SLAs.
Demonstrates understanding of the architecture of Business Applications, with the ability to recommend changes that improve reliability and uptime.
Experience using Chaos Engineering practices to build resiliency throughout the development lifecycle and in production.
Hands-on knowledge of the build automation and continuous integration/delivery ecosystem: GitLab, Docker, Nexus, Selenium, Jenkins, Kubernetes.
Experience in working on a Linux-based infrastructure.
Critical thinking and problem-solving skills.
Must-have knowledge
APM and log aggregation solution knowledge
Monitoring tools expertise: at least one of Splunk / ELK / AppDynamics / Dynatrace / New Relic.