Senior Service Reliability Engineer, Emirates NBD

Dubai, United Arab Emirates

Job Description

Core Operational objectives include:

Project Engagement

Work with technical SMEs to analyze business needs and improve the supportability, scalability and recoverability of the engineered solution. Partner with project and technical teams to build operational non-negotiables centered on availability and reliability into the design and build. Help drive the success of the SRE capability by identifying, designing and implementing technical uplifts that improve delivery and product performance.

Conduct gap analyses and technical reviews to report on technical debt and design deviations.

Partner with technical teams to improve services through rigorous testing, including chaos testing centered on resilience and recoverability. Recommend and drive technical enhancements to improve overall operational resiliency.

Support the implementation of observability for new and existing systems to enhance fault detection; identify the correct routing for each alert and establish thresholds for immediate notification. Work with the Monitoring Enablement team to instrument appropriate monitoring coverage of service and system performance data, with the intent of providing insight into the root causes of application bottlenecks and enabling real-time telemetry that reduces availability risk exposure.

Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve uptime. Contribute to capacity planning, software performance analysis, and systems tuning.

Encourage technical teams to document application runbooks and failure points to enable quicker service restoration in production.

Business As Usual (BAU)

Influence and drive policies, operational processes, standards/guidelines, and solutions that proactively address issues before they impact system functionality or performance.

Own and drive improvement on Mean-Time-To-Repair (MTTR), Mean-Time-Between-Incidents (MTBI) and uptime.

Develop enhancements to improve service levels by leveraging key performance indicators drawn from monitoring, non-functional testing and availability reports. Take a service-focused approach grounded in continuous process improvement.

Drive improvements in service availability and reduce mean time to recovery through automation. Ensure that methods for autonomous recovery and self-repairing systems are implemented and that the solution is consistent with design standards.

Participate in system design consulting and capacity planning.

Provide extended support to the operational expert team during major incident resolution for technical problems. Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding.

Contribute to capacity planning and demand forecasting; employ analytics to identify and report on trends.

Design and develop tools that will aid in improving the reliability of our technology and services.

Document repeatable actions as candidates for automation.

Continually improve monitoring, alerting and automation to improve service reliability, availability, performance and overall system health. Work with technical teams to develop and improve telemetry for products and infrastructure. Champion best-practice monitoring setup and control reporting.

Provide a feedback loop to create and update procedures and design patterns for enterprise-wide adoption by the Architecture and Development chapters.

Review system performance for trends, correlations and any information that could lead to more efficient operation and better service reliability. Develop and implement continuous improvement initiatives based on data analysis.

Support the platform/engineering team in enabling high-availability, fault-tolerant infrastructure and systems to support critical products and processes.

Support the platform/engineering team in fulfilling application logging requirements to enable more effective troubleshooting, issue resolution and root-cause identification.

Perform strategic IT infrastructure planning that ensures capacity is available when needed while maintaining acceptable service and application performance levels.

Skills and Experience:
  • 6 years of experience in a technical lead role on large enterprise-wide projects
  • 4+ years in operations or software engineering or Reliability Engineering role.
  • Bachelor's degree in Computer Engineering, Computer Science, Information Systems or another related field is highly preferred; however, equivalent work experience will not be overlooked.
  • Java Spring Boot Experience
  • DevOps experience and tooling
  • AWS & AZURE
  • Cloud transition models: Waterfall/Agile, CI/CD, DevOps/DevSecOps
  • Chaos testing automation for microservices
  • OPENSHIFT (PaaS Platform)
  • RHEL, CENTOS & UBUNTU (OS)
  • VIRTUALBOX & VAGRANT (Virtualization)
  • DOCKER (Container Runtime Engine)
  • NGINX (High-Performance Web Server for Containers)
  • Knowledge of ANSIBLE automation
  • KUBERNETES (Container Orchestration), HELM (Kubernetes Package Management)
  • ENVOY & ISTIO (Service Mesh Data and Control Planes)
  • HASHICORP (Securing Credentials)
  • Knowledge of microservices fundamentals and patterns, microservice monitoring and custom alerting
  • Understanding of monitoring/telemetry solutions (Icinga, ELK, AppDynamics) for data ingestion and analysis
  • PROMETHEUS (Container Infrastructure Monitoring), ELK (Log Monitoring), RUM (Real User Monitoring), GRAFANA Monitoring Dashboard Tool
  • MongoDB, Postgres, Oracle
  • Experience with Atlassian suite of products

Talent Pal


Job Detail

  • Job Id
    JD1572834
  • Industry
    Not mentioned
  • Total Positions
    1
  • Job Type
    Full Time
  • Salary
    Not mentioned
  • Employment Status
    Permanent
  • Job Location
    Dubai, United Arab Emirates
  • Education
    Not mentioned