to ensure the stability, performance, and reliability of our critical, customer-facing solutions hosted on AWS. This role is a blend of deep, code-level analysis (L2), proactive automation, and technical mentorship.
You are not building new features, but you will be using your strong coding skills (JavaScript/Node.js/Angular) daily for advanced debugging, root cause analysis, and implementing fast fixes to production. If you are a Software Engineer who thrives on solving the most complex issues and writing code to ensure systems never fail, this is your next challenge.
Key Responsibilities
Code-Level Analysis & Fixing:
Provide hands-on code-level debugging in JavaScript, Node.js, and Angular to reproduce complex customer issues, identify root causes, and implement minor, targeted fixes (hotfixes) to production systems.
AWS & System Health:
Act as the primary technical expert for L2 escalations across core AWS services (Lambda, API Gateway, RDS, CloudWatch, etc.). Monitor system health, set up alerts, and analyze application and system logs (CloudWatch) to preemptively identify and resolve performance and reliability issues.
Database & Scripting:
Write complex SQL queries for relational databases (PostgreSQL, MySQL) for data investigation, troubleshooting, and reporting. Develop automation scripts (Python/Bash) to streamline operational runbooks and improve system efficiency.
Incident Management:
Own high-priority incidents, performing deep root cause analysis, coordinating efforts with Development and DevOps teams, and verifying successful deployment of fixes and resolutions.
Ticket Ownership:
Efficiently manage the escalated L2 ticket queue, ensuring strict adherence to SLAs and maintaining a high standard of customer satisfaction through detailed communication (calls, remote sessions, documentation).
Technical Mentorship:
Act as the L2 subject matter expert and technical manager for the L1 support engineers, managing shifts, coaching them on advanced troubleshooting techniques, code analysis basics, and effective resolution strategies.
Knowledge Base Development:
Develop and maintain high-quality documentation, troubleshooting guides, and best practices to reduce future escalations and strengthen the L1 knowledge base.
Work in rotational shifts as part of a 24x7 support team, with schedules planned in advance.
Mentorship & Process Improvement
Technical Mentorship:
Act as the L2 subject matter expert and technical mentor for the junior operations/support engineers, coaching them on advanced troubleshooting techniques, code analysis basics, and effective resolution strategies.
DevOps Contribution:
Actively contribute to process and tool improvements related to continuous integration/continuous deployment (CI/CD) and monitoring to continuously improve system reliability.
Required Qualifications
Experience:
5+ years of progressive experience in a technical role such as Software Engineer, with a focus on troubleshooting and automation.
Coding Proficiency (Mandatory):
Strong ability to read, debug, and implement minor bug fixes and hotfixes in existing applications using JavaScript, Node.js, and Angular. (This is a core requirement, used for fixes and automation.)
AWS Expertise:
Deep, hands-on experience with core AWS services relevant to modern software architecture (e.g., Lambda, API Gateway, RDS, S3, CloudWatch).
Database Skills:
Proficiency in querying and managing relational databases (PostgreSQL, MySQL).
Troubleshooting Mastery:
Advanced skills in log analysis (AWS CloudWatch), system diagnostics, and performing efficient Root Cause Analysis (RCA) for complex, distributed systems.
Technical Mentorship:
Demonstrated ability to manage, coach, train, and technically guide L1 support engineers and communicate complex technical concepts effectively.
Collaboration:
Proven ability to work seamlessly with cross-functional Development and DevOps teams to resolve defects and coordinate production deployments.
Customer Focus:
Exceptional customer service and communication skills to help customers and stakeholders identify and resolve issues.
Availability:
Must be able to work in rotational shifts as part of a 24x7 operations team. Immediate availability is preferred.
Job Type: Full-time