Staff Site Reliability Engineer

GetYourGuide

Seniority

Senior

Model

In-Office

Sector

Consumer

Salary

Undisclosed

Contract

Full-Time

About the role
You will act as an "engineer for the engineers," partnering with product teams to raise the bar on reliability, speed, and confidence in their systems. As a member of the Operational Excellence team, you will help GetYourGuide move toward fewer interruptions and higher user trust by preventing incidents before they happen and enabling teams to resolve them faster when they do.
What you'll do
Drive down incident frequency, MTTD, and MTTR
Lead post-incident reviews and translate learnings into systemic improvements
Build tooling and runbooks that enable teams to diagnose and resolve production issues faster
Advance our Datadog-based observability practice — metrics, logs, traces, dashboards, and alerting
Ensure teams have meaningful SLOs and actionable alerts — not alert fatigue
Improve change failure rate by helping teams invest in the right automated test coverage and pre-production validation
Design and maintain paved paths for development, observability, testing, and incident response so product teams can do the right things by default
Leverage AI tooling to accelerate incident response, improve developer workflows, and scale operational practices
What you'll need
Deep understanding of observability tooling (we use Datadog for metrics, APM, logs, and dashboards)
Proven experience reducing MTTD, MTTR, and change failure rate; DORA metrics are not just acronyms to you
Strong coding skills in Java; comfortable reading and contributing to Go code in infrastructure contexts; enough frontend knowledge to collaborate with React/Vue teams
Experience with Kubernetes, AWS, and service mesh technologies (Istio/Envoy)
Solid understanding of distributed systems, networking, and container technology
Hands-on experience with CI/CD, automated testing strategies, and build systems
Ability to influence engineers and teams without direct authority — you raise standards by coaching, not dictating
Excellent written and verbal communication skills in English
Nice to have
Led company-wide initiatives to measurably improve DORA metrics, specifically MTTD, MTTR, and change failure rate
Identified systemic gaps in automated testing and driven improvements that led to meaningful reductions in change failure rate and production incidents
Embedded operational excellence practices into the culture of product engineering teams, not just platform teams
What they offer
Annual personal growth budget and mentorship programs
Work from anywhere in the world for 40 days per year
Flexible working arrangements
Monthly transportation and fitness budget
Health and wellness benefits
Language reimbursement program