Staff Site Reliability Engineer
GetYourGuide
Seniority
Senior
Model
In-Office
Sector
Salary
Undisclosed
Contract
Full-Time
About the role
You will act as an "engineer for the engineers" — partnering with product teams to raise the bar on reliability, speed, and confidence in their systems. As a member of the Operational Excellence team, you will help GetYourGuide move toward a world of fewer interruptions and higher user trust — by preventing incidents before they happen and enabling teams to resolve them faster when they do.
What you'll do
- Drive down incident frequency, mean time to detect (MTTD), and mean time to resolve (MTTR)
- Lead post-incident reviews and translate learnings into systemic improvements
- Build tooling and runbooks that enable teams to diagnose and resolve production issues faster
- Advance our Datadog-based observability practice — metrics, logs, traces, dashboards, and alerting
- Ensure teams have meaningful SLOs and actionable alerts, not alert fatigue
- Improve change failure rate by helping teams invest in the right automated test coverage and pre-production validation
- Design and maintain paved paths for development, observability, testing, and incident response so product teams can do the right things by default
- Leverage AI tooling to accelerate incident response, improve developer workflows, and scale operational practices
What you'll need
- Deep understanding of observability tooling — we use Datadog (metrics, APM, logs, dashboards)
- Proven experience reducing MTTD, MTTR, and change failure rate; DORA metrics are not just acronyms to you
- Strong coding skills in Java; comfortable reading and contributing to Go code in infrastructure contexts; enough frontend knowledge to collaborate with React/Vue teams
- Experience with Kubernetes, AWS, and service mesh technologies (Istio/Envoy)
- Solid understanding of distributed systems, networking, and container technology
- Hands-on experience with CI/CD, automated testing strategies, and build systems
- Ability to influence engineers and teams without direct authority — you raise standards by coaching, not dictating
- Excellent written and verbal communication skills in English
Nice to have
- Led company-wide initiatives that measurably improved MTTD along with the DORA metrics MTTR and change failure rate
- Identified systemic gaps in automated testing and driven improvements that led to meaningful reductions in change failure rate and production incidents
- Embedded operational excellence practices into the culture of product engineering teams, not just platform teams
What they offer
- Annual personal growth budget and mentorship programs
- Work from anywhere in the world for 40 days per year
- Flexible working arrangements
- Monthly transportation and fitness budget
- Health and wellness benefits
- Language reimbursement program

