
Staff Site Reliability Engineer

GetYourGuide
Seniority: Senior
Model: In-Office
Sector: Consumer
Salary: Undisclosed
Contract: Full-Time

About the role

You will act as an "engineer for the engineers" — partnering with product teams to raise the bar on reliability, speed, and confidence in their systems. As a member of the Operational Excellence team, you will help GetYourGuide move toward a world of fewer interruptions and higher user trust — by preventing incidents before they happen and enabling teams to resolve them faster when they do.

What you'll do

  • Drive down incident frequency, mean time to detect (MTTD), and mean time to recover (MTTR)
  • Lead post-incident reviews and translate learnings into systemic improvements
  • Build tooling and runbooks that enable teams to diagnose and resolve production issues faster
  • Advance our Datadog-based observability practice — metrics, logs, traces, dashboards, and alerting
  • Ensure teams have meaningful SLOs and actionable alerts — not alert fatigue
  • Improve change failure rate by helping teams invest in the right automated test coverage and pre-production validation
  • Design and maintain paved paths for development, observability, testing, and incident response so product teams can do the right things by default
  • Leverage AI tooling to accelerate incident response, improve developer workflows, and scale operational practices

What you'll need

  • Deep understanding of observability tooling — we use Datadog (metrics, APM, logs, dashboards)
  • Proven experience reducing MTTD, MTTR, and change failure rate; DORA metrics are not just acronyms to you
  • Strong coding skills in Java; comfortable reading and contributing in Go across infrastructure contexts; enough frontend context to collaborate with React / Vue teams
  • Experience with Kubernetes, AWS, and service mesh technologies (Istio/Envoy)
  • Solid understanding of distributed systems, networking, and container technology
  • Hands-on experience with CI/CD, automated testing strategies, and build systems
  • Ability to influence engineers and teams without direct authority — you raise standards by coaching, not dictating
  • Excellent written and verbal communication skills in English

Nice to have

  • Led company-wide initiatives to measurably improve DORA metrics — specifically MTTD, MTTR and change failure rate
  • Identified systemic gaps in automated testing and driven improvements that led to meaningful reductions in change failure rate and production incidents
  • Embedded operational excellence practices into the culture of product engineering teams, not just platform teams

What they offer

  • Annual personal growth budget and mentorship programs
  • Work from anywhere in the world for 40 days per year
  • Flexible working arrangements
  • Monthly transportation and fitness budget
  • Health and wellness benefits
  • Language reimbursement program