Job Drop Berlin: Your Way Into Berlin Tech

Senior Site Reliability Engineer

Forto
Seniority: Senior
Model: In-Office
Sector: B2B SaaS
Salary: Undisclosed
Contract: Full-Time

About the role

The Site Reliability Engineering team at Forto is responsible for reliability and developer experience. We enable our development teams to write complex business logic by providing best-in-class tooling and infrastructure. This is a high-ownership role on a lean team that directly shapes how 70+ engineers build and ship software.

What you'll do

  • Build out our runtime platform as a self-service product that enables our engineering teams to write code, run workloads, and drive engineering culture forward.
  • Bring software development skills and practices into platform engineering, such as code quality, domain-driven design, and test-driven development.
  • Own the developer portal and internal platform roadmap, including leading this year's overhaul of our CI/CD pipelines in collaboration with all product teams.
  • Ensure site reliability by building observability, deployment, and disaster recovery capabilities.
  • Own reliability standards end-to-end through SLOs and error budgets — shaping how teams balance velocity and risk.
  • Drive infrastructure cost optimisation across Kubernetes, MongoDB, and Datadog at scale.
  • Improve our security posture through tooling, compliance work, and partnership with security stakeholders.
  • Work closely with the entire Engineering function as a steward of platform architecture — embracing new technologies and cleaning up old ones.

What you'll need

  • 5+ years in backend or infrastructure engineering, with at least 2 years in an SRE or platform engineering role.
  • Hands-on experience with GCP/AWS, Kubernetes, Terraform, and Helm in a production environment.
  • Strong software development background — building frameworks, internal tooling, and infrastructure.
  • Experience with an observability platform (Datadog or equivalent) at scale — not just dashboards, but alerting strategy, cost management, and SLO instrumentation.
  • Experience defining and operating SLOs and error budgets as a reliability mechanism, not just as metrics.
  • Solid understanding of Infrastructure as Code (IaC) and a GitOps-first mindset.
  • Proven track record designing, developing, and troubleshooting complex distributed systems.