Job Drop Berlin: Your Way Into Berlin Tech

Site Reliability Engineer

Helsing
Seniority: Midweight
Model: In-Office
Sector: Deeptech
Salary: Undisclosed
Contract: Full-Time

About the role

Much of our work takes place in high-security on-premises environments, and we are looking for a Site Reliability Engineer to support them. You will design, implement, and manage our on-premises Kubernetes infrastructure.

What you'll do

  • Design and build cloud-native infrastructure platforms on-premises, focusing on Kubernetes-based solutions that enable development teams to operate services at scale
  • Create robust observability frameworks using Grafana, Prometheus, and distributed tracing to ensure system reliability and performance
  • Architect and implement secure, multi-tenant Kubernetes clusters with strong access controls, policy-as-code governance, and zero-trust networking between red and black network domains
  • Develop operators and controllers to automate infrastructure provisioning and compliance
  • Build and maintain MLOps platforms enabling AI researchers to deploy, monitor, and scale machine learning models in production
  • Collaborate closely with Security teams to implement supply chain security, container scanning, and runtime protection across our cloud-native stack

What you'll need

  • Scripting experience in Python, Go, Rust, or Bash/Shell for automation and tooling
  • Experience with GitOps workflows and CI/CD automation
  • Deep experience operating production Kubernetes clusters, writing custom controllers/operators, and implementing service mesh architectures (Istio/Linkerd)
  • Hands-on experience with the CNCF ecosystem, including Helm, ArgoCD, and Flux, and with container runtime security tools like Falco
  • Expert-level knowledge of Grafana, Prometheus, Loki, Tempo, and OpenTelemetry; experience building custom dashboards, alerts, and SLI/SLO frameworks
  • Expert understanding of networking concepts, protocols, and security
  • Proficiency with Terraform, Ansible, and Kubernetes manifest templating; experience with policy-as-code tools like OPA/Gatekeeper
  • Deep understanding of Linux/Unix system administration and highly available, distributed systems

Nice to have

  • Experience with Kubeflow, MLflow, or similar MLOps platforms
  • Experience running cloud-native workloads in on-premises or air-gapped environments

What they offer

  • Competitive compensation and stock options
  • Relocation support
  • Social and education allowances
  • Regular company events and all-hands across Europe