Site Reliability Engineer
Helsing
Seniority
Midweight
Model
In-Office
Sector
Salary
Undisclosed
Contract
Full-Time
About the role
Much of our work takes place in high-security on-premises environments, and we are looking for a Site Reliability Engineer to support them. Your role will be to design, implement, and manage our on-premises Kubernetes infrastructure.
What you'll do
- Design and build cloud-native infrastructure platforms on-premises, focusing on Kubernetes-based solutions that enable development teams to operate services at scale
- Create robust observability frameworks using Grafana, Prometheus, and distributed tracing to ensure system reliability and performance
- Architect and implement secure, multi-tenant Kubernetes clusters with strong access controls, policy-as-code governance, and zero-trust networking between red and black network domains
- Develop operators and controllers to automate infrastructure provisioning and compliance
- Build and maintain MLOps platforms enabling AI researchers to deploy, monitor, and scale machine learning models in production
- Collaborate closely with Security teams to implement supply chain security, container scanning, and runtime protection across our cloud-native stack
What you'll need
- Scripting experience in Python, Go, Rust, or Bash/Shell for automation and tooling
- Experience with GitOps workflows and CI/CD automation
- Deep experience operating production Kubernetes clusters, writing custom controllers/operators, and implementing service mesh architectures (Istio/Linkerd)
- Hands-on experience with the CNCF ecosystem, including Helm, ArgoCD, Flux, and container runtime security tools like Falco
- Expert-level knowledge of Grafana, Prometheus, Loki, Tempo, and OpenTelemetry; experience building custom dashboards, alerts, and SLI/SLO frameworks
- Expert understanding of networking concepts, protocols, and security
- Proficiency with Terraform, Ansible, and Kubernetes manifest templating; experience with policy-as-code tools like OPA/Gatekeeper
- Deep understanding of Linux/Unix system administration and highly available, distributed systems
Nice to have
- Experience with Kubeflow, MLflow, or similar MLOps platforms
- Experience running cloud-native workloads in on-premises or air-gapped environments
What they offer
- Competitive compensation and stock options
- Relocation support
- Social and education allowances
- Regular company events and all-hands across Europe

