Cloud Site Reliability Engineer (Scalability)
Scalable Capital
Seniority
Senior
Model
In-Office
Sector
Salary
Undisclosed
Contract
Full-Time
About the role
Our team's mission is to ensure that teams can scale their services while keeping them performant, reliable and cost-optimised. We do this through self-service tooling and by sharing best practices for observability, load testing, service discoverability, chaos engineering and FinOps.
What you'll do
- Shape how Scalable runs microservices in the most performant, secure and cost-efficient way
- Collaborate with cross-functional teams to identify and understand scalability requirements for our platform, both in terms of user growth and increasing data volume
- Design and roll out monitoring best practices in Datadog, including SLIs, SLOs and SLAs
- Research and develop service and storage improvements using serverless technologies, and tune our services and CI/CD for scalability, cost and performance
- Develop and maintain internal tooling around Monitoring, Developer Portal and Load Testing
- Mentor and enable our software development teams to further foster our DevOps culture by educating them and providing reusable and unified building blocks
- Stay up to date with the latest industry trends, tools and techniques related to scalability and performance engineering
- Design and implement best practices for auto-scaling our infrastructure
- Run chaos engineering experiments to improve the resilience of our services

