Data Scientist

SumUp

Seniority

Midweight

Model

In-Office

Sector

Fintech

Salary

Undisclosed

Contract

Full-Time

About the role
As a Data Scientist at SumUp, you will design, test, and implement both traditional machine learning and AI models, while also assessing the effectiveness of our AI products. You will have a direct impact on company performance by developing and scaling evaluation methods that measure service quality across agents and AI assistants, identify opportunities for improvement, and ensure we deliver the right support experience for each merchant segment.
What you'll do
Build and improve LLM-based evaluation models (LLM-as-a-Judge) to assess the quality, safety, and effectiveness of AI products (e.g., AI assistant outputs, agent copilots, support automation).
Help design and implement automated QA and evaluation pipelines that enable continuous monitoring and benchmarking of AI product performance at scale.
Contribute to end-to-end model development, from problem framing and data exploration to deployment and monitoring.
Own the forecasting models and work with the workforce management (WFM) team to improve their accuracy.
Develop and maintain data pipelines that combine product signals, customer support interactions, and feedback loops to support reliable evaluations and insights.
Support the creation of qualitative and quantitative observability frameworks, including rubrics, evaluation criteria, labeling strategies, and error categories.
Produce actionable insights from evaluation outputs to identify gaps and improvement opportunities, and communicate recommendations to product and operations stakeholders.
Build and iterate on ML models (AI or traditional) that improve support efficiency, such as prediction models for routing, deflection, resolution likelihood, or quality scoring.
What you'll need
3+ years of experience applying data science to real-world products, ideally in AI systems, customer support, quality measurement, or evaluation frameworks.
Solid understanding of machine learning and statistics, including experimentation and metric design.
Experience working with LLMs or NLP systems, and interest in evaluation methods (e.g., rubric-based scoring, prompt-based judges, calibration approaches, or human-in-the-loop evaluation).
Strong ability to write clean, reliable code in Python (pandas, numpy, scikit-learn) and strong working knowledge of SQL.
Experience building or contributing to data workflows and pipelines, and familiarity with taking models into production environments.
Confidence communicating results and insights to both technical and non-technical stakeholders, with a strong focus on practical business impact.
What they offer
Dedicated annual L&D budget for conferences and/or further education.
Corporate pension scheme with up to 20% contribution matching.
30-day sabbatical benefit after 3 years of employment.
Employee referral rewards programme.
