Job Drop Berlin: Your Way into Berlin Tech

Data Scientist

Forto
Seniority: Midweight
Model: In-Office
Sector: B2B SaaS
Salary: Undisclosed
Contract: Full-Time

About the role

As a Data Scientist at Forto, you will take ownership of production ML systems that extract structured intelligence from unstructured logistics data. You will work closely with the Engineering Manager across three core workstreams — document data extraction (FlashDoc), vocabulary mapping, and classical ML.

What you'll do

  • Design, build, and maintain end-to-end ML pipelines for document extraction, classification, and data enrichment in production.
  • Build prompt evaluation frameworks and feedback-based optimization loops to systematically improve extraction accuracy.
  • Train custom in-house models using human-in-the-loop (HITL) data to move from assisted to fully automated extraction.
  • Build and maintain semantic similarity models that map free text to the standardized TMS vocabulary across ports, terminals, container types, legal entities, and line items.
  • Improve pipeline reliability through redesign, testing, monitoring, and alerting for non-deterministic ML systems.
  • Evaluate and introduce disruptive approaches (new model architectures, fine-tuning strategies, novel evaluation methods) to achieve step-change accuracy improvements when incremental optimization plateaus.
  • Partner with Product Managers to identify where DS can solve real user pain points and shape product roadmaps with a data-informed perspective.
  • Collaborate closely with Engineering teams on integration, infrastructure, and API design to ensure DS outputs are consumed reliably by downstream systems.

What you'll need

  • 2+ years of professional experience in data science or machine learning engineering.
  • Ability to design, deploy, and maintain ML systems in production, including pipeline architecture, monitoring, reliability, and handling non-deterministic outputs at scale.
  • Strong proficiency in Python.
  • Hands-on experience with LLMs (prompting, fine-tuning, evaluation) and understanding of their limitations in production environments.
  • Strong foundation in classical data science and statistics: regression, classification, time series analysis, data leakage, experimental design, and hypothesis testing.
  • Strong analytical and problem-solving skills with the ability to work independently on ambiguous, research-oriented problems.
  • Strong stakeholder management skills and ability to manage expectations on timelines and feasibility.

Nice to have

  • Proficient use of agentic coding tools.
  • Ability to get up to speed quickly with new tools, technologies, and problem spaces.