Job Drop Berlin: Your Way Into Berlin Tech

Data Engineer

Noxtua
Seniority: Midweight
Model: Remote
Sector: AI-native
Salary: Undisclosed
Contract: Full-Time

About the role

As a Data Engineer, you will play a key role in the Data Expansion Squad, responsible for integrating and operationalizing legal data from multiple jurisdictions. You will transform heterogeneous source data into a unified, high-quality foundation that powers search, retrieval, and AI-supported workflows across products.

What you'll do

  • Design, build, and optimize end-to-end ETL pipelines for legal data from multiple jurisdictions, including cleaning, transformation, chunking, validation, embedding, and ingestion into vector databases
  • Work extensively with XML-based legal data feeds: parse, validate, normalize, and transform XML structures into scalable internal schemas and unified document formats
  • Develop and maintain data models and storage schemas that support continuously updated datasets while ensuring consistency, scalability, and accuracy across diverse, large-volume data
  • Coordinate data handover and integration from multiple internal and external data providers, including official sources, APIs, and web scraping pipelines, ensuring reliable and timely updates
  • Implement and continuously refine metadata enrichment strategies to maximize searchability, ranking quality, and relevance of legal information in vector databases
  • Build and maintain a high-performance search and retrieval infrastructure enabling agent-based systems to call search functions and retrieve the most relevant legal information efficiently
  • Own the data integration of one jurisdiction end-to-end

What you'll need

  • At least 2 years of professional experience in data engineering, including involvement in successfully deployed projects
  • Strong Python skills with experience in designing robust data pipelines
  • Experience in building and maintaining reliable ETL and RAG pipelines and a solid understanding of data modeling, quality, filtering, validation, and consistency
  • Familiarity with containerization (Docker), CI/CD pipelines, and version control (Git)
  • Strong grasp of data structures, algorithms, system design principles, and software engineering best practices
  • English proficiency at the C2 level

Nice to have

  • Expertise in working with graph databases and familiarity with developing and deploying NLP models

What they offer

  • 100% remote work possible with residence in Germany; other countries upon request
  • Flexible working hours
  • 26 days vacation + December 24th & 31st off, plus 1 additional day per year of employment (up to 30 days)
  • Laptop (Lenovo or Mac) and €1,000 net home office setup budget