Data Engineer
Noxtua
Seniority: Midweight
Model: Remote
Salary: Undisclosed
Contract: Full-Time
About the role
As a Data Engineer in the Data Expansion Squad, you will play a key role in integrating and operationalizing legal data from multiple jurisdictions. You will transform heterogeneous source data into a unified, high-quality foundation that powers search, retrieval, and AI-supported workflows across Noxtua's products.
What you'll do
- Design, build, and optimize end-to-end ETL pipelines for legal data from multiple jurisdictions, including cleaning, transformation, chunking, validation, embedding, and ingestion into vector databases
- Work extensively with XML-based legal data feeds: parse, validate, normalize, and transform XML structures into scalable internal schemas and unified document formats
- Develop and maintain data models and storage schemas for continuously updated datasets, ensuring consistency, scalability, and accuracy across diverse, large-scale data
- Coordinate data handover and integration from multiple internal and external data providers, including official sources, APIs, and web scraping pipelines, ensuring reliable and timely updates
- Implement and continuously refine metadata enrichment strategies to maximize searchability, ranking quality, and relevance of legal information in vector databases
- Build and maintain high-performance search and retrieval infrastructure that lets agent-based systems call search functions and efficiently retrieve the most relevant legal information
- Own the end-to-end data integration for one jurisdiction
What you'll need
- At least 2 years of professional experience in data engineering, including involvement in successfully deployed projects
- Strong Python skills with experience in designing robust data pipelines
- Experience building and maintaining reliable ETL and RAG pipelines, and a solid understanding of data modeling, data quality, filtering, validation, and consistency
- Familiarity with containerization (Docker), CI/CD pipelines, and version control (Git)
- Strong grasp of data structures, algorithms, system design principles, and software engineering best practices
- English proficiency at the C2 level
Nice to have
- Expertise in working with graph databases and familiarity with developing and deploying NLP models
What they offer
- 100% remote work possible with residence in Germany; other countries upon request
- Flexible working hours
- 26 days of vacation plus December 24 and 31 off, with 1 additional day per year of employment (up to 30 days)
- Laptop (Lenovo or Mac) and €1,000 net home office setup budget

