Data Engineer

Noxtua

Seniority

Midweight

Model

Remote

Sector

AI-native

Salary

Undisclosed

Contract

Full-Time

As a Data Engineer, you will play a key role in the Data Expansion Squad, responsible for integrating and operationalizing legal data from multiple jurisdictions. You will transform heterogeneous source data into a unified, high-quality foundation that powers search, retrieval, and AI-supported workflows across products.
What you'll do
Design, build, and optimize end-to-end ETL pipelines for legal data from multiple jurisdictions, including cleaning, transformation, chunking, validation, embedding, and ingestion into vector databases
Work extensively with XML-based legal data feeds: parse, validate, normalize, and transform XML structures into scalable internal schemas and unified document formats
Develop and maintain data models and storage schemas that support continuously updated datasets while ensuring consistency, scalability, and accuracy across diverse, high-volume data
Coordinate data handover and integration from multiple internal and external data providers, including official sources, APIs, and web scraping pipelines, ensuring reliable and timely updates
Implement and continuously refine metadata enrichment strategies to maximize searchability, ranking quality, and relevance of legal information in vector databases
Build and maintain a high-performance search and retrieval infrastructure enabling agent-based systems to call search functions and retrieve the most relevant legal information efficiently
Own the data integration of one jurisdiction end-to-end
What you'll need
At least 2 years of professional experience in data engineering, including involvement in successfully deployed projects
Strong Python skills with experience in designing robust data pipelines
Experience in building and maintaining reliable ETL and RAG pipelines and a solid understanding of data modeling, quality, filtering, validation, and consistency
Familiarity with containerization (Docker), CI/CD pipelines, and version control (Git)
Strong grasp of data structures, algorithms, system design principles, and software engineering best practices
English proficiency at the C2 level
Nice to have
Expertise in working with graph databases and familiarity with developing and deploying NLP models
What they offer
100% remote work possible (given a German residence), other countries upon request
Flexible working hours
26 days vacation + December 24th & 31st off, plus 1 additional day per year of employment (up to 30 days)
Laptop (Lenovo or Mac) and €1,000 net home office setup budget