Job Drop Berlin

Geospatial Data Engineer - Remote Sensing & AI Pipelines

LiveEO
Seniority: Senior
Model: Remote
Sector: Space
Salary: Undisclosed
Contract: Full-Time

About the role

We are looking for a Senior Geospatial Data Engineer to build the high-performance data backbone for our multitemporal, multimodal Earth observation models. You will own the full geospatial data lifecycle: discovery, ingestion, standardisation, quality assurance, and delivery of production-ready datasets that combine very high-resolution optical and SAR imagery. You will also play an active role in ML lifecycle management, from dataset versioning and experiment tracking through to model deployment and monitoring.
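At its simplest, the lifecycle named above (discovery, ingestion, standardisation, QA, delivery) can be sketched as a chain of stage functions over an asset record. The stage bodies and the `asset` dict fields below are illustrative placeholders, not LiveEO's actual pipeline:

```python
def run_pipeline(asset, stages):
    """Apply each lifecycle stage in order, recording provenance."""
    for stage in stages:
        asset = stage(asset)
        asset.setdefault("history", []).append(stage.__name__)
    return asset

# Hypothetical stage implementations; a real pipeline would do far more.
def discover(asset):      asset["source"] = "catalogue"; return asset
def ingest(asset):        asset["raw"] = True; return asset
def standardise(asset):   asset["crs"] = "EPSG:4326"; return asset
def quality_check(asset): asset["qa_passed"] = True; return asset
def deliver(asset):       asset["ready"] = True; return asset

scene = run_pipeline(
    {"id": "S1A_demo"},
    [discover, ingest, standardise, quality_check, deliver],
)
```

Recording each stage name in a `history` list gives every delivered asset a minimal provenance trail, the same idea the role scales up with PostGIS-backed metadata stores.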

What you'll do

  • Design and operate scalable EO data discovery workflows integrating STAC-compliant catalogues to ingest high-value optical and SAR datasets.
  • Maintain structured metadata stores in PostgreSQL/PostGIS, tracking provenance, sensor parameters, and coverage gaps across all data assets.
  • Build and maintain ETL/ELT workflows using Prefect and Ray to process satellite imagery at scale, including radiometric calibration, orthorectification, co-registration, and SAR pre-processing.
  • Own tiling and patch-generation strategies, and develop deterministic pipeline execution frameworks that behave consistently across geographies and acquisition conditions.
  • Design and automate QA checks across the full pipeline — band integrity, co-registration accuracy, label alignment, and class distribution — with monitoring to detect data drift before it reaches model training.
  • Own dataset versioning and lineage tracking in MLflow and Databricks, and partner with ML Engineers to deliver production-ready data loaders and inference interfaces for PyTorch Lightning workflows.
  • Maintain our AWS-based cloud stack and Databricks environments, and drive adoption of cloud-native geospatial standards (COG, Zarr, STAC).
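The tiling bullet above hinges on determinism: the same raster must yield the same patches on every run, in every region. A minimal sketch of deterministic window generation, assuming clipped (not padded) edge tiles; the function name and signature are hypothetical, not part of any named framework:

```python
def tile_windows(width, height, tile, overlap=0):
    """Return (col_off, row_off, w, h) windows covering a raster.

    Windows are generated in a fixed row-major order and edge tiles are
    clipped to the raster extent, so identical inputs always produce
    identical windows -- a prerequisite for reproducible patch generation.
    """
    step = tile - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than tile size")
    windows = []
    for row in range(0, height, step):
        for col in range(0, width, step):
            windows.append((col, row, min(tile, width - col), min(tile, height - row)))
    return windows

# 1024x768 scene, 512px tiles with 64px overlap -> 3x2 grid, edges clipped.
wins = tile_windows(1024, 768, 512, overlap=64)
```

In practice these window tuples would map onto windowed reads (e.g. Rasterio's `Window`), keeping memory bounded while processing imagery at scale.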

What you'll need

  • Deep hands-on experience with geospatial data formats, projections, and operations; proficiency with GDAL, Rasterio, GeoPandas, and STAC for large-scale EO data handling.
  • Solid understanding of satellite-based remote sensing principles for both optical and SAR sensors, including SAR pre-processing concepts and common data formats (GeoTIFF, SAFE, HDF5).
  • Demonstrated ability to design and automate data quality frameworks, including validation logic, anomaly detection, and monitoring for large-scale geospatial pipelines.
  • Mastery of Python with a focus on clean, maintainable, and testable code in a production environment.
  • Proficiency in Prefect (or Airflow) and distributed computing frameworks like Ray for large-scale processing.
  • Strong knowledge of PostgreSQL/PostGIS for managing complex geospatial metadata at scale.
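The data-quality requirement above usually boils down to automated checks over patch metadata before anything reaches training. A toy sketch of two such checks (band integrity and degenerate class distribution); the expected band set, patch schema, and 0.95 threshold are assumptions for illustration only:

```python
from collections import Counter

EXPECTED_BANDS = {"red", "green", "blue", "nir"}  # hypothetical band set

def qa_patch_batch(patches, max_class_fraction=0.95):
    """Flag patches with missing bands or a near-single-class label set.

    `patches` is a list of dicts with "id", "bands", and "labels" keys
    (an illustrative schema). Returns (patch_id, reason) flags.
    """
    flags = []
    for p in patches:
        missing = EXPECTED_BANDS - set(p["bands"])
        if missing:
            flags.append((p["id"], f"missing bands: {sorted(missing)}"))
        counts = Counter(p["labels"])
        if counts and max(counts.values()) / len(p["labels"]) > max_class_fraction:
            flags.append((p["id"], "degenerate class distribution"))
    return flags

flags = qa_patch_batch([
    {"id": "p1", "bands": ["red", "green", "blue", "nir"], "labels": [0, 1, 0, 1]},
    {"id": "p2", "bands": ["red", "green"], "labels": [0, 0, 0, 0]},
])
```

Tracking the rate of such flags over time is one simple way to surface data drift before it contaminates a training set.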

Nice to have

  • Hands-on experience with MLflow, Databricks, and dataset versioning best practices; familiarity with PyTorch Lightning.
  • Experience with AWS infrastructure and Databricks for large-scale data processing.
  • Experience with dedicated SAR pre-processing libraries (e.g., SNAP, PyroSAR, s1tbx) and multi-temporal coherence analysis.
  • Practical experience with Cloud-Optimised GeoTIFF, Zarr, or STAC API implementations.
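The dataset-versioning point above rests on a simple idea: derive the version from the content, not from a counter. A stdlib-only sketch of content-hash versioning; the record schema is hypothetical and this does not reproduce MLflow's or Databricks' actual lineage format:

```python
import hashlib
import json

def dataset_version(records):
    """Derive a deterministic short version id from dataset content.

    Records are sorted and serialised with sorted keys, so the same
    content always hashes to the same id regardless of insertion order.
    """
    payload = json.dumps(sorted(records, key=lambda r: r["id"]), sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()[:12]

# Same records in a different order yield the same version id.
v1 = dataset_version([{"id": "a", "path": "tiles/a.tif"},
                      {"id": "b", "path": "tiles/b.tif"}])
v2 = dataset_version([{"id": "b", "path": "tiles/b.tif"},
                      {"id": "a", "path": "tiles/a.tif"}])
```

Content-addressed ids like this make lineage checks cheap: if the id changed, the data changed.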

What they offer

  • Flexible working hours and hybrid work model.
  • No overtime culture; extra hours are offset with time off and rest.
  • Frequent internal workshops, knowledge sharing sessions, journal clubs and hackathons.
  • Potential to participate in the employee stock option program.
  • Urban Sports membership, BVG subsidy, and corporate pension program.
  • Office in central Berlin Kreuzberg with free fruit, nuts and drinks.