Senior Data Engineer - Healthcare AI - UK/EU Remote
Vamstar
Title: Senior Data Engineer - Healthcare AI
Location: UK or Europe | Remote (time zone within ±2–3 hrs of GMT; overlap mandatory)
Reports to: Head of Engineering
Existing Clients: Top 100 Life Sciences, MedTech and Pharma companies
Type: Full-time
Core responsibilities & objectives
- Design, build, and maintain batch and streaming data pipelines covering ingestion, cleaning, normalisation, enrichment, and deduplication.
- Build and own ML/LLM pipelines end-to-end: document parsing, chunking, embedding generation, vector indexing, agentic tool calling, multi-step workflows, retries, fallbacks, and state handling.
- Write production-grade, well-tested Python that processes large volumes of data and documents reliably.
- Own pipeline health: if data is stale, broken, or wrong, it's on you.
- Work autonomously to project deadlines with minimal hand-holding.
Key qualifications & skills (non-negotiable)
- 7+ years in backend data-heavy development or data engineering.
- Highly proficient in Python.
- Hands-on experience with large datasets and high-velocity data streams (Kafka, Flink, Spark).
- Strong with pipeline orchestration tools (Airflow, MLflow, or equivalent).
- Solid SQL skills (Postgres, BigQuery, or Snowflake) and NoSQL experience (DynamoDB, OpenSearch, Elastic).
- Real experience with LLM workflows: RAG architectures, embeddings/vector DBs, prompt engineering, function/tool calling, observability.
- Deep understanding of ETL/ELT patterns and data processing at scale.
Preferred background (strong signals)
- Experience with AWS data stack at scale.
- Exposure to healthcare, life sciences, or regulated industries.
- Has built and shipped data, ML, and LLM-powered pipelines in production.
- Has debugged a pipeline and knows why observability matters.
- Has worked in a fast-moving startup where "that's not my job" doesn't exist.
What will get you rejected
- "I set up the pipeline, someone else monitors it" mindset.
- Tutorials and side projects but no production experience at scale.
- Can't explain the trade-offs between streaming and batch, or why you chose one vector DB over another.
- Needs detailed specs before writing a line of code.
- No curiosity about healthcare or what the data actually means.
Interested? We're a distributed team solving hard problems that will reshape the healthcare industry for a generation. If you want ownership, not just tickets, we'd like to hear from you.