5+ years in applied AI, data engineering, or ML engineering, with meaningful work on agentic systems, RAG, tool use, or enterprise-knowledge LLM applications.
Strong Python fluency and production experience with LLM orchestration frameworks (LangGraph, LlamaIndex, DSPy, or equivalents).
Experience designing evaluations for multi-step reasoning or agentic systems: rubric design, trajectory grading, and measurement beyond single-turn accuracy.
Exposure to complex enterprise workflows (financial services, life sciences, legal, or similar) and the data and permission realities inside them.
A high written communication bar: you can produce a scoping document that a frontier lab research lead accepts without a rewrite.
Commercial instinct: you want to be in customer meetings, you can read a room, and you are willing to be measured on revenue.