Build serving stores, sync pipelines, and API layers for pilot use cases.
Configure each pilot end-to-end: source table binding, key schema, sync schedule, and consumer integration.
Set up CI/CD pipelines with automated tests covering sync correctness, API contract validation, and latency benchmarks.
Operate the pilots against defined SLAs for latency, freshness, and availability.
Partner with domain teams (Consumer & Marketing, Commercial & Revenue, Growth, Pricing & Analytics) to onboard their use cases onto the patterns we build.
Contribute to reference implementations, blueprints, and documentation that future teams will reuse.
Required experience:
API development: production Python with FastAPI or comparable; versioned REST APIs, contracts, governance.
Batch and real-time data pipelines: Kafka or comparable streaming, plus CDC or incremental batch; built and operated end-to-end.
Caching and key-value serving: production Redis or Valkey; cache invalidation, TTL strategies, hot-path serving.
Vector databases and knowledge graphs: Pinecone, Weaviate, pgvector, Neo4j, or comparable; embeddings and retrieval patterns.
AI software engineering: hands-on experience building data infrastructure for AI and ML use cases (RAG, agent tooling, feature serving).
Azure Databricks, Delta Lake, Unity Catalog: hands-on production experience.
Delta Lake internals: transaction log, time travel, and Change Data Feed (CDF).
SQL and data modeling: comfortable designing for point-lookup versus analytical query patterns.
CI/CD: GitLab CI/CD or GitHub Actions; automated tests for data pipelines.
Communication: works directly with senior architects, product managers, and domain stakeholders.
Nice to have:
Embedded analytical engines: DuckDB or comparable.
Microsoft Fabric / OneLake / Power BI Semantic Models: production experience.
SLAs and SLOs: defining and operating for data products or APIs.
MCP-style tooling: data access for AI agents.
Enterprise-scale data serving: prior work on serving infrastructure at a large enterprise.