Design, build, and scale production-grade data pipelines specifically optimized for AI workflows, including data ingestion for LLM fine-tuning and RAG (Retrieval-Augmented Generation) patterns.
Integrate AI APIs (OpenAI, Hugging Face, Anthropic) into existing data transformation and downstream consumption layers for real-world projects.
Lead the cleaning, transformation, and feature engineering of large-scale datasets so they are structured for high-performance AI model training and inference.
Deploy and support AI solutions in production environments, managing the end-to-end lifecycle, including monitoring, scaling, and resolving data issues as they arise.
Develop and maintain dbt-based transformation layers and optimize Snowflake performance for both analytical and AI-driven workloads.
Implement robust API and event-driven ingestion pipelines with a focus on data quality, validation, and reliability across the entire AI data lifecycle.
What We’re Looking For
6–10+ years of hands-on data engineering experience with deep expertise in Python, dbt, and Snowflake.
Proven track record of integrating LLM/AI APIs into complex data pipelines for data ingestion, transformation, or downstream usage.
Demonstrated ability to deploy and support AI solutions in production, including active monitoring and scaling of models and data flows.
Advanced experience in preparing and structuring large-scale data (cleaning, transformation, and feature engineering) specifically for AI model usage.
Strong command of Airflow or a similar orchestration tool for managing complex AI data workflows.
Solid understanding of data modeling and a proven track record of building production-grade pipelines that handle real-world data issues.