Description
You will design and develop data pipelines for agentic systems, managing complex interactions between AI agents and data sources.
Responsibilities
- Design and build data architecture, including databases and data lakes, to support data engineering tasks.
- Develop and manage ELT processes to move data from source systems to analytical platforms.
- Implement data pipelines that facilitate feedback loops for human-in-the-loop systems.
- Work with vector databases to store and retrieve embeddings efficiently.
- Collaborate with data scientists to preprocess data, train models, and integrate AI into applications.
- Optimize data storage and retrieval, including determining effective partitioning criteria using Spark.
Required Skills
- 10+ years of experience in data engineering.
- Strong programming skills in Python and experience with AI/ML frameworks.
- Proficiency with Spark and Databricks.
- Experience training and fine-tuning LLMs with structured and unstructured datasets.
- Experience with Azure services: Blob Storage, Data Lakes, Databricks, Machine Learning, Computer Vision, Video Indexer, OpenAI models, Media Services, and AI Search.
- Hands-on experience with vector databases and embedding models for retrieval tasks.
- Knowledge of graph databases and core machine learning concepts and algorithms.
- Experience with GIS and spatial data, including latitude/longitude coordinates, road topology, and geolocation.
- Experience with Department of Transportation data domains and with developing composite agentic AI solutions.
- Expertise in integrating with AI agent frameworks.
Preferred Skills
- Bachelor's or Master's degree in Computer Science, AI, Data Science, or a related field.