Description
You will design, build, and deploy agentic AI systems and LLM-based workflows in production. You will own the end-to-end lifecycle on AWS, from model training through inference, monitoring, and scaling.
Responsibilities
- Design and implement multi-agent architectures using LangGraph and LangChain.
- Deploy and scale AI/ML solutions on AWS, leveraging EC2, S3, Lambda, EKS, and SageMaker.
- Build and maintain end-to-end ML pipelines, including training, inference, and continuous monitoring.
- Implement Retrieval-Augmented Generation (RAG) pipelines using vector databases.
- Integrate LLMs (via OpenAI, Anthropic, and Amazon Bedrock) into production systems while ensuring reliability and security.
Required Skills
- 5+ years of experience in AI/ML engineering.
- Strong hands-on expertise with LangGraph and LangChain.
- Proven experience deploying agentic AI systems and LLM workflows into production.
- Advanced Python programming skills with a solid understanding of software engineering best practices.
- Deep knowledge of AWS services: EC2, S3, Lambda, EKS, SageMaker, and Bedrock.
- Experience with model deployment, monitoring, and scaling in cloud-native architectures.
- Strong understanding of LLMs, prompt engineering, and orchestration patterns.
Preferred Skills
- Experience collaborating with product and engineering teams to deliver AI features.