Responsibilities
- Architect and build a scalable machine learning platform for training, deployment, and lifecycle management of ML, LLM, and Generative AI models.
- Lead infrastructure development supporting production hosting of complex AI systems, including large-scale inference workloads.
- Design developer-friendly abstractions and automation to simplify model development and deployment processes.
- Implement and enhance MLOps capabilities, including experiment tracking, model versioning, CI/CD for ML, monitoring, and reproducibility, using Databricks and MLflow.
- Serve as the technical leader for a team, guiding architecture decisions, design reviews, and engineering best practices.
Required Skills
- 12+ years of experience in machine learning, NLP, and Generative AI systems.
- Extensive experience in MLOps, including model lifecycle management and deployment.
- Proficiency in Databricks and ML platform tools such as MLflow.
- Experience deploying and managing large-scale AI/ML systems.
- Knowledge of Agentic AI and modern AI frameworks.
- Familiarity with LLMs, LangChain development, and LLMOps practices.
- Strong understanding of cloud-based ML infrastructure and scalable architectures.
- Experience building CI/CD pipelines for machine learning workflows.
- Bachelor's degree in Computer Science, Engineering, Data Science, or a related field.