Description

You will architect and build a scalable machine learning platform for training, deployment, and lifecycle management of ML, LLM, and Generative AI models.

Responsibilities

  • Architect and build a scalable machine learning platform for training, deployment, and lifecycle management of ML, LLM, and Generative AI models.
  • Lead infrastructure development supporting production hosting of complex AI systems, including large-scale inference workloads.
  • Design developer-friendly abstractions and automation to simplify model development and deployment processes.
  • Implement and enhance MLOps capabilities such as experiment tracking, model versioning, CI/CD for ML, monitoring, and reproducibility, using Databricks and MLflow.
  • Serve as the technical leader for a team, guiding architecture decisions, design reviews, and engineering best practices.

Required Skills

  • 12+ years of experience in machine learning, NLP, and Generative AI systems.
  • Extensive experience in MLOps, including model lifecycle management and deployment.
  • Proficiency in Databricks and ML platform tools such as MLflow.
  • Experience deploying and managing large-scale AI/ML systems.
  • Knowledge of Agentic AI and modern AI frameworks.
  • Familiarity with LLMs, LangChain development, and LLMOps practices.
  • Strong understanding of cloud-based ML infrastructure and scalable architectures.
  • Experience building CI/CD pipelines for machine learning workflows.
  • Bachelor's degree in Computer Science, Engineering, Data Science, or a related field.

Education

Any Graduate