Description
You will manage the end-to-end machine learning lifecycle and deploy large-scale AI solutions.
Responsibilities
- Develop and optimize ML pipelines for scalable model development, training, and deployment.
- Automate model training, deployment, monitoring, and testing workflows using CI/CD.
- Deploy and fine-tune open-source LLMs, including Llama, Mistral, and R1.
- Manage data pipelines to ensure efficient preparation of training data.
- Monitor LLM performance and evaluate metrics such as latency, accuracy, and cost-effectiveness.
Required Skills
- 5+ years of experience in machine learning operations.
- Deep expertise with Azure ML Studio and AWS SageMaker.
- Experience with MLflow, Databricks, and model versioning.
- Proficiency in containerization and orchestration using Docker and Kubernetes.
- Hands-on LLMOps experience with vLLM, LiteLLM, BentoML, Ollama, and Hugging Face.
- Practical knowledge of LangChain, Langfuse, and RAG implementations.
- Strong software engineering skills in Python and ML frameworks such as PyTorch or TensorFlow.
- Experience building microservices and data pipelines.
- Competency with GitLab or Azure DevOps for CI/CD and workflow automation.
Preferred Skills
- Experience developing AI solutions for product classification, data enrichment, and price optimization.