Description
You will bridge the gap between data science and production by operationalizing machine learning workflows.
Responsibilities
- Build and maintain CI/CD pipelines for ML model development, testing, and deployment.
- Develop reusable tools and frameworks for data processing, model training, validation, and monitoring.
- Collaborate with data scientists to ensure models are scalable, reliable, and reproducible.
- Manage and optimize compute infrastructure, including cloud and on-prem GPU/CPU clusters.
- Implement observability systems to track model performance, drift, and data integrity.
- Ensure governance through model versioning, reproducibility, and auditability.
Required Skills
- 3+ years of experience in ML Engineering, DevOps, or Infrastructure Engineering.
- Proficiency with AWS or Google Cloud Platform.
- Experience with orchestration tools like Kubernetes and Airflow.
- Hands-on experience with MLOps frameworks such as MLflow, Kubeflow, or Metaflow.
- Strong Python coding skills.
- Experience with infrastructure-as-code and Kubernetes packaging tools, including Terraform and Helm.
- Solid understanding of CI/CD practices and monitoring tools like Prometheus, Grafana, or Datadog.
Preferred Skills
- Experience deploying real-time inference services and batch prediction pipelines.
- Familiarity with model explainability, fairness, and responsible AI practices.
- Exposure to feature stores and experiment tracking platforms.