Description
You will manage the deployment, monitoring, and maintenance of machine learning models in production environments.
Responsibilities
- Operationalize ML models by building CI/CD pipelines for ML workflows in collaboration with data scientists.
- Design and maintain scalable deployment architectures using Docker, Kubernetes, and infrastructure-as-code.
- Monitor model performance and data drift using Prometheus, Grafana, or custom solutions.
- Automate data pipelines and retraining workflows using orchestration tools like Airflow or Kubeflow.
- Provision and manage infrastructure using Terraform and AWS CloudFormation.
- Lead technical workshops, build mockups, and coordinate directly with clients during Pacific Time (PT) business hours.
Required Skills
- 4–5 years of hands-on experience in MLOps or production ML environments.
- Proficiency in Python and Bash/shell scripting.
- Experience with cloud platforms such as AWS, Azure, or GCP.
- Hands-on expertise with Terraform and CloudFormation.
- Proficiency in Docker and Kubernetes.
- Experience with CI/CD tools such as Jenkins, GitHub Actions, or GitLab CI/CD.
- Knowledge of ML lifecycle tools such as MLflow, DVC, Kubeflow, or SageMaker.
- Experience with workflow orchestration via Apache Airflow or Prefect.
- Competency in monitoring and logging with Prometheus, Grafana, or the ELK stack.
- Proven ability to communicate technical content to clients and cross-functional teams.
Preferred Skills
- Bachelor’s or Master’s degree in Computer Science, Data Science, or Engineering.
- Experience with large-scale data systems, distributed computing, and model governance.