Description
Key Skills: Python, MLflow, Kubeflow, AWS SageMaker, Scikit-Learn, PyTorch, TensorFlow, SQL, Spark, PySpark
Good to Have Skills: Real-time inference for high-concurrency applications, in-game personalization, data privacy regulations (GDPR/CCPA), reinforcement learning, recommendation systems in a gaming context, Airflow, Docker, Kubernetes, CI/CD principles
Roles & Responsibilities:
- Lead the design and maintenance of end-to-end ML pipelines covering data ingestion, feature engineering, model training, and deployment.
- Architect and manage scalable model serving infrastructure using tools such as Databricks Model Serving, Seldon, or SageMaker.
- Implement and maintain a centralized feature store that keeps features consistent between training and real-time inference and provides low-latency access at serving time.
- Develop and optimize continuous training pipelines that automatically retrain models when performance decays or new data becomes available.
- Implement specialized monitoring for ML assets, tracking model drift, feature skew, and prediction latency to ensure high-quality player experiences.
- Partner with Data Scientists to refactor experimental code into production-ready, modular, and testable components.
- Manage and optimize the infrastructure costs of GPU/CPU clusters, balancing training job performance against budget.
Experience Required: 6+ years of professional experience in MLOps, DevOps, or ML Engineering with a focus on productionizing ML at scale.
Education: Bachelor's or Master's degree in Computer Science, Engineering, or a related quantitative field.