Bangalore, Karnataka, India
Candidate Skill: MLOps, AgentOps, Python, RAG, LLM, MLflow, LangChain, Vector DB, Docker, Cloud, CI/CD
Experience: 5–7 Years
Job Description: We are looking for AI/ML Ops Engineers (AgentOps/MLOps) to support and scale the SAKS agentic AI platform. The role focuses on evaluation frameworks, RAG optimization, experiment tracking, and ensuring operational excellence across cost, latency, reliability, and guardrails.

Key Responsibilities:

Evaluation & Quality:
- Build evaluation suites for agents (test datasets, scoring, regression tests, guardrails)
- Track RAG performance metrics and expose KPI dashboards

AgentOps & MLOps:
- Configure and optimize LLMs (Bedrock, Claude, OpenAI, Azure OpenAI)
- Operate and improve RAG pipelines (embeddings, retrieval, chunking, prompts)
- Implement experiment tracking and CI pipelines for evaluations

Deployment & Reliability:
- Automate deployment, rollback, and configuration management
- Monitor and optimize token usage, latency, throughput, and cost

Monitoring & Incident Response:
- Set up observability (logs, metrics, traces)
- Define alerts and handle incident response, RCA, and improvements

Required Skills:
- Strong Python with hands-on experience in LLM/Agent frameworks (LangChain/LangGraph or equivalent)
- Experience with Bedrock / OpenAI / Azure OpenAI
- Expertise in RAG pipelines (embeddings, vector search, chunking, prompt tuning)
- Hands-on with vector databases (pgvector, Pinecone, Weaviate)
- Strong experience in evaluation frameworks and experiment tracking (MLflow or equivalent)
- Knowledge of CI/CD, Git, and deployment practices
- Familiarity with monitoring, logging, and cost optimization
- Experience with FastAPI or similar frameworks
- Hands-on with Docker and cloud platforms (AWS / Azure / GCP)
- Understanding of structured outputs, schema validation (JSON/Pydantic)

Good to Have:
- Experience with RAG evaluation frameworks (RAGAS, OpenAI Evals)
- Knowledge of multimodal pipelines (document/image + text)
- Exposure to Hugging Face, CrewAI, AutoGen, LlamaIndex
- Experience with Airflow/Dagster, PySpark, Snowflake
- Familiarity with MLOps tools, model registry, Kubernetes, Terraform
- Understanding of security, PII handling, and content safety
- Experience with Azure AI Foundry / Azure AI Studio

Soft Skills:
- Strong problem-solving and analytical skills
- Good communication and documentation abilities
- Ability to handle on-call, incidents, and collaboration with SMEs
Education: Any Graduate