Description

You will build agentic AI solutions that reduce operational risk and cost in large-scale production environments.

Responsibilities

  • Design and implement tool-calling agents that execute actions via the Model Context Protocol (MCP), integrating retrieval and structured reasoning.
  • Productionize LLMs for operations use cases by building evaluation frameworks, retrieval pipelines, and self-correction loops.
  • Integrate agents with observability, incident management, and deployment systems for automated diagnostics and remediation.
  • Translate production pain points into agentic AI roadmaps by partnering with application teams and defining objective functions.
  • Instrument continuous evaluations and enforce guardrails, circuit breakers, and rollback strategies for safety and correctness.

Required Skills

  • 5+ years of software development experience in Python, C/C++, Go, or Java, with a strong preference for experience building large-scale Python applications.
  • 3+ years designing, architecting, and launching production ML systems, including model serving and evaluation.
  • Practical experience with LLMs: API integration, prompt engineering, and building agents using RAG and function calling.
  • Understanding of the LLM landscape, including commercial and open-source models (e.g., OpenAI GPT models, Gemini, Llama).
  • Solid grasp of applied statistics, core ML concepts, and data structures.
  • Experience with DynamoDB and Redshift.
  • Strong analytical problem-solving, ownership, and ability to communicate technical concepts clearly.
  • Familiarity with SageMaker.

Education

Any Graduate