Tampa, FL, USA
>> AI Reliability Engineer (SRE) for Gen AI Systems
>> Bridge the gap between advanced application development, cloud infrastructure, and machine learning operations
>> Build and maintain the applications and infrastructure that power large language models (LLMs)
>> Design autonomous, agentic AI workflows to eliminate operational toil and automate incident response
Requirements:
>> Deep expertise in Kubernetes (EKS/GKE), Infrastructure as Code (Terraform), and CI/CD deployment pipelines
>> Strong proficiency in Python or Go, with experience building tool integrations via APIs and Model Context Protocol (MCP)
>> Hands-on experience with LLM orchestration frameworks (e.g., AutoGen, LangChain, LlamaIndex)
>> Experience managing distributed vector databases (e.g., Pinecone, Milvus, Qdrant, or pgvector)
>> Advanced knowledge of cloud monitoring stacks (Datadog, Prometheus, OpenTelemetry) applied to both standard infrastructure and AI workloads
>> Bachelor’s degree