Denver, CO, USA
- Own features from design -> development -> testing -> deployment -> production support
- Take accountability for the reliability and performance of your services in production
- Drive technical decisions without waiting to be told what to do
System Design & Architecture
- Design scalable, resilient distributed systems handling millions of daily transactions
- Make pragmatic trade-off decisions (consistency vs. availability, complexity vs. speed)
- Produce clear technical design documents and lead design reviews
- Evaluate build vs. buy decisions with data
Development
- Strong proficiency in Java and/or Scala, or in TypeScript and Node.js
- Build high-throughput, low-latency microservices
- Use frameworks such as NestJS, Express, or Fastify (TypeScript/Node.js-specific)
- Write clean, maintainable, type-safe code, and more importantly, know what to build and when
Testing & Quality
- Write meaningful unit, integration, and contract tests - not just for coverage metrics
- Own E2E test strategy for your services
- Build confidence in deployments through automated validation
Infrastructure & Cloud (AWS)
- Hands-on with AWS services (ECS/EKS, Lambda, S3, SQS, CloudWatch, IAM)
- Understand networking, security, and cost implications of architectural choices
- Comfortable with Infrastructure as Code (Terraform/CloudFormation)
Data & Messaging
- Kafka - design event-driven architectures, manage topics, handle consumer lag and rebalancing
- Redis - caching strategies, pub/sub, cluster management
- MongoDB/DocumentDB - schema design, indexing, query optimization, aggregation pipelines
- DynamoDB - single-table design, GSI/LSI strategies, capacity planning, streams (TypeScript/Node.js-specific)
Monitoring & Observability
- Datadog - build dashboards, set meaningful alerts, track SLOs, analyze APM traces
- Splunk - log analysis, search queries, correlation across services for incident investigation
- Understand distributed tracing, structured logging, and metric-driven decision making
SRE Mindset
- Analyze production incidents: read logs, trace requests, identify root cause under pressure
- Think about failure modes before they happen (circuit breakers, retries, fallbacks, graceful degradation)
- Participate in on-call rotations and drive blameless post-mortems
- Proactively identify capacity risks and performance bottlenecks
Proactive Analysis & Innovation
- Identify risks, tech debt, and performance bottlenecks before they become incidents
- Propose and drive improvements - don't wait for a ticket
- Stay current with industry trends and bring relevant ideas to the team
- Challenge existing patterns when they no longer serve the system
Bachelor's degree