Description
Manage mission-critical Postgres infrastructure and automate operational tasks to ensure high availability and performance.
Responsibilities
- Provide 24x7 support for distributed Postgres databases, including a rotating weekend on-call schedule.
- Perform deep-dive performance tuning and SQL optimization to resolve user issues.
- Automate operational tasks and establish best practices for global L1/L2 support teams.
- Plan and test disaster recovery procedures, including backup/restore workflows.
- Triage and resolve issues across application, network, database, and storage tiers.
Required Skills
- 8+ years of experience as a Database Administrator or Site Reliability Engineer.
- Strong hands-on expertise with PostgreSQL 12 or later, including configuration of high-availability/disaster-recovery (HA/DR) and replication clusters.
- Proficiency in Linux/Unix OS fundamentals, kernel tuning, and security.
- Scripting skills in Python, Shell, or Perl for automation.
- Experience with automation tools such as Ansible.
- Ability to diagnose and resolve complex performance bottlenecks.
- Strong communication skills for cross-functional collaboration.
Preferred Skills
- Experience with Snowflake design patterns and migration workflows.
- Familiarity with Azure or AWS cloud platforms and identity management.
- Experience managing geo-redundant databases using Patroni or GoldenGate.