Deploy and configure Apache Spark clusters on AWS (particularly EKS), optimizing for performance, resource allocation, and scalability.
Develop, automate, and support robust and reliable Spark data pipelines, focusing on high performance and low latency.
Design and implement highly optimized queries to improve data processing efficiency, streamline analysis, minimize latency, and enhance overall system performance (see the PySpark sketch after this list).
Collaborate to make the Data Platform an effective, scalable, and resilient system.
Gather requirements from stakeholders, prioritize work, and document technical solutions clearly and effectively.
Engage with data engineers, data scientists, product managers, and internal stakeholders to align project goals and implementations.
Contribute to a team culture that values quality, robustness, and scalability while fostering initiative and innovation.
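To make the Spark responsibilities above concrete, here is a minimal PySpark sketch (referenced in the query-optimization item). The S3 paths, column names, and app name are hypothetical placeholders, not details from this posting; it simply illustrates common optimizations: pruning columns and filtering early, broadcasting a small dimension table, and writing partitioned Parquet output.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("events-daily").getOrCreate()

# Hypothetical input paths; select and filter early so Spark can push
# the column pruning and predicate down to the Parquet scan.
events = (
    spark.read.parquet("s3://example-bucket/events/")
         .select("event_id", "user_id", "event_ts")
         .where(F.col("event_ts") >= "2024-01-01")
)
users = spark.read.parquet("s3://example-bucket/dim_users/")

# Broadcast the small dimension table to avoid shuffling the large fact table.
enriched = (
    events.join(F.broadcast(users), "user_id")
          .withColumn("event_date", F.to_date("event_ts"))
)

# Partitioned output keeps downstream date-range queries cheap.
(enriched.write
         .mode("overwrite")
         .partitionBy("event_date")
         .parquet("s3://example-bucket/enriched/"))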
Minimum Qualifications:
5+ years of data processing experience in large cloud-based infrastructure (AWS is a must)
Hands-on software development experience in Python, with strong proficiency in PySpark for data engineering tasks and data pipeline development
Expert understanding of SQL, dimensional modeling, and analytical data warehouses, such as Snowflake
Understanding of data engineering best practices for medium- to large-scale production workloads
Expertise with data pipeline orchestration tools such as Airflow (see the Airflow sketch after this list)
Familiarity with processing semi-structured file formats such as JSON or Parquet
Team player with excellent written, verbal, and interpersonal communication skills
Strong problem-solving skills and the ability to learn new technical skills
Bachelor’s degree in Computer Science, Data Science, or a related field
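As a concrete illustration of the orchestration qualification above, here is a minimal Airflow 2.x sketch (referenced in the Airflow item). The DAG id, schedule, and wrapper script are hypothetical assumptions, not details from this posting.

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_events_pipeline",   # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                # Airflow 2.4+; use schedule_interval on older versions
    catchup=False,
) as dag:
    # Hypothetical wrapper script that spark-submits the PySpark job to the
    # EKS-hosted cluster; {{ ds }} passes the run's logical date to the job.
    run_spark_job = BashOperator(
        task_id="run_spark_job",
        bash_command="/opt/pipelines/run_events_job.sh {{ ds }}",
    )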
Preferred Qualifications:
Experience with Jinja, shell scripting, and dbt
Experience developing on cloud platforms with serverless technologies such as AWS Glue, Lambda, and EMR is a plus
Experience with remote development using the AWS SDK is a plus (see the boto3 sketch after this list)
Experience with ELT pipelines using dbt
REST API design and implementation
Familiarity with containers and infrastructure-as-code principles
Experience with automation frameworks: Git, Jenkins, and Terraform
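For the AWS SDK item above, here is a minimal boto3 sketch (referenced in that item), assuming AWS credentials are already configured in the environment and using a hypothetical Glue job name and region.

import boto3

glue = boto3.client("glue", region_name="us-east-1")   # hypothetical region

# Start a Glue job run, then read back its current state.
run = glue.start_job_run(JobName="nightly-etl")        # hypothetical job name
state = glue.get_job_run(JobName="nightly-etl", RunId=run["JobRunId"])
print(state["JobRun"]["JobRunState"])                  # e.g. RUNNING or SUCCEEDED

The same client pattern covers the other serverless services listed above, e.g. boto3.client("lambda") or boto3.client("emr").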