Description

Responsibilities:

  • Deploy and configure Apache Spark clusters on AWS (particularly EKS), optimizing for performance, resource allocation, and scalability.
  • Develop, automate, and support robust and reliable Spark data pipelines, focusing on high performance and low latency.
  • Design and implement highly optimized queries to improve data processing efficiency, streamline analysis, minimize latency, and enhance overall system performance.
  • Collaborate to make the Data Platform an effective, scalable, and resilient system.
  • Gather requirements from stakeholders, prioritize work, and document technical solutions clearly and effectively.
  • Engage with data engineers, data scientists, product managers, and internal stakeholders to align project goals and implementations.
  • Contribute to a team culture that values quality, robustness, and scalability while fostering initiative and innovation.

Minimum Qualifications:

  • 5+ years of data processing experience in large cloud-based infrastructure (AWS is a must)
  • Hands-on software development experience in Python, with strong proficiency in PySpark for data engineering tasks and data pipeline development
  • Expert understanding of SQL, dimensional modeling, and analytical data warehouses, such as Snowflake
  • Understanding of Data Engineering best practices for medium to large scale production workloads
  • Expertise with data pipeline orchestration tools, such as Airflow
  • Familiarity with processing semi-structured file formats such as JSON or Parquet
  • Team player with strong written, verbal, and interpersonal communication skills
  • Strong problem-solving skills
  • Ability to learn new technical skills
  • Bachelor’s degree in Computer Science, Data Science, or a related field

Preferred Qualifications:

  • Experience with Jinja, shell scripting, and DBT
  • Developing on a cloud platform using serverless technologies such as AWS Glue, Lambda, and EMR is a plus
  • Experience with remote development using AWS SDK is a plus
  • Experience with ELT pipelines (DBT)
  • REST API design and implementation
  • Familiarity with containers and infrastructure-as-code principles
  • Experience with automation and CI/CD tooling such as Git, Jenkins, and Terraform

Education

Any Graduate