You will design, develop, and maintain scalable data processing pipelines.
Responsibilities
- Design and develop efficient data processing pipelines using PySpark and Python.
- Build ETL processes to ingest data from disparate sources.
- Troubleshoot existing PySpark code and optimize applications for performance.
- Ensure data integrity and quality across the entire data lifecycle.
- Translate business requirements into technical solutions and participate in architecture discussions.
Required Skills
- 5+ years of professional experience in data engineering.
- Expert-level proficiency in PySpark.
- Strong programming skills in Python.
- Experience building and maintaining ETL workflows.
- Ability to ingest and load data from various sources.
- Experience optimizing big data processing jobs.