Description
You will design, build, and maintain data pipelines for large-scale data platforms.
Responsibilities
- Design and build ETL/ELT pipelines using AWS-native tools (Glue, Lambda, Step Functions, EMR).
- Manage large-scale Amazon S3-based data lakes, optimizing storage and table formats such as Parquet, ORC, and Delta Lake.
- Develop and optimize Amazon Redshift data models, schemas, and high-performance ELT workloads.
- Integrate data from enterprise systems (CRM, ERP) and third-party APIs, supporting real-time ingestion via Kinesis or Kafka.
- Ensure data quality, lineage, and governance across all data workflows.
Required Skills
- 7+ years of hands-on experience in data engineering.
- Expertise with PySpark/Spark or similar distributed processing frameworks.
- Deep knowledge of Amazon Redshift, including performance tuning and workload management.
- Proficiency with AWS services: S3, Glue, Lambda, EMR, IAM, CloudWatch.
- Strong SQL skills and experience with dimensional data modeling (star and snowflake schemas).
- Experience building and managing data lakes and large-scale data pipelines.
- Hands-on experience with streaming technologies such as Kinesis or Kafka.
- Strong understanding of ETL/ELT architecture and data integration patterns.