Description
You will design, build, and maintain data pipelines and data lake infrastructure.
Responsibilities
- Design, build, and maintain ETL/ELT data pipelines using AWS-native tools (Glue, Lambda, Step Functions, EMR).
- Develop scalable data ingestion solutions from structured and unstructured sources.
- Build data transformation workflows using PySpark, Spark, or Spark-based frameworks.
- Manage large-scale Amazon S3-based data lakes, implementing partitioning and cataloging with the AWS Glue Data Catalog.
- Optimize Amazon Redshift data models and build high-performance ELT workloads using SQL, Redshift Spectrum, and COPY commands.
Required Skills
- 7+ years of hands-on experience in data engineering.
- Expertise with S3, Glue, Redshift, Lambda, IAM, CloudWatch, and EMR.
- Deep knowledge of Amazon Redshift, including optimization and performance tuning.
- Strong SQL skills and experience with data modeling (star/snowflake).
- Experience building and managing data lakes and large-scale data pipelines.
- Hands-on experience with PySpark/Spark or similar distributed processing frameworks.
- Experience with streaming technologies such as Kinesis or Kafka.
- Strong understanding of ETL/ELT architecture and data integration patterns.