You will build and manage scalable data pipelines and analytical databases within an AWS environment.
Responsibilities
- Design and implement batch and streaming data pipelines using Python, PySpark, and SQL.
- Develop and maintain efficient data models and high-performing analytical databases in Redshift or Snowflake.
- Administer databases through schema design, performance optimization, and continuous query tuning.
- Automate deployment and scaling of pipelines using infrastructure-as-code tools such as Terraform or CloudFormation.
- Establish monitoring and logging mechanisms to ensure pipeline reliability and data quality.
Required Skills
- 5+ years of hands-on data engineering experience.
- Strong proficiency in Python and SQL.
- Extensive experience with PySpark for data processing.
- Hands-on expertise with AWS Glue and AWS Step Functions.
- Experience migrating and storing data with AWS DMS and Amazon S3.
- Proficiency with Amazon Redshift and Amazon RDS.
- Experience using AWS Lambda and Amazon EMR.
- Knowledge of AWS IAM for secure access management.
- Ability to implement data validation, error handling, and compliance standards.