Description

You will build and manage scalable data pipelines and analytical databases within an AWS environment.

Responsibilities

  • Design and implement batch and streaming data pipelines using Python, PySpark, and SQL.
  • Develop and maintain efficient data models and high-performing analytical databases in Amazon Redshift or Snowflake.
  • Administer databases through schema design, performance optimization, and continuous query tuning.
  • Automate deployment and scaling of pipelines using infrastructure-as-code (IaC) tools such as Terraform or CloudFormation.
  • Establish monitoring and logging mechanisms to ensure pipeline reliability and data quality.

Required Skills

  • 5+ years of hands-on data engineering experience.
  • Strong proficiency in Python and SQL.
  • Extensive experience with PySpark for data processing.
  • Hands-on expertise with AWS Glue and AWS Step Functions.
  • Experience migrating and managing data with AWS DMS and Amazon S3.
  • Proficiency with Amazon Redshift and Amazon RDS.
  • Experience using AWS Lambda and Amazon EMR.
  • Knowledge of AWS IAM for secure access management.
  • Ability to implement data validation, error handling, and compliance standards.

Education

Any Graduate