Description
You will design and optimize scalable data pipelines and manage cloud-based data infrastructure.
Responsibilities
- Design, implement, and optimize scalable data pipelines using SQL, Python, and PySpark.
- Integrate machine learning models into production pipelines in collaboration with data scientists.
- Manage and enhance ETL workflows to transform raw data into structured formats.
- Scale data infrastructure on the AWS and Azure cloud platforms and on the Databricks and Snowflake data platforms.
- Automate ETL processes using Apache Airflow and manage infrastructure as code with Terraform.
- Maintain data governance, quality, and cataloging using Unity Catalog or Hive Metastore.
Required Skills
- 8+ years of experience in data engineering.
- Expertise in SQL and Python for data manipulation and pipeline creation.
- Proficiency in PySpark and hands-on experience with ETL processes.
- Hands-on experience with AWS and Azure cloud platforms.
- Experience with Databricks and Snowflake.
- Experience integrating machine learning and AI models into data pipelines.
- Proficiency in workflow orchestration with Apache Airflow.
- Experience using Terraform for infrastructure as code.
- Strong knowledge of data warehousing concepts and solutions.