You will design and maintain data infrastructure using Azure services and Databricks to support batch and real-time processing.
Responsibilities
- Build and optimize ETL pipelines for ingesting, transforming, and loading data into cloud data warehouses.
- Implement data quality checks and automation to reduce errors and ensure reliable reporting.
- Develop Python scripts to interact with REST APIs and streamline data acquisition workflows.
- Containerize applications using Docker to improve deployment efficiency and scalability.
- Orchestrate end-to-end data workflows with Apache Airflow and implement distributed data processing with PySpark.
Required Skills
- 5+ years of experience in data engineering.
- Expertise in Databricks, Azure Data Lake Storage, and Azure Data Factory.
- Strong proficiency in Python for data processing and REST API integration.
- Hands-on experience with Docker for containerization.
- Experience with Apache Airflow for data orchestration.
- Ability to tune the performance of workloads on large datasets and distributed systems.
- A degree in any discipline.