Design, develop, and maintain scalable, reliable, and high-performance data pipelines using Azure Data Services
Build and optimize ETL/ELT pipelines using GCP and Azure Databricks (PySpark)
Ingest, transform, and process structured and unstructured data from multiple data sources
Implement data models to support analytics, reporting, and machine learning use cases
Ensure data quality, data validation, and governance standards are met
Optimize performance and cost for large-scale data processing workloads
Integrate data pipelines with downstream systems such as Power BI, data science models, and enterprise applications
Support production deployments and troubleshoot data pipeline issues
What You Know
5+ years of proven experience developing and deploying data pipelines, preferably on GCP; Azure experience is a plus
4+ years of proven experience building data warehouse platforms, including star schemas, data modeling, and slowly changing dimensions
4+ years of strong experience with SQL and stored procedures
4+ years of proven expertise in creating pipelines for real-time and near-real-time integration across different data sources: flat files, XML, JSON, Avro files, and databases
4+ years of experience with at least one programming language such as Python, Java, or Scala
3+ years of experience with Databricks and Delta tables is a plus
Knowledge of big data platforms and applications is a plus
Extensive experience in data transformations for retail business use cases is a plus
Knowledge of exception handling, automated reprocessing, and reconciliation
Passion for data quality, with the ability to build data quality checks into deliverables
Education
Bachelor's degree in Computer Science, Information Systems, Engineering, Computer Applications, or a related field