You will transform raw data into actionable information for data science and product teams.
Responsibilities
Transfer data between Hadoop and relational databases using Sqoop, and manage Linux-based systems.
Stage real-time data from gateways into AWS S3 or Azure Blob storage.
Implement Spark processing in Scala using the Spark SQL and DataFrame APIs, and tune job performance through effective partitioning.
Build ETL pipelines in Azure Data Factory (ADF) using Linked Services and Datasets to move data between Azure SQL, Blob storage, and Azure SQL Data Warehouse.
Develop Spark streaming pipelines using Java.
Required Skills
5+ years of experience in data engineering.
Proficiency in Scala and Java.
Experience with Spark, Spark SQL, and Spark Streaming.
Hands-on experience with Hadoop and Sqoop.
Experience working in Linux environments.
Ability to write SQL queries for data analysis and transformation.
Experience with Azure SQL, Blob storage, and Azure SQL Data Warehouse.
Experience building pipelines in Azure Data Factory (ADF).
Preferred Skills
Willingness to travel or relocate to unanticipated client sites.