You will build and maintain complex data pipelines to deliver accurate and timely data across various products.
Responsibilities
Build data pipelines according to transformation specifications to load source data into the data lake using the proprietary big data processing platform.
Support and improve current data ingestion processes for proprietary healthcare data applications and systems.
Develop and maintain data engineering processes using T-SQL, Spark, Scala, and shell scripting, focusing on ingestion, validation, and report generation.
Review and test data to ensure accuracy and validity before uploading to the data lake.
Troubleshoot data issues, perform data analysis, and identify root causes across the data environment.
Required Skills
5+ years of experience with data aggregation, standardization, linking, quality-check mechanisms, and reporting.
5+ years of experience with big data technologies, specifically Hadoop and Spark.
5+ years of experience with RDBMS (Oracle, MS SQL Server), using SQL or ETL tools.
Solid understanding of Linux environments, including shell scripting and file systems.