You will design, develop, and maintain data pipelines for ingesting, storing, and processing large datasets.
Responsibilities
- Develop and maintain data pipelines for both batch and real-time processing.
- Implement data analytics pipelines in collaboration with data science teams.
- Process and cleanse data, and validate its integrity, to support analysis and machine learning algorithms.
- Analyze large data stores to uncover patterns and propose technical solutions to business challenges.
- Document technical and functional specifications and analyze system processing flows.
Required Skills
- 5–7 years of experience in software development and data engineering.
- Expertise in Hadoop and Spark architecture and working principles.
- Hands-on experience with Big Data, Spark, and Hadoop technologies.
- Proficiency in Python, Scala, or Core Java.
- Strong SQL skills, including writing complex queries (Hive, PySpark DataFrames) and optimizing joins.
- Experience with Informatica and Oracle.
- Solid understanding of data warehousing concepts.
- Proficiency in Unix shell scripting.
- Experience in system application analysis, design, development, testing, and implementation.
- Bachelor's degree in Computer Science or an equivalent degree.
Preferred Skills
- Knowledge of the financial reporting ecosystem.