You will build and maintain data quality systems across our data pipelines.
Responsibilities
- Profile and assess data sources to identify anomalies, duplicates, and missing values.
- Design and implement automated data validation frameworks within ETL pipelines.
- Build reusable data quality rules and monitoring systems for real-time issue detection.
- Investigate data discrepancies to determine root causes and drive permanent resolutions with engineering teams.
- Ensure data handling meets regulatory standards by implementing data masking and lineage tracking.
Required Skills
- Expert-level SQL skills for writing complex queries.
- Proficiency in Python for automation scripting.
- Hands-on experience with Snowflake, AWS, or Databricks.
- Knowledge of Spark, Hadoop, or Kafka for large-scale pipeline management.
- Proficiency in data visualization tools like Power BI or Tableau.
- Minimum of 5 years of relevant experience in Data Engineering or ETL Testing.