You will own the design and implementation of automated, end-to-end data lineage solutions across a heterogeneous enterprise data ecosystem.
Responsibilities
Lead the implementation of automated data lineage across cloud platforms (Snowflake, AWS), legacy databases, ETL pipelines, and BI platforms (Tableau, Power BI).
Implement or extend lineage frameworks such as Spline or OpenLineage to capture active lineage at job runtime.
Build connectors, extractors, or agents to bridge gaps between systems and lineage frameworks.
Integrate lineage capture with metadata platforms to publish lineage in a consumable format.
Apply AI/ML techniques to infer lineage from query logs and query patterns where automated capture is incomplete.
Required Skills
5+ years of experience delivering automated data lineage solutions across hybrid (cloud and on-premises) architectures.
Hands-on expertise with Spline, OpenLineage, or Marquez.
Strong background in metadata capture, ETL process tracing, and query execution mapping.
Advanced programming skills in Python, Scala, or Java.
Deep familiarity with SQL, and with query logs from Snowflake, SQL Server, Oracle, and MongoDB.
Proven experience integrating lineage with data governance tools.
Strong AI/ML foundation, particularly in metadata intelligence or pattern detection.
Experience operating at the intersection of data engineering, metadata management, and AI.