As a Data Engineer, you will:
Data Ingestion & Pipeline Development
Build and enhance ingestion pipelines for large batch and event-driven paths (streaming may evolve over time).
Integrate data from:
Third-party enrichment vendors (identity + attributes, very large volumes)
Digital platforms via Conversion API (CAPI) integrations (through intermediary/middleware)
Rewards/Promotions systems (e.g., TMT) for offer issuance/redemption/consumption data
Data Quality, Reliability & Operations
Implement strong data validation, idempotency, replay/backfill strategies, and deduplication to prevent quality drift.
Own monitoring, alerting, dashboarding, and operational readiness (wrappers around core pipelines).
Troubleshoot failures with root cause analysis, not just reruns:
Interpret Spark logs
Diagnose performance issues (shuffle, skew, partitioning)
Improve stability and SLA adherence
Governance & Compliance (First-class NFR)
Apply privacy, compliance, and governance requirements across pipelines and datasets.
Support governance standards such as:
Unity Catalog, lineage, access controls
Managing PII vs non-PII access
Documentation of tables, schemas, catalogs, and cluster usage
Cost Governance & Performance Optimization
Design pipelines with cost awareness from day one:
Cluster sizing, workload tuning, efficient compute/storage usage
Trade-off decisions balancing cost vs quality vs SLA
Collaboration & Ownership
Work in a small, fast-moving team; be self-driven and ownership-oriented.
Raise and manage data quality escalations when issues are detected.
Contribute to evolving architecture (product is early-stage; first live month was recent).
Must-Have Skills (Screening Keywords)
Candidates should have hands-on, recent experience in:
Strong coding: PySpark + SQL (hands-on, not only orchestration)
Snowflake: data modeling/usage for analytics/warehousing workloads
Azure ecosystem: Azure Data Factory (ADF) for orchestration; exposure to Azure-native integrations and services
Data engineering reliability patterns: validation, idempotency, replay/backfills, dedup, auditability
Data governance: Unity Catalog (preferred), lineage, access control patterns, PII handling
Ownership mindset: can execute independently without constant approvals/check-ins
Nice-to-Have Skills
Event-driven/streaming ingestion exposure (even if primary is batch today)
Delta/Databricks patterns such as Delta Live Tables (DLT) (some workflows exist)
Experience building config-driven export frameworks for multiple downstream consumers/vendors
Exposure/interest in identity resolution concepts (ML optional; ETL strength is priority)
Familiarity with CAPI integrations / marketing tech data signals