Description
You will build and optimize data pipelines for an internal observability and analytics platform.
Responsibilities
- Design and build scalable batch and real-time data pipelines across structured and unstructured sources.
- Integrate analytics and observability services with upstream annotation tools and downstream ML validation systems.
- Collaborate with product, platform, and analytics teams to define event models, metrics, and data contracts.
- Develop ETL/ELT workflows with tools such as AWS Glue, PySpark, or Airflow, and ensure data quality, lineage, and reconciliation.
- Implement observability pipelines and alerts for mission-critical metrics.
Required Skills
- 3–8 years of experience in data engineering or backend development in data-intensive environments.
- Proficiency in Python and SQL; familiarity with PySpark or other distributed processing frameworks.
- Strong experience with cloud-native AWS data tools and services (S3, Lambda, Glue, Kinesis, Firehose, RDS).
- Familiarity with frameworks like Apache Hadoop, Apache Spark, and related tools for handling large datasets.
- Experience with data lake and warehouse patterns (e.g., Delta Lake, Redshift, Snowflake).
- Solid understanding of data modeling, schema design, and versioned datasets.
- Working knowledge of messaging and streaming systems such as Kafka or Firehose.
- Good working knowledge of database fundamentals, relational databases, and SQL.
Preferred Skills
- Experience with observability/monitoring systems (e.g., Prometheus, Grafana, OpenTelemetry).
- Familiarity with data governance, RBAC, PII redaction, or compliance in analytics platforms.