Description
You will design, develop, and maintain scalable ETL/ELT pipelines.
Responsibilities
- Design, develop, and maintain scalable ETL/ELT pipelines using Scala and Apache Spark/Flink (Batch & Streaming).
- Optimize Spark jobs and SQL queries for performance, efficiency, and cost.
- Implement and manage Lakehouse architectures using Apache Iceberg, Hudi, or Delta Lake.
- Apply the Medallion Architecture (Bronze/Silver/Gold layers) to prepare data for analytics and ML.
- Enable data observability covering freshness, lineage, and reliability.
Required Skills
- Strong proficiency in Scala and Apache Spark (Batch & Streaming).
- Solid understanding of SQL and distributed computing concepts.
- Experience with GCP (Dataproc, GCS, BigQuery) or equivalent cloud platforms (AWS/Azure).
- Hands-on experience with Docker and Kubernetes.
- Experience with Lakehouse table formats (Iceberg, Hudi, Delta Lake).
- Familiarity with CI/CD practices.
- 1–5 years of experience as a Data Engineer or Big Data Engineer.
- Bachelor's degree required.
Preferred Skills
- Experience building data pipelines for ML / feature engineering.
- Exposure to workflow orchestration tools (Airflow, Azkaban).