Description
You will build and maintain scalable ETL/ELT pipelines using Apache Spark and Snowflake.
Responsibilities
- Design and develop data processing, transformation, and analytics workflows in distributed environments using Databricks.
- Implement data warehousing solutions in Snowflake, focusing on performance, cost, and security.
- Write efficient SQL and Spark applications to process large-scale datasets.
- Integrate various data sources including cloud storage, APIs, and RDBMS.
- Ensure data quality and integrity through unit testing, validation, and monitoring.
- Optimize and troubleshoot Spark jobs, SQL queries, and Snowflake workflows.
Required Skills
- 3+ years of experience with Snowflake, including schema design, query optimization, and features such as Snowpipe, Streams, or Tasks.
- 2+ years of hands-on development with Apache Spark using PySpark, Scala, or Java.
- Proficiency in SQL and the Spark RDD and DataFrame APIs.
- Experience with cloud platforms such as AWS, Azure, or GCP.
- SnowPro Core or Advanced Certification (e.g., Architect, Data Engineer).
- Databricks Certified Associate Developer for Apache Spark.
- Familiarity with data modeling, data quality, and orchestration and transformation tools such as Airflow or dbt.
- Knowledge of CI/CD pipelines (e.g., GitHub Actions) and version control using Git.
- Understanding of distributed computing, data lakes, and modern data architectures.
Preferred Skills
- Experience with Prefect for orchestration.