Description
You will build and manage data pipelines that handle large volumes of data from flat files, APIs, and streaming sources.
Responsibilities
- Build distributed computing applications using Spark with Java or Scala.
- Develop and maintain real-time data streams using Kafka and Spark Streaming.
- Manage data workflows using Airflow or similar orchestration tools.
- Design and implement data models to support large-scale data processing.
- Process data from diverse sources, including flat files, APIs, and streams.
Required Skills
- 9+ years of experience in data engineering.
- Proficiency in Java or Scala for distributed computing.
- Strong experience with Spark.
- Expertise in SQL.
- Hands-on experience with Kafka and Spark Streaming.
- Experience with workflow management tools like Airflow.
- Proven ability to handle large data volumes from APIs and flat files.
Preferred Skills
- Experience with Google Cloud Platform, specifically BigQuery and Dataproc.