Description
You will design, build, and deploy production-grade data pipelines in Java using Apache Beam on Google Cloud Dataflow. You will handle both batch and streaming workloads, ensuring scalability and reliability for high-volume data.
Responsibilities
- Develop reusable Dataflow Flex Templates and data processing frameworks in Java for batch and streaming workloads.
- Implement event-driven architecture using Kafka, Pub/Sub, and Confluent Kafka for real-time data streaming.
- Configure and maintain Kafka Connect frameworks, utilizing connectors such as HTTP REST, JMS, File, SFTP, and JDBC.
- Manage large volumes of streaming messages and ensure efficient processing within Google Cloud Platform.
- Leverage Google Cloud services including BigQuery, Cloud SQL, Bigtable, Compute Engine, Cloud Functions, Cloud Run, and Cloud Storage.
Required Skills
- 6+ years of experience in software engineering or data engineering.
- Strong proficiency in Java for building distributed data processing applications.
- Deep understanding of Apache Beam and Google Cloud Dataflow.
- Extensive experience with Apache Kafka and Confluent Kafka, including real-time streaming and event-driven design.
- Hands-on experience with Google Cloud Platform (GCP) services, specifically BigQuery, Cloud SQL, and Bigtable.
- Knowledge of open-source distributed storage and processing utilities in the Apache Hadoop family.
Preferred Skills
- Experience designing and deploying production-level data pipelines at scale.
- Familiarity with Cloud SQL and its underlying database technologies.