You will design and implement data processing frameworks and pipelines on Google Cloud Platform (GCP). You will build high-performance systems for batch and real-time streaming workloads, covering ingestion, transformation, and aggregation.
Responsibilities
- Build and maintain ETL pipelines for data collection, storage, and curation.
- Develop streaming and batch processing frameworks using GCP services.
- Schedule complex workflows and orchestrate jobs using Airflow or Cloud Composer.
- Automate deployments and testing through CI/CD pipelines.
- Manage data ingestion and processing using industry-standard technology stacks.
Required Skills
- 10+ years of application development experience.
- Expertise in GCP services: BigQuery, Dataflow, Cloud Storage, Dataproc, Cloud Composer, Pub/Sub, and Cloud Monitoring.
- Proficiency in Java and Python.
- Experience with streaming ETL using Apache Beam and Kafka.
- Strong SQL background with Teradata, BigQuery, or Bigtable.
- Hands-on experience with Airflow or Cloud Composer.
- Proficiency in Bash shell scripting and UNIX utilities and commands.
- Experience implementing CI/CD automation pipelines.
- Experience using JIRA or similar project management tools.
Preferred Skills
- Knowledge of Kubernetes, Docker, Spark, PySpark, or Kafka.
- Experience with Scrum/Agile methodologies, data mapping, and JSON data manipulation.