Description
Responsibilities
- Design, develop, and maintain scalable data pipelines and workflows using GCP services
- Build and optimize data models, schemas, and warehouses (BigQuery, Cloud SQL, etc.) aligned with business needs
- Collaborate with data analysts, data scientists, and business stakeholders to gather requirements and deliver effective solutions
- Ensure data quality, security, and compliance across all data environments
- Automate data workflows, monitor system health, and troubleshoot issues proactively
- Optimize infrastructure for performance and cost-efficiency within GCP
- Document architecture, processes, and best practices for data engineering solutions
- Stay current with GCP advancements and incorporate new features and tools for continuous improvement
Software Requirements
- Proficiency in SQL and programming languages such as Python, Java, or Scala for data processing
- Hands-on experience with GCP data services including BigQuery, Cloud Dataflow, Cloud Data Fusion, Cloud Storage, and Dataproc
- Knowledge of ETL/ELT processes, data modeling, and data warehousing concepts
- Experience designing and implementing scalable and secure data pipelines on GCP
- Familiarity with containerization (Docker) and orchestration (GKE, Kubernetes)
- Understanding of data security, encryption, and compliance policies in cloud environments
- Experience with Apache Spark, Kafka, or Pub/Sub for real-time data streaming
- Knowledge of machine learning workflows on GCP (Vertex AI, AI Platform)
- Exposure to Cloud Identity & Access Management (IAM) and cybersecurity best practices
Category-wise Technical Skills
Data Processing & Frameworks:
- GCP data services: BigQuery, Dataflow, Dataproc, Data Fusion, Cloud Storage
- Programming languages: Python, Java, Scala
- Real-time streaming: Pub/Sub, Kafka (preferred)
- Batch processing frameworks: Apache Spark, Dataflow
Cloud & Infrastructure:
- GCP Identity & Access Management (IAM), VPC, Cloud Networking
- Deployment automation: Terraform, Deployment Manager
- Container orchestration: Google Kubernetes Engine (GKE)
Data Modeling & Warehousing:
- Designing schemas and aggregations for analytical processing
- Data governance, access controls, lineage, and audit trails
Tools & Monitoring:
- Cloud Monitoring (formerly Stackdriver) for monitoring; Looker Studio (formerly Data Studio), Looker, or Power BI for visualization
- Version control and CI/CD: Git with CI/CD integrations
Experience
- 5+ years of experience in data engineering, data architecture, or related roles
- Proven track record deploying large-scale data pipelines on GCP
- Hands-on experience with enterprise data solutions, data lakes, or data warehouses
- Demonstrated ability to work collaboratively with cross-functional teams
- Experience with real-time data integration and processing workflows preferred
Day-to-Day Activities
- Design, develop, and maintain scalable data pipelines and workflows on GCP
- Implement, optimize, and troubleshoot batch and streaming data processes
- Collaborate with data analysts, scientists, and stakeholders to understand data requirements
- Monitor workflows, troubleshoot data pipeline issues, and implement improvements
- Ensure data security, privacy, and compliance across all environments
- Automate deployment and management of data infrastructure using DevOps tools
- Document technical architecture, processes, and standards
- Keep abreast of GCP innovations and incorporate relevant new features into solutions
Qualifications & Soft Skills
Qualifications:
- Bachelor’s degree in Computer Science, Data Science, Engineering, or a related field
- 5+ years of experience in data engineering, big data, or cloud data solutions
- Relevant certifications such as Google Professional Data Engineer are advantageous
Soft Skills:
- Strong analytical, problem-solving, and organizational skills
- Effective communication skills for collaborating with technical and non-technical teams
- Ability to work independently and as part of a team
- Detail-oriented, with a focus on quality and accuracy
- Adaptability and eagerness to learn emerging cloud and data technologies
- Proactive attitude with a focus on continuous improvement