Key Skills: Databricks, Cloud (AWS / Azure / GCP), Cost Optimization, RDBMS, Data Engineering, Kafka, Spark, Scala, PySpark, AWS components (e.g., Kinesis, SQS, SNS)
Roles and Responsibilities:
- Design, implement, and own a Databricks-style data platform architecture across Dev, UAT, and Prod environments.
- Build end-to-end batch and streaming data pipelines using Spark, Delta Lake/Delta Live Tables concepts, and Spark Structured Streaming patterns.
- Implement medallion architecture (bronze/silver/gold) and scalable ingestion frameworks from RDBMS, cloud object storage, and streaming sources.
- Optimize Spark job performance (partitioning, caching, joins, shuffle tuning), ensure reliability, high availability, and fault tolerance, and troubleshoot production issues.
- Lead cluster auto-scaling, job orchestration, and FinOps governance including cost allocation, budgets, auto-termination, and usage monitoring.
- Define and enforce platform best practices for notebooks/code structure, reusable libraries, version control, CI/CD, and monitoring/alerting.
- Provide architectural oversight for lakehouse design, streaming/batch patterns, and adoption of governance capabilities such as Unity Catalog.
- Oversee data governance and security controls (RBAC, row/column-level security, lineage/auditing) and ensure compliance with enterprise standards.
- Enable enterprise AI/ML and GenAI/LLM integrations using platform data, including RAG patterns and responsible AI practices.
Skills Required:
- Strong experience in Databricks and Spark-based lakehouse architectures, including Delta Lake.
- Hands-on expertise in building batch and streaming pipelines using Kafka and Spark.
- Solid understanding of data engineering principles, RDBMS/SQL, and large-scale data processing.
- Experience working with cloud platforms such as AWS, Azure, or GCP for data platform implementation.
- Strong performance tuning and optimization skills for Spark workloads and distributed systems.
- Experience with cost optimization and FinOps practices for cloud data platforms.
- Knowledge of data governance tools such as Unity Catalog and implementation of security controls.
- Familiarity with Scala, PySpark, and cloud-native services like Kinesis, SQS, and SNS is an added advantage.
Education: B.E. / B.Tech / MCA