Description

Key Skills: Databricks, Cloud (AWS / Azure / GCP), Cost Optimization, RDBMS, Data Engineering, Kafka, Spark, Scala, PySpark, AWS components (e.g., Kinesis, SQS, SNS)

Roles and Responsibilities:

  • Design, implement, and own a Databricks-style data platform architecture across Dev, UAT, and Prod environments.
  • Build end-to-end batch and streaming data pipelines using Spark, Delta Lake/Delta Live Tables concepts, and Structured Streaming patterns.
  • Implement medallion architecture (bronze/silver/gold) and scalable ingestion frameworks from RDBMS, cloud object storage, and streaming sources.
  • Optimize Spark job performance (partitioning, caching, joins, shuffle tuning), ensure reliability, high availability, and fault tolerance, and troubleshoot production issues.
  • Lead cluster auto-scaling, job orchestration, and FinOps governance including cost allocation, budgets, auto-termination, and usage monitoring.
  • Define and enforce platform best practices for notebooks/code structure, reusable libraries, version control, CI/CD, and monitoring/alerting.
  • Provide architectural oversight for lakehouse design, streaming/batch patterns, and adoption of governance capabilities such as Unity Catalog.
  • Oversee data governance and security controls (RBAC, row/column-level security, lineage/auditing) and ensure compliance with enterprise standards.
  • Enable enterprise AI/ML and GenAI/LLM integrations using platform data, including RAG patterns and responsible AI practices.

Skills Required:

  • Strong experience in Databricks and Spark-based lakehouse architectures, including Delta Lake.
  • Hands-on expertise in building batch and streaming pipelines using Kafka and Spark.
  • Solid understanding of data engineering principles, RDBMS/SQL, and large-scale data processing.
  • Experience working with cloud platforms such as AWS, Azure, or GCP for data platform implementation.
  • Strong performance tuning and optimization skills for Spark workloads and distributed systems.
  • Experience with cost optimization and FinOps practices for cloud data platforms.
  • Knowledge of data governance tools such as Unity Catalog and implementation of security controls.
  • Familiarity with Scala, PySpark, and AWS services such as Kinesis, SQS, and SNS is an added advantage.

Education: B.E./B.Tech./MCA

Education

Any Graduate