Description

Must have: Databricks, PySpark, Azure Cloud Services

Technical Skills:

  • Strong expertise in Databricks (Delta Lake, Unity Catalog, Lakehouse architecture, table triggers, Delta Live Tables pipelines, Databricks Runtime, etc.)
  • Proficiency in Azure Cloud Services.
  • Solid understanding of Spark and PySpark for big data processing.
  • Experience with relational databases.
  • Knowledge of Databricks Asset Bundles and GitLab.


Key Responsibilities:

  1. Data Pipeline Development:
    • Build and maintain scalable ETL/ELT pipelines using Databricks.
    • Leverage PySpark/Spark and SQL to transform and process large datasets.
    • Integrate data from multiple sources including Azure Blob Storage, ADLS and other relational/non-relational systems.
  2. Collaboration & Analysis:
    • Work closely with multiple teams to prepare data for dashboards and BI tools.
    • Collaborate with cross-functional teams to understand business requirements and deliver tailored data solutions.
  3. Performance & Optimization:
    • Optimize Databricks workloads for cost efficiency and performance.
    • Monitor and troubleshoot data pipelines to ensure reliability and accuracy.
  4. Governance & Security:
    • Implement and manage data security, access controls and governance standards using Unity Catalog.
    • Ensure compliance with organizational and regulatory data policies.
  5. Deployment:
    • Leverage Databricks Asset Bundles for seamless deployment of Databricks jobs, notebooks and configurations across environments.
    • Manage version control for Databricks artifacts and collaborate with the team to maintain development best practices.
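As a rough sketch of the deployment responsibility above, a Databricks Asset Bundle is driven by a `databricks.yml` at the project root. The bundle name, workspace host, job, and notebook path below are hypothetical placeholders, not values from this posting.

```yaml
# databricks.yml — minimal bundle sketch (all names/host are illustrative)
bundle:
  name: sales_etl

targets:
  dev:
    mode: development
    workspace:
      host: https://adb-1234567890123456.7.azuredatabricks.net

resources:
  jobs:
    nightly_sales_etl:
      name: nightly_sales_etl
      tasks:
        - task_key: transform
          notebook_task:
            notebook_path: ./notebooks/transform_sales.py
```

With a file like this in place, `databricks bundle deploy -t dev` pushes the job and notebook to the target workspace, which is how the same artifacts move consistently across environments.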


Preferred Experience:

  • Familiarity with Databricks Runtimes and advanced configurations.
  • Knowledge of streaming frameworks such as Spark Structured Streaming.
  • Experience in developing real-time data solutions.


Certifications:

  • Azure Data Engineer Associate or Databricks Certified Data Engineer Associate certification (optional).



Education

Any Graduate