Description
Key Skills: Databricks, PySpark, Python, AWS S3, GenAI, Delta Lake, SQL, Data Architecture, Machine Learning, Cloud Computing
Good-to-Have Skills: Experience with AI/BI Genie capabilities, semantic layer design, data governance practices, hybrid work model coordination, enterprise environment experience, knowledge of compliance and regulatory requirements, cluster optimization, cost management, and mentoring capabilities.
Roles & Responsibilities:
- Design robust end-to-end data and analytics architectures that leverage Databricks SQL, Databricks Delta Lake, and PySpark to deliver scalable reporting and advanced analytics solutions for business teams.
- Develop secure and efficient data ingestion patterns from Amazon S3 and other enterprise sources that ensure reliable data availability, optimized storage usage, and predictable performance across environments.
- Define best practices for organizing and managing Delta Lake tables, including partitioning and optimization strategies, to support high-performance query workloads and downstream machine learning use cases.
- Implement Databricks Workflows to orchestrate complex batch and near-real-time data processing pipelines that deliver timely, high-quality information to analytics and reporting stakeholders.
- Apply GenAI fundamentals to design solution patterns in which foundation models and generative capabilities augment analytics, reporting narratives, and self-service insights in a safe and governed manner.
- Collaborate with BI and AI product teams using AI/BI Genie capabilities to design semantic layers, reusable data models, and guided analytics experiences that simplify data consumption for business users.
- Create Python and PySpark framework components that standardize logging, error handling, configuration management, and testing so that engineering teams can deliver data pipelines faster and with fewer defects.
- Review solution designs, notebooks, SQL logic, and workflow configurations from engineering teams to ensure consistency with architecture standards, security policies, and regulatory requirements.
- Partner with information security and platform operations teams to align data architectures with identity management, encryption, data masking, and monitoring controls that protect sensitive information.
- Work with product owners and business stakeholders to translate complex analytical needs into clear solution designs, data contracts, and service-level expectations that can be implemented on the Databricks platform.
- Guide optimization of Databricks clusters, job configurations, and SQL queries to control cost, maximize performance, and ensure reliable operation, working within a hybrid work model with day-shift coverage.
- Document architecture patterns, reference implementations, and decision records in a structured, accessible way so that engineering teams across the organization can reuse proven designs.
- Mentor data engineers and analytics developers on Databricks, PySpark, GenAI basics, and cloud data design principles to build a strong internal community of practice focused on quality and innovation.
Experience Required: A minimum of twelve years of overall experience in data engineering, analytics, or architecture roles, including several years focused on cloud-based data platforms and modern data stacks. Extensive hands-on experience designing and implementing data solutions with Databricks SQL, Databricks Delta Lake, Databricks Workflows, and PySpark for large-scale analytics needs.