You will own the end-to-end data engineering lifecycle, from pipeline infrastructure to model productionization and business analytics.
Responsibilities
Build and scale ETL/ELT pipelines using Apache Spark, Airflow, and Kafka to support AI and ML workloads.
Transform unstructured data (text, images, video) into structured datasets for model training, including feature engineering and vector database ingestion.
In partnership with data scientists, deploy ML models, create APIs, and implement MLOps practices, including monitoring for data drift.
Create dashboards in Tableau and run SQL queries to deliver actionable business insights.
Ensure data quality, reliability, and security within AI systems, protecting PII/PHI and maintaining GDPR compliance.
Required Skills
Expert-level Python and advanced SQL.
Hands-on experience with Apache Spark, Kafka, Airflow, and Databricks.
Experience with ETL processes and cloud data management (AWS Glue, S3, Redshift).
Proficiency with Tableau for data visualization.
Knowledge of PyTorch for data manipulation and model interaction.
Understanding of GDPR requirements as they apply to data governance.
10+ years of experience in data engineering or analytics.
Any graduate degree.
Preferred Skills
Experience with Java for large-scale distributed systems.
Familiarity with vector databases (Pinecone, Milvus) and GCP services.