Minimum 4 years of experience in designing, developing, and maintaining scalable data pipelines, ETL/ELT workflows, and enterprise data integration solutions.
Expertise in Python, SQL, PySpark, Spark SQL, Scala, and distributed data processing frameworks for handling large-scale datasets.
Experience with big data technologies including Apache Spark, Databricks, Hadoop, Airflow, Kafka, and Snowflake for modern data engineering workloads.
Experience building cloud-native data platforms using AWS, Azure, or GCP, with a strong understanding of scalable and highly available data architectures.
Working knowledge of cloud services on AWS (S3, Glue, Redshift, Athena, EMR, Lambda, Kinesis) or Azure (Data Factory, Synapse Analytics, ADLS, Databricks, Event Hubs).
Experience designing and implementing data warehouses, dimensional models, star schemas, snowflake schemas, data lakes, and lakehouse architectures.
Experience building and optimizing batch processing and real-time streaming data pipelines using technologies such as Kafka, Spark Streaming, Flink, or Kinesis.
Experience handling structured, semi-structured, and unstructured data using file formats including Parquet, Avro, ORC, CSV, and JSON.
Experience working with relational and NoSQL databases such as PostgreSQL, MySQL, Oracle, MongoDB, Cassandra, and DynamoDB, including query optimization and performance tuning.
Familiarity with version control using Git, CI/CD pipelines built with Jenkins, GitHub Actions, Azure DevOps, or GitLab CI/CD, DevOps practices, and infrastructure automation.
Understanding of data quality, data governance, data security, monitoring, observability, and partitioning strategies, and the ability to troubleshoot distributed systems.