Description

You will develop and manage big data processing solutions using PySpark.

Responsibilities

  • Implement ETL pipelines and data transformation processes using distributed computing systems.
  • Ensure data quality and integrity across all data processing workflows.
  • Troubleshoot and resolve issues within PySpark applications and workflows.
  • Integrate PySpark code with frameworks such as the Ingestion Framework and DataLens.
  • Document code lineage and ensure compliance with data security standards.

Required Skills

  • 4+ years of experience in big data development using Hadoop, Hive, and Spark.
  • Strong programming proficiency in Python and PySpark.
  • Expertise in SQL for data manipulation and querying.
  • Experience with Kafka for data streaming.
  • Familiarity with CI/CD pipelines and DevOps practices.
  • Understanding of data warehousing concepts and relational databases.

Preferred Skills

  • Experience with SAS.
  • Certification in big data or cloud technologies.

Education

Graduate in any discipline