You will develop and manage big data processing solutions using PySpark.
Responsibilities
- Implement ETL pipelines and data transformation processes using distributed computing systems.
- Ensure data quality and integrity across all data processing workflows.
- Troubleshoot and resolve issues within PySpark applications and workflows.
- Integrate PySpark code with frameworks such as the Ingestion Framework and DataLens.
- Document code lineage and ensure compliance with data security standards.
Required Skills
- 4+ years of experience in big data development using Hadoop, Hive, and Spark.
- Strong programming proficiency in Python and PySpark.
- Expertise in SQL for data manipulation and querying.
- Experience with Kafka for data streaming.
- Familiarity with CI/CD pipelines and DevOps practices.
- Understanding of data warehousing concepts and relational databases.
Preferred Skills
- Experience with SAS.
- Certification in big data or cloud technologies.