Description
You will build and optimize scalable data processing applications using Python and PySpark.
Responsibilities
- Develop, maintain, and optimize scalable data processing applications.
- Design and implement data solutions that meet performance and reliability requirements.
- Monitor and troubleshoot performance issues within data processing pipelines.
- Implement and maintain CI/CD pipelines for automated testing and deployment.
Required Skills
- 8+ years of experience in Python development.
- Expertise in PySpark and Apache Spark, including Spark SQL, DataFrames, and Spark Streaming.
- Proficiency in Python libraries such as pandas and NumPy.
- Strong SQL skills and experience with relational databases.
- Experience with version control using Git.
- Hands-on experience with CI/CD pipelines.
- Familiarity with big data tools and frameworks.
Preferred Qualifications
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.