O'Fallon, MO, USA
Key Responsibilities
• Design, develop, and maintain large-scale Spark applications using Scala and PySpark
• Build and operate streaming-heavy data pipelines using Kafka and Spark Structured Streaming
• Implement stateful streaming patterns including windowing, watermarking, late data handling, and checkpointing
• Develop robust event replay and reprocessing workflows using Kafka offsets and partitions
• Build ingestion and routing flows using Apache NiFi, including Kafka-based ingestion patterns
• Implement end-to-end ETL/ELT pipelines with strong emphasis on low latency, fault tolerance, and scalability
• Optimize Spark jobs through partitioning strategies, memory tuning, shuffle optimization, and efficient data formats
• Integrate Spark workloads with distributed object storage systems such as Apache Ozone and Ceph
• Ensure data quality, consistency, and auditability through validation, reconciliation, and metadata capture
• Collaborate with platform, infrastructure, and operations teams on production readiness and capacity planning
• Support production systems, including monitoring, incident analysis, and root cause resolution
• Contribute to reusable frameworks, coding standards, and engineering best practices
• Participate in architecture reviews, code reviews, and technical documentation
Qualifications
• Strong hands-on experience with Apache Spark in production environments
• Advanced proficiency in Scala and PySpark
• Solid understanding of distributed systems and data processing at scale
• Strong experience with Kafka-based streaming architectures
• Hands-on experience with Spark Structured Streaming
• Experience building batch and real-time pipelines
Bachelor's degree