You will design and implement data storage and distributed computing solutions within AWS environments.
Responsibilities
- Build and deploy applications using AWS services including EC2, S3, Glue, EMR, RDS, ELB, and Lambda, along with Apache Hive.
- Implement Hadoop and Spark architectures and tune them for performance.
- Develop automated unit, integration, regression, performance, and acceptance tests.
- Design data solutions covering data warehousing, data lake patterns, and both structured and unstructured data.
- Apply software design principles to maintain scalable data pipelines.
Required Skills
- 5+ years of professional development experience in Java and Python.
- 3+ years of hands-on experience with Hadoop/Spark implementation and performance tuning.
- 3+ years of experience in data storage and Hadoop platform implementation.
- Experience implementing AWS services in distributed computing and enterprise environments.
- Deep understanding of the Hadoop ecosystem, including Sqoop, Flume, Kafka, Oozie, Hue, ZooKeeper, HCatalog, Solr, and Avro.
- Proficiency in ETL processes and working with various database types.
- Bachelor’s degree in statistics, data science, or a related field, or equivalent work experience.
Preferred Skills
- Experience with Apache Hadoop ecosystem tools.
- Direct experience with Amazon Elastic MapReduce (EMR) tuning.