Description

You will design and implement data storage and distributed computing solutions within AWS environments.

Responsibilities

  • Build and deploy applications using AWS services including EC2, S3, Glue, EMR, RDS, ELB, and Lambda, together with Apache Hive.
  • Implement and tune Hadoop and Spark architectures for performance optimization.
  • Develop automated unit, integration, regression, performance, and acceptance tests.
  • Design data solutions covering data warehousing, data lake patterns, and both structured and unstructured data.
  • Apply software design principles to maintain scalable data pipelines.

Required Skills

  • 5+ years of professional development experience in Java and Python.
  • 3+ years of hands-on experience with Hadoop/Spark implementation and performance tuning.
  • 3+ years of experience implementing data storage solutions and Hadoop platforms.
  • Experience implementing AWS services in distributed computing and enterprise environments.
  • Deep understanding of the Hadoop ecosystem, including Sqoop, Flume, Kafka, Oozie, Hue, ZooKeeper, HCatalog, Solr, and Avro.
  • Proficiency in ETL processes and working with various database types.
  • Bachelor’s degree in statistics, data science, or a related field, or equivalent work experience.

Preferred Skills

  • Experience with Apache Hadoop ecosystem tools.
  • Direct experience tuning Amazon Elastic MapReduce (EMR).

Education

Graduate in any discipline