Description

Build and maintain data transformation processes, metadata structures, and workload management systems.

Responsibilities

  • Design and implement data ingestion and curation processes using Spark, Hive, and HDFS.
  • Develop high-performance ETL code to handle large-scale data ingestion from multiple platforms.
  • Create databases, schemas, and Hive tables using formats such as ORC, Parquet, Avro, and plain text.
  • Monitor production job performance and recommend infrastructure adjustments.
  • Manage code versioning via Bitbucket and maintain CI/CD pipelines.

Required Skills

  • 5+ years of experience in big data development.
  • Proficiency in Spark using Scala or Python.
  • Hands-on experience with Hive, HDFS, Sqoop, HBase, Kerberos, Sentry, and Impala.
  • Strong SQL skills for complex queries, data analysis, and anomaly detection.
  • Experience with Hadoop commands and shell scripting.
  • Proficiency with Git and Bitbucket.
  • Knowledge of CI/CD pipelines.

Preferred Skills

  • Bachelor's Degree.

Education

Bachelor's Degree