Build and maintain data transformation processes, metadata structures, and workload management systems.
Responsibilities
- Design and implement data ingestion and curation processes using Spark, Hive, and HDFS.
- Develop high-performance ETL code to handle large-scale data ingestion from multiple platforms.
- Create databases, schemas, and Hive tables using formats such as ORC, Parquet, Avro, and plain text.
- Monitor production job performance and recommend infrastructure adjustments.
- Manage code versioning via Bitbucket and maintain CI/CD pipelines.
Required Skills
- 5+ years of experience in big data development.
- Proficiency in Spark using Scala or Python.
- Hands-on experience with Hive, HDFS, Sqoop, HBase, Kerberos, Sentry, and Impala.
- Strong SQL skills for complex queries, data analysis, and anomaly detection.
- Experience with Hadoop command-line tools and shell scripting.
- Proficiency with Git and Bitbucket.
- Knowledge of CI/CD pipelines.
Preferred Skills