You will build and maintain large-scale data processing and storage systems within a complex Big Data environment.
Responsibilities
Develop data ingestion, validation, transformation, and engineering code alongside the data engineering team.
Build open-source platform components using Databricks, Hadoop, Spark, Scala, Java, Oozie, and Hive.
Deliver solutions on cloud platforms and integrate with services like Azure Data Factory, ADLS, Azure DevOps, Azure Functions, Synapse, AWS Glue, Redshift, Lambda, or S3.
Document code artifacts and write user documentation and runbooks.
Troubleshoot deployments across environments, provide testing support, and participate in design sessions, demos, prototypes, and training workshops.
Required Skills
5+ years of experience working in complex Big Data and multi-vendor environments.
5+ years of experience with Jira, GitHub, Git, and other issue-tracking and code management tools.
3+ years of experience with large Hadoop projects using Spark and Python, including the Spark DataFrame and Dataset APIs, Spark SQL, RDDs, and Scala function literals.
2+ years of experience developing large-scale data processing, storage, or distribution systems, preferably with Databricks.
Hands-on experience with Hadoop, Hive, Sqoop, Oozie, and HDFS.
Strong SQL skills and experience with Postgres or MySQL RDBMS platforms.
Experience with ELT/ETL development, patterns, and tooling.
Experience with AWS and/or Azure cloud environments.
Experience with Linux (RHEL or CentOS), IDEs, unit testing frameworks, and Maven.
Preferred Skills
Bachelor's degree in Computer Science or a related field.
Certifications in Spark, Databricks, AWS, Azure, or other cloud platforms.