Description
You will architect and build scalable data ingestion pipelines.
Responsibilities
- Design and build highly scalable data ingestion pipelines using AWS native services.
- Envision end-to-end ingestion processes while managing data privacy, lineage, and cost optimization.
- Coordinate delivery efforts with remote teams.
- Reverse engineer existing processes running on Hadoop implementations.
Required Skills
- 10+ years of hands-on experience building data ingestion pipelines.
- Proficiency in Python and Java.
- Expertise with SQL and HQL.
- Experience handling structured and unstructured data ingestion (TXT, CSV, JSON, XML, Parquet).
- Experience streaming data from Kafka topics using Apache Flink.
- Familiarity with open table formats like Apache Iceberg or Hudi.
- Experience with AWS services: CloudFormation, AWS Glue (Crawlers and Data Catalog), Step Functions, EventBridge, Lambda, S3, SNS, Kinesis.
- Proficiency with Git/GitHub version control.
- Prior hands-on experience with Cloudera Hadoop implementations.
Preferred Skills
- Prior experience with Jenkins for CI/CD.
- Exposure to the Snowflake data environment.