Description

You will build and orchestrate data pipelines and distributed computing applications within the AWS ecosystem.

Responsibilities

  • Build and orchestrate data pipelines and ETL processes.
  • Develop distributed computing applications using PySpark.
  • Design and implement data models and schemas, applying normalization and denormalization as appropriate.
  • Write, maintain, and execute automated unit tests following Test-Driven Development (TDD) practices.
  • Build APIs and manage serverless architectures.

Required Skills

  • 5+ years of experience in big data environments.
  • Proficiency in Python programming.
  • Strong expertise in SQL, Presto, Hive, and Spark.
  • Experience with PySpark and libraries including Pandas, Polars, and NumPy.
  • Extensive experience with AWS services: EMR, Lambda, Glue ETL, Step Functions, S3, ECS, Kinesis, IAM, RDS PostgreSQL, DynamoDB, CloudWatch Events/EventBridge, Athena, SNS, SQS, and VPC.
  • Experience with relational and NoSQL databases, including Amazon Redshift.
  • Knowledge of trading and investment data.
  • Experience with OneTick or KDB.
  • Understanding of CI/CD, source control, and data warehousing concepts.

Preferred Skills

  • Proficiency in data visualization tools, specifically Tableau.

Education

Bachelor's degree in any discipline.