Build and maintain scalable, distributed, fault-tolerant data pipelines on GCP, including BigQuery-based lakehouse layers and Dataproc-driven Delta Lake workflows
Actively participate in meetings with various stakeholders across data engineering, compliance, and business teams globally
Understand market data processing and transformation needs; build pipelines to acquire, normalise, transform, and release large volumes of financial data through the OMDP data factory
Design and implement bitemporal data models (valid-time + system-time) on BigQuery to support certified, regulatory-grade time-series datasets
Build, use, and maintain software testing frameworks (unit / non-regression / user acceptance) for data pipelines and transformation logic
Take complete ownership of solutions and assigned tasks, including ingestion pipelines, QA workflows, correction management, and audit trail implementation
Collaborate with other team members and contribute to shared platform services rather than vertical-specific implementations
Apply business acumen to understand financial concepts around reference data for equities and other asset classes
Support teams across data and technology in implementing AI solutions and integrating their services with MSCI's data science products and platforms, including AI-assisted ingestion, anomaly detection, and semantic search over the lakehouse using Vertex AI
Requirements:
6-8 years of experience in data engineering
Proficient in Python programming: data pipeline development, transformation logic, and automation scripts
Proficient in data querying and analysis using SQL, with strong hands-on experience in BigQuery: partitioning, clustering, materialised views, and time-series query patterns at scale
Hands-on experience building and scheduling pipelines using Cloud Composer (Apache Airflow): DAG authoring, SLA alerting, retry logic, and dependency management
Working knowledge of Dataproc (Apache Spark): batch ingestion, Delta Lake merge operations, and incremental data processing
Proficient with AI-assisted development tools such as GitHub Copilot or Cursor for accelerating code generation and enhancing developer productivity
Code versioning and collaboration using Git: branching strategies, pull request workflows, and pipeline-as-code practices
Familiarity with REST APIs: consuming external data vendor APIs and building service-layer integrations
Familiarity with GCP cloud technologies: Cloud Storage, Pub/Sub, Datastream, Cloud Monitoring, IAM, and VPC Service Controls