Description
You will build and maintain ETL pipelines that transform raw data from various sources into clean, analysis-ready formats.
Responsibilities
- Transform raw data by reshaping, aggregating, and normalizing datasets for analysis.
- Perform data cleaning and preprocessing using Python, SQL, and Excel.
- Identify and resolve data quality issues including duplicates, outliers, and missing values.
- Profile data to identify patterns and anomalies in large datasets.
- Ensure data integrity and consistency across multiple sources such as APIs, logs, and databases.
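To give a sense of the data quality work listed above, here is a minimal sketch of handling duplicates, missing values, and outliers in plain Python. The function name, the field names (`id`, `amount`), the sample records, and the choice of the 1.5 × IQR rule for outliers are all assumptions made for illustration; they do not describe the team's actual pipelines or tooling.

```python
from statistics import quantiles


def clean_records(records, key, value_field):
    """Drop duplicate keys, separate rows with missing values, flag IQR outliers.

    Hypothetical helper for illustration only.
    """
    seen = set()
    unique = []
    for row in records:
        if row[key] in seen:
            continue  # keep only the first occurrence of each key
        seen.add(row[key])
        unique.append(row)

    # Split rows by whether the value field is present.
    complete = [r for r in unique if r.get(value_field) is not None]
    missing = [r for r in unique if r.get(value_field) is None]

    # Flag values outside 1.5 * IQR of the quartiles (assumed rule).
    values = [r[value_field] for r in complete]
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [r for r in complete if not low <= r[value_field] <= high]
    return complete, missing, outliers


# Made-up sample data: one duplicate id, one missing amount, one extreme value.
sample = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": 12.0},
    {"id": 2, "amount": 12.0},
    {"id": 3, "amount": None},
    {"id": 4, "amount": 11.0},
    {"id": 5, "amount": 13.0},
    {"id": 6, "amount": 9.0},
    {"id": 7, "amount": 500.0},
]

complete, missing, outliers = clean_records(sample, key="id", value_field="amount")
print(len(complete), len(missing), [r["id"] for r in outliers])
```

In practice the same checks would more likely be expressed in SQL or PySpark against the source tables; the sketch only shows the shape of the logic.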
Required Skills
- 5+ years of experience in data engineering and ETL development.
- Proficiency with ETL tools such as IICS (Informatica Intelligent Cloud Services) or Talend, and familiarity with the Hadoop ecosystem.
- Strong SQL, PL/SQL, and Oracle database skills.
- Hands-on experience with PySpark and shell scripting.
- Experience working within Azure environments and a basic understanding of GCP and AWS.
- Ability to clean and preprocess data using Python and Excel (pivot tables, filters, and functions).
- Experience with Power BI for data visualization.
- Proficiency using Jira, Confluence, and MS Visio.
- Expertise in handling structured data from spreadsheets, databases, and APIs.
Preferred Skills
- Knowledge of R programming.