Results-driven Data Scientist with 6+ years of experience delivering data-driven solutions, including 3+ years specializing in machine learning algorithms, predictive analytics, statistical modeling, natural language processing (NLP), and AI techniques, along with 3+ years as a Data Analyst. Proven expertise in designing, developing, and deploying scalable data products that transform complex datasets into actionable insights to drive business growth and innovation.
• Adept at extracting actionable insights from complex datasets and building data-driven solutions that drive strategic decision-making.
• Passionate about leveraging advanced analytical techniques and cutting-edge tools to solve real-world problems and optimize business outcomes across various domains.
• Involved in the entire data science project life cycle, from data extraction through machine learning model evaluation and storytelling.
• Built and deployed end-to-end scalable data pipelines on Databricks using PySpark and SQL, processing terabytes of structured/unstructured data for analytics and machine learning readiness.
• Designed, developed, and productionized supervised and unsupervised ML models (regression, classification, NLP, clustering, forecasting) to generate business-ready insights, ensuring reproducibility and scalable deployment across enterprise pipelines.
• Innovative Data Scientist with hands-on expertise in image processing, computer vision, and machine learning, specializing in object detection, pattern recognition, and visual data analysis.
• Designed, trained, and optimized ML models (classification, regression, NLP, forecasting) with Python libraries (Scikit-learn, TensorFlow, PyTorch), ensuring reproducibility and scalable deployment.
• Led end-to-end fine-tuning of transformer-based NLP and LLM models (GPT-style, BERT) as well as CNN/RNN hybrid models, applying domain-specific data augmentation, hyperparameter tuning, and evaluation to optimize model accuracy and contextual relevance.
• Skilled in implementing filtering, edge detection, binary morphology, and affine/perspective transformations using OpenCV, scikit-image, and NumPy for data preprocessing and feature extraction.
• Maintained and enhanced Python-based XGBoost models, performing retraining with updated data, tuning hyperparameters, and monitoring performance to ensure >90% prediction accuracy.
• Designed and deployed predictive and prescriptive models (classification, forecasting, optimization) to support personalization, SEO, navigation, and customer engagement, driving measurable revenue and satisfaction improvements.
• Wrangled structured and unstructured data to show senior management that better, faster decisions can be made with the right data.
• Knowledgeable in High Dynamic Range (HDR) imaging and tone mapping for enhancing contrast and visibility across lighting conditions in image datasets.
• Architected retrieval-augmented generation (RAG) pipelines, integrating relation extraction and semantic search with knowledge graph structures to enhance contextual grounding for downstream LLM applications.
• Strong experience in data science, machine learning, and artificial intelligence using methods such as regression, Bayesian models, decision trees, random forests, SVM, kernel SVM, Naïve Bayes, K-means clustering, and natural language processing (NLP).
• Deep expertise in Large Language Models (LLMs), transformer architectures, and foundation models, including fine-tuning (LoRA, transfer learning) and building enterprise-grade Retrieval-Augmented Generation (RAG) pipelines with vector databases and semantic search.
• Strong analytical and problem-solving mindset with a passion for transforming visual data into actionable intelligence through AI-driven automation.
• Partnered with data engineers and software teams to integrate ML models into end-to-end pipelines (AWS, Snowflake, PySpark, SageMaker, GCP ML Engine), enabling real-time insights and sustainable business intelligence solutions.
• Conducted exploratory data analysis (EDA) with Pandas, NumPy, Matplotlib, and Seaborn to identify milestone progressions, blockers, and key triggers that drive readiness-to-complete (RTC) processes.
• Built automated insights frameworks using Python, Snowflake, and AWS (S3, Glue, SageMaker) to provide business partners with real-time dashboards and self-service analytics, reducing manual reporting time by 60%.
• Scientific mindset and inventive ability, with a track record of thought leadership and contributions.
• Strong knowledge of applying big data and advanced analytics to identify and exploit data opportunities that deliver positive business impact.
• Self-starter, able to work both collaboratively with stakeholders and independently. Experienced in managing multiple projects and teams, with an excellent track record.
• Implemented distributed model training using PyTorch and TensorFlow on multi-GPU clusters (AWS EMR, Spark, Hadoop), leveraging data parallelism and optimized pipelines to support large-scale NLP/LLM workloads.
• Extracted, cleaned, and transformed structured and unstructured data (including blocker notes and milestone fields) from enterprise project-tracking systems to prepare for modeling and visualization.
• Conducted deep statistical analysis (A/B testing, hypothesis testing, time-series forecasting, Bayesian inference) to identify key drivers of marketing promotions and supply chain performance and translated findings into actionable business recommendations.
• Experienced in writing SQL queries for data analysis, data mapping, and validation of transformed data output.
• Leveraged scikit-learn, Keras 2.0, TensorFlow, NLTK, and SciPy in Python, along with Deeplearning4j, to develop and evaluate machine learning, deep learning, and NLP models for problem solving.
• Hands-on experience with data manipulation packages such as NumPy, Pandas, and SQLAlchemy, and data visualization packages such as Matplotlib, Seaborn, and Bokeh.
• Experienced in SQL programming and in creating relational and non-relational databases.
• Built statistical models to generate new theories and products, and used Spotfire and Tableau to create dashboards and visualizations.
• Developed and maintained executive-level dashboard reports in Tableau and Power BI to track financial performance, customer satisfaction, and digital engagement metrics, enabling leadership to make data-driven decisions.
• Extensive experience designing and deploying end-to-end Machine Learning and Artificial Intelligence solutions using Python with frameworks like PyTorch and TensorFlow, delivering scalable, production-grade models across enterprise environments.
• Trained and deployed machine learning models on Google Cloud ML Engine using TensorFlow.
• Designed and implemented supervised algorithms such as logistic regression, decision trees, XGBoost, SVMs, and polynomial regression, and unsupervised algorithms such as K-means, mixture models, hierarchical clustering, and anomaly detection.
• Designed prompt engineering and evaluation workflows for enterprise LLM applications to improve response quality, consistency, and explainability.
• Created and optimized executive-ready dashboards in Tableau and Power BI that transformed model outputs into clear narratives, empowering stakeholders with actionable insights for decision-making.
• Applied advanced machine learning and NLP techniques (CNN, RNN, XGBoost, transformer-based models) to optimize search ranking, recommendation engines, and fraud detection, achieving >90% accuracy in key models.
• Worked with clients to identify and document analytical needs for further use. Identified business problems and provided solutions using data processing, data visualization, and graphical data analysis.
• Solid knowledge of mathematics and experience applying it to technical and research fields, with a knack for identifying areas where optimization can be effective.
• Applied Explainable AI and causal inference techniques (feature importance, A/B testing, hypothesis testing) to ensure transparency, interpretability, and trust in ML-driven recommendations.
• Applied statistical methodologies such as A/B testing, hypothesis testing, statistical inference, and parameter estimation to historical data to inform decisions when solving business problems.
• Proficient in using SnowSQL for complex data manipulation tasks and developing efficient data pipelines.
• Experienced in partitioning strategies and multi-cluster warehouses in Snowflake to ensure optimal query performance and scalability.
• Led data wrangling and transformation of structured and unstructured datasets, leveraging Python, R, and SQL to prepare high-quality features for modeling, achieving >90% prediction accuracy in critical use cases.
• Skilled in designing roles and views and in implementing performance-tuning techniques to enhance Snowflake system performance.
• Performed data cleansing, transformation, and feature engineering on large-scale datasets to enhance model accuracy, leveraging Spark SQL and Databricks Delta Lake for efficient processing.
• Proficient in utilizing virtual warehouses, caching, and Snowpipe for real-time data ingestion and processing in Snowflake.
• Strong knowledge of Snowflake's Time Travel feature for auditing and analyzing historical data.
• Extensive experience in leveraging window functions, Snowflake arrays, regular expressions, and JSON parsing for advanced data analysis and manipulation.
• Proven ability to design and deploy cloud-native ML pipelines using AWS (SageMaker, S3, Glue) and Azure ML platforms, integrating scalable data processing frameworks like Databricks and Snowflake for large-scale structured and unstructured data.
• Built intelligent AI orchestration workflows combining retrieval systems, validation logic, and multi-step reasoning to support enterprise decision-making applications.
• Partnered with cross-functional teams to deliver data-driven business solutions, integrating ML outputs into dashboards and executive decision workflows.
• Highly proficient in Snowflake scripting to automate ETL processes, data transformations, and data pipelines.
• Expertise in AWS S3 for scalable and cost-effective data storage and retrieval.
• Advanced knowledge of big data technologies such as Spark SQL, PySpark, Hive, Sqoop, Flume, and the Ambari console.
• Developed predictive models in Python to predict customer churn and classify customers.
• Applied image processing and deep learning techniques for object recognition.
• Experienced in SQL query optimization, execution plan analysis, and query performance tuning.
• Built an R Shiny application showcasing machine learning to improve business forecasting.
• Hands-on experience with version control systems such as Git and GitHub.