I am a data junkie skilled in solving analytical problems with quantitative approaches and in developing machine learning models. I am a self-starter who likes to explore and learn new methods of solving problems and to bring ideas to fruition. My main interests lie in applied machine learning and managing big data to solve real-world problems.
Courses: Big Data Analytics, Statistics for Data Science, Machine Learning, Internet of Things and Edge Computing, Data Science for Public Policy, Database, Computational Linguistics, Computer Architecture
Team - Human Health
Developing a Patient Insights Engine for Non-Small Cell Lung Cancer. Previously worked on quantifying the impact of HPV vaccine hesitancy in France.
Conducted an in-depth evaluation of marketing campaign data to derive insights for improving audience reach and increasing the ROI of future promotional campaigns, and recommended future initiatives.
Developed and refined customer segmentation models to analyze the effects of social media campaigns, supporting retention and long-term promotional campaign profitability.
TA for the CIS 545 Big Data Analytics course. Helped design homework assignments and host big data in S3 on AWS. Developed a guide to spinning up an EMR cluster using the AWS CLI. Mentored 8 student groups on their final projects.
TA for the GAFL 531 Statistics for Public Policy class. Helped students during office hours and assisted in checking homework.
Overhauled the Customer Maturity Model for customer survey data on 300 publicly traded companies. Performed K-means clustering to segment the companies into separate cohorts based on public opinion. Performed sentiment analysis on customer feedback about the companies.
Examined financial trends by analyzing financial indicators (such as CapEx and long-term debt) to validate the clustering model, and provided detailed analyses for different industries (such as Retail and Healthcare).
Assisted the university's Procurement Department in identifying high-risk transactions as part of the Purchasing Services High Risk Procurement Project.
Designed a custom anomaly detection algorithm based on the Local Outlier Factor and Isolation Forest algorithms for a dataset of 1.8 million Oracle Financials invoices.
Presented model analysis to both technical and non-technical (Procurement Department) teams through high-level executive summaries.
PostgreSQL, MySQL, MATLAB, Neo4j, Spark, Pandas, R Shiny, AWS, Git
Databases Course Capstone Project - Video Link
Created a soccer app using data scraped from sofifa.com, hosted serverlessly on AWS Lambda. Designed a relational model for the dataset in 3NF, deployed on an AWS RDS instance with appropriate indexes and an optimized query plan that decreased query execution time by 10x.
Developed the backend for an API in Express.js, using a couple of middleware functions to abstract away the need to manually connect to the client database.
Developed a multi-class classifier for 50 news article authors using LSTM, Bi-LSTM, and GRU neural networks at the sentence and article levels on corpora of articles.
Improved classification accuracy by 20% over the baseline LSTM model using an SVM and 40 different stylometric features.
Github Repo | Project Report
Scraped the BBC-GoodFoods website for 13000+ food recipes using the Chromium browser and the BeautifulSoup package.
Developed an R Shiny app backed by a Neo4j graph database to help a cook explore new recipes and reduce the time and effort spent cooking.
Investigated SEPTA Regional Rail's claim of 91% on-time performance. Used TWINT to collect tweets from the @SEPTA Twitter handle and tested the claim by running a regression between actual delays and the delays reported in tweets.
Scraped weather data for Philadelphia to improve the prediction model. Streamlined a Spark pipeline to train a random forest regressor that predicts delays in arrival time.
Github Repo | Py Notebook Presentation
Developed an interactive dashboard (R Shiny) from an analysis of 5.8 million flight operation records. Its key function is comparing different airlines' departure and arrival times at a given airport by day of week, time of day, taxi times, and other features.
Performed statistical analysis of US flight delay data in R to build predictive models of flight delay patterns.
Github Repo
This project uses statistical and spatial (mapping) analyses to better understand the impact of changeable neighborhood characteristics on mental health, and proposes a way to use population-level risk factors to assess service need and the adequacy of community resources.
Github Repo | Try the app