Hey! I am

Arth Talati

About Me

I am a data junkie skilled in solving analytical problems with quantitative approaches and in developing machine learning models. I am a self-starter who likes to explore and learn new problem-solving methodologies and bring ideas to fruition. My main interests lie in applied machine learning and in managing big data to solve real-world problems.



Download CV

Education

2019-2021

Master of Science in Data Science

University of Pennsylvania

Courses: Big Data Analytics, Statistics for Data Science, Machine Learning, Internet of Things and Edge Computing, Data Science for Public Policy, Databases, Computational Linguistics, Computer Architecture

2015-2019

Bachelor of Technology in Electronics and Communication Engineering

National Institute of Technology, Surat

Experience

July 2021 - Present

Data Scientist, Sr. Specialist

Merck

Team - Human Health

Developing a Patient Insights Engine for non-small cell lung cancer. Previously worked on quantifying the impact of HPV vaccine hesitancy in France.

Jan - May 2021

Technical Consultant

FOX Entertainment & Wharton Consumer Analytics

Conducted an in-depth evaluation of marketing campaign data to derive insights for improving audience reach and increasing the ROI of future promo campaigns, and recommended future initiatives.

Developed and refined customer segmentation models to analyse the effects of social media campaigns, supporting retention and long-term promotional campaign profitability.

Aug - Dec 2020

Teaching Assistant

University of Pennsylvania

TA for the CIS 545 Big Data Analytics course. Helped design homework assignments and host big datasets in S3 buckets on AWS. Developed a guide to spinning up an EMR cluster with the AWS CLI. Mentored 8 student groups on their final projects.

TA for the GAFL 531 Statistics for Public Policy class. Helped students during office hours and assisted with grading homework.

May - Aug 2020

Data Science Intern

C-Space and Wharton Consumer Analytics

Overhauled the Customer Maturity Model for customer survey data from 300 publicly traded companies. Used K-means clustering to segment companies into cohorts based on public opinion. Performed sentiment analysis on customer feedback for these companies.

Examined financial trends by analyzing financial indicators (such as CapEx and long-term debt) to validate the clustering model, and provided detailed analyses for different industries (such as retail and healthcare).

Jan - May 2020

Research Analyst

Penn Data Science Group

Assisted the university's Procurement Department in identifying high-risk transactions as part of the Purchasing Services High Risk Procurement Project.

Designed a custom anomaly detection algorithm based on the Local Outlier Factor and Isolation Forest algorithms for a dataset of 1.8 million Oracle Financials invoices.

Presented model analysis to both technical and non-technical (Procurement Department) teams through high-level executive summaries.

Skills

Tools & Languages

SQL, Python, Java, PySpark

PostgreSQL, MySQL, MATLAB, Neo4j, Spark, Pandas, R Shiny, AWS, Git

Awards

2020

Best Project Design - CIS 550 Database Course, Fall 2020

University of Pennsylvania

Databases Course Capstone Project - Video Link

Projects

Sep - Dec '20

Football Freak

Database and Information Systems, UPenn

Created a soccer app using data scraped from sofifa.com, hosted on an AWS Lambda serverless instance. Designed a relational model for the dataset in third normal form (3NF), deployed on an AWS RDS instance with appropriate indexes and an optimized query plan, cutting query execution time by 10x.

Developed the backend API in Express.js, using a couple of middleware functions to abstract away the need to manually connect to the client database.


Github Repo | Project Report | Video

Mar - May '20

Deep Learning for Authorship Identification

Applied Machine Learning, UPenn

Developed a multi-class classifier for 50 news article authors using LSTM, Bi-LSTM, and GRU neural networks at the sentence and article levels on a corpus of articles.

Improved classification accuracy by 20% over the baseline LSTM model using an SVM with 40 stylometric features.

Github Repo | Project Report | Video

May - Jun '20

Cook Assist

Applied Machine Learning, UPenn

Scraped the BBC Good Food website for 13,000+ recipes using the Chromium browser and the BeautifulSoup package.

Developed an R Shiny app backed by a Neo4j graph database that helps cooks explore new recipes and reduce the time and effort spent cooking.


Github Repo

Apr - May '20

SEPTA On-Time Performance Analysis

Big Data Analytics, UPenn

Investigated SEPTA Regional Rail's claim of 91% on-time performance. Used TWINT to collect tweets from the @SEPTA Twitter handle and validated the claim by running a regression between actual delays and those reported in tweets.

Scraped weather data for Philadelphia to improve the prediction model. Streamlined a Spark pipeline to train a random forest regressor that predicts arrival delays.

Github Repo | Py Notebook Presentation
Apr - May '20

US Flights Delay Analysis

Data Science for Public Policy, UPenn

Developed an interactive R Shiny dashboard from 5.8 million flight operation records, with the key functionality of comparing different airlines' departure and arrival times at a given airport by day of week, time of day, taxi times, and other features.

Performed statistical analysis of US flight delay data in R to build predictive models of flight delay patterns.

Github Repo
Jan - Feb '20

Philly Place Matters App

Data Science for Public Policy, UPenn

This project uses statistical and spatial (mapping) analyses to better understand the impact of changeable neighborhood characteristics on mental health, and proposes a way to use population-level risk factors to assess service need and the adequacy of community resources.

Github Repo | Try the app

Contact

Contact Me

Address

Philadelphia, PA