December 2019 Recap: Machine Learning Workshop

Dec 8, 2019

On December 2, 2019, almost 50 data enthusiasts joined R-Ladies Philly for a workshop on machine learning in R.

The workshop was led by Trang Le. Trang is a researcher at the University of Pennsylvania who has authored several R packages and applies machine learning to biomedical data. Follow her on Twitter and at https://trang.page.

At a high level, this workshop covered:

Intro to the caret package and why it exists
Live demo and exercises using a dataset of beer reviews
Trang’s insights into good practices

Off to a great start with our machine learning workshop! pic.twitter.com/lNUCAoiQye
— R-Ladies Philly (@RLadiesPhilly) December 2, 2019

The materials for this workshop are available online:

Slides: https://slides.com/trang1618/caret-rladies
RStudio Cloud: bit.ly/33MFHLy
Code: https://github.com/trang1618/rladies-caret

Do you even caret all?

The caret package was created because R’s many modeling packages each have their own interface and didn’t play well together. caret currently provides a single, consistent interface to over 200 models!
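To illustrate what a unified interface means in practice, here is a minimal sketch (not the workshop's own code). It assumes the caret package is installed and uses the built-in mtcars data as a stand-in for the beer reviews, with mpg standing in for the outcome. The two models are fit with the same train() call, and only the method argument and tuning options change.

```r
library(caret)

set.seed(2019)  # make the cross-validation folds reproducible

# 5-fold cross-validation settings, reused by both models
ctrl <- trainControl(method = "cv", number = 5)

# Model 1: ordinary linear regression
lm_fit <- train(mpg ~ ., data = mtcars, method = "lm", trControl = ctrl)

# Model 2: k-nearest neighbors. Same call; only `method` and the
# preprocessing/tuning options differ.
knn_fit <- train(mpg ~ ., data = mtcars, method = "knn",
                 preProcess = c("center", "scale"),
                 tuneLength = 3,  # try 3 candidate values of k
                 trControl = ctrl)

# Cross-validated performance for each candidate setting
lm_fit$results
knn_fit$results

# Predictions use the same predict() call regardless of the model
knn_pred <- predict(knn_fit, newdata = mtcars)
head(knn_pred)
```

Swapping in another model is usually a matter of changing the method string, although some methods require an additional package to be installed.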

Trang suggested that you get started with the caret website. She also reminded us that the package is not perfect. When you find issues or errors, contribute to the codebase or submit issues!

Machine learning for beer lovers

This workshop used the beer ratings dataset available on Kaggle (link). The data is freely available and provides a tasty example to practice on. Each review includes product and user information, five ratings (appearance, aroma, palate, taste, and overall impression), and a plain-text review.

Remember: it is important to clean your data! Trang recommended the skim function from the skimr package for a quick look at your dataset.
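As a rough sketch of that first look, assuming the skimr package is installed: the example below uses the built-in airquality data as a stand-in for the beer reviews, because it contains missing values. The object name summary_tbl is only illustrative.

```r
library(skimr)

# Stand-in data: airquality ships with R and has some missing values
summary_tbl <- skim(airquality)

# Prints per-column summaries: type, missing count, mean, sd, quantiles, ...
print(summary_tbl)

# A quick base R check of missing values per column before modeling
missing_per_column <- colSums(is.na(airquality))
missing_per_column
```

Columns with many missing values or implausible ranges are good candidates for cleaning before any model is fit.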

For this workshop, we used 1,000 reviews to predict each beer’s ABV (alcohol by volume).

Machine learning 101: before building your models… MAKE SURE YOUR DATASET IS CLEAN. (Or, can't do the fun stuff until you've completed your data cleaning chores) #rstats @trang1618
— R-Ladies Philly (@RLadiesPhilly) December 3, 2019

Making some predictions…

Using 1,000 reviews from the beer review dataset, attendees practiced…

Dimensionality reduction with principal component analysis
Fitting a support vector machine model, then tuning the parameters
Testing a random forest model
Using the unstructured review text as a predictor, then evaluating which words were the most predictive
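The first three steps above can be sketched roughly as follows. This is not the workshop's code (see the linked materials for that). It assumes the caret, kernlab, and randomForest packages are installed, and it again uses mtcars as a stand-in, with mpg playing the role of ABV. The text-based step is not covered here.

```r
library(caret)

set.seed(2019)

# Hold out 20% of the rows for testing
in_train <- createDataPartition(mtcars$mpg, p = 0.8, list = FALSE)
train_df <- mtcars[in_train, ]
test_df  <- mtcars[-in_train, ]

# Dimensionality reduction: PCA on the predictors (column 1 is the outcome),
# keeping enough components to explain 95% of the variance
pca_prep  <- preProcess(train_df[, -1], method = c("center", "scale", "pca"),
                        thresh = 0.95)
train_pcs <- predict(pca_prep, train_df[, -1])

# 5-fold cross-validation for tuning
ctrl <- trainControl(method = "cv", number = 5)

# Support vector machine with a radial kernel (needs kernlab);
# tuneLength = 5 tries five values of the cost parameter C
svm_fit <- train(mpg ~ ., data = train_df, method = "svmRadial",
                 preProcess = c("center", "scale"),
                 tuneLength = 5, trControl = ctrl)

# Random forest (needs randomForest), tuning the number of
# predictors sampled at each split
rf_fit <- train(mpg ~ ., data = train_df, method = "rf",
                tuneGrid = data.frame(mtry = c(2, 4, 6)),
                trControl = ctrl)

# Compare the two models on the held-out rows (RMSE, R-squared, MAE)
svm_pred <- predict(svm_fit, newdata = test_df)
rf_pred  <- predict(rf_fit, newdata = test_df)
postResample(svm_pred, test_df$mpg)
postResample(rf_pred, test_df$mpg)
```

With a dataset this small the numbers will be noisy. The point is only the shape of the workflow: split, preprocess, tune with cross-validation, then evaluate on held-out data.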

Closing thoughts

Trang wrapped up with a Q&A session, where she compared machine learning frameworks in R and Python and discussed what “counts” as machine learning.

Thank you!

Thank you to all our attendees, our sponsors (Elsevier), and especially Trang!!

Do you even caret all? With Trang Le, PhD! pic.twitter.com/ZeN8DkTFQT
— R-Ladies Philly (@RLadiesPhilly) December 2, 2019

About our sponsor: Elsevier fuses evidence-based trusted content, cutting-edge technology and analytics in a range of innovative digital applications for end users in the scientific, academic and medical worlds. Our leading-edge applications, platforms and products are used globally to advance science, aid discovery, improve patient outcomes and to positively impact people’s lives.

This post was authored by Alice Walsh. For more information, contact philly@rladies.org.