December 2019 Recap: Machine Learning Workshop

On December 2, 2019, almost 50 data enthusiasts joined R-Ladies Philly for a workshop on machine learning in R.

The workshop was led by Trang Le. Trang is a researcher at the University of Pennsylvania who has authored several R packages and applies machine learning to biomedical data. Follow her on Twitter and at https://trang.page.

At a high level, this workshop covered:

  • Intro to the caret package and why it exists
  • Live demo and exercises using a dataset of beer reviews
  • Trang’s insights into good practices

The materials for this workshop are available online:

Do you even caret all?

R has many modeling packages, and each has its own interface and conventions, so they don't play well together. The caret package was created to solve this problem by wrapping them in one consistent interface. caret currently unifies over 200 models!
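To make the "one interface" idea concrete, here is a minimal sketch (not taken from the workshop materials). It fits two different model types with the same train() call, changing only the method argument. The data frame df and its columns are simulated for illustration.

```r
library(caret)

set.seed(42)
# Simulated data for illustration only (not the workshop dataset)
df <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
df$y <- 2 * df$x1 - df$x2 + rnorm(100, sd = 0.3)

# 5-fold cross-validation, shared by both models
ctrl <- trainControl(method = "cv", number = 5)

# Same call, different underlying modeling package
lm_fit   <- train(y ~ ., data = df, method = "lm",    trControl = ctrl)
tree_fit <- train(y ~ ., data = df, method = "rpart", trControl = ctrl)

# predict() works the same way for both fitted objects
lm_pred   <- predict(lm_fit,   newdata = head(df))
tree_pred <- predict(tree_fit, newdata = head(df))
```

Swapping in another supported model is usually just a matter of changing the method string, as long as the underlying package is installed.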

Trang suggested that you get started with the caret website. She also reminded us that the package is not perfect. When you find issues or errors, contribute to the codebase or submit issues!

Machine learning for beer lovers

This workshop used the beer ratings dataset available on Kaggle. This data is freely available and provides a tasty example to practice on. Each review includes product and user information, five ratings (appearance, aroma, palate, taste, and overall impression), and a plain-text review.

Remember: it is important to clean your data! Trang recommended the skim() function from the skimr package for a quick look at your dataset.

For this workshop, we used 1,000 reviews to predict each beer's ABV (alcohol by volume).
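As a rough sketch of what that first look might involve, the snippet below runs skim() on a tiny made-up data frame. The column names and values are hypothetical and are not the real Kaggle data.

```r
library(skimr)

# Tiny made-up data frame for illustration (hypothetical columns and values)
beer <- data.frame(
  beer_name = c("Pale Ale", "Stout", NA, "Lager"),
  abv       = c(5.2, NA, 7.1, 4.8),
  taste     = c(4.0, 4.5, 3.5, 3.0),
  stringsAsFactors = FALSE
)

# One-call summary: column types, missing counts, and basic statistics
beer_summary <- skim(beer)
print(beer_summary)

# One simple cleaning step: keep only rows with no missing values
beer_clean <- beer[complete.cases(beer), ]
```

The summary makes missing values easy to spot before any modeling starts. Dropping incomplete rows is only one option; how to handle them depends on the dataset.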

Making some predictions…

Using 1,000 reviews from the beer review dataset, attendees practiced…

  • Dimensionality reduction with principal component analysis
  • Fitting a support vector machine model, then tuning the parameters
  • Testing a random forest model
  • Using the unstructured review text to predict ABV, then evaluating which words were the most predictive
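The first three steps can be sketched with caret roughly as follows. This is not the workshop code: the data is simulated, the column names are hypothetical stand-ins for the five ratings, and the tuning settings are arbitrary choices. It assumes the kernlab and randomForest packages are installed, since caret calls them for the "svmRadial" and "rf" methods.

```r
library(caret)

set.seed(2019)
n <- 200
# Simulated stand-in for the review ratings (NOT the real Kaggle data)
beer <- data.frame(
  appearance = runif(n, 1, 5),
  aroma      = runif(n, 1, 5),
  palate     = runif(n, 1, 5),
  taste      = runif(n, 1, 5),
  overall    = runif(n, 1, 5)
)
beer$abv <- 4 + 0.8 * beer$aroma + 0.5 * beer$taste + rnorm(n, sd = 0.5)

# Dimensionality reduction: center, scale, then keep the principal
# components that explain 95% of the variance
pca <- preProcess(beer[, 1:5], method = c("center", "scale", "pca"), thresh = 0.95)
pcs <- predict(pca, beer[, 1:5])

ctrl <- trainControl(method = "cv", number = 5)

# Support vector machine (radial kernel); tuneLength = 5 tries five cost values
set.seed(1)
svm_fit <- train(abv ~ ., data = beer, method = "svmRadial",
                 preProcess = c("center", "scale"),
                 tuneLength = 5, trControl = ctrl)

# Random forest, tuning the number of predictors tried at each split
set.seed(1)
rf_fit <- train(abv ~ ., data = beer, method = "rf",
                tuneGrid = data.frame(mtry = c(2, 3, 4)),
                ntree = 200, trControl = ctrl)

# Best tuning parameters, and a side-by-side comparison of resampled error
print(svm_fit$bestTune)
print(summary(resamples(list(svm = svm_fit, rf = rf_fit))))
```

Resetting the seed before each train() call gives both models the same cross-validation folds, which makes the comparison fairer.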

Closing thoughts

Trang wrapped up with a Q&A session. She compared machine learning frameworks in R and Python and discussed what “counts” as machine learning.

Thank you!

Thank you to all our attendees, our sponsor (Elsevier), and especially Trang!

About our sponsor: Elsevier fuses evidence-based trusted content, cutting-edge technology and analytics in a range of innovative digital applications for end users in the scientific, academic and medical worlds. Our leading-edge applications, platforms and products are used globally to advance science, aid discovery, improve patient outcomes and to positively impact people’s lives.

This post was authored by Alice Walsh. For more information contact