December Recap

We collaborated with Women in Kaggle Philly to provide an Introduction to Machine Learning (ML).

ML is the science of getting computers to learn from data without being explicitly programmed. ML has led to advances in speech recognition, tumor identification, self-driving cars, and many other areas. You probably interact with ML every day!

This meetup featured a lightning talk introducing some key concepts followed by a hands-on Kaggle competition tutorial.

A brief introduction to Machine Learning

Tamera Lanham, a data scientist at Elsevier, gave a lightning talk introducing ML. Her slides are available here.

Tamera discussed the distinction between supervised and unsupervised learning:

  • Supervised learning involves labelled data. A model can be fitted to this data in order to generate predictions about new data.
  • Unsupervised learning does not involve labelled data. Instead, the model infers structure within the data.
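To make the distinction concrete, here is a minimal sketch in plain Python (not taken from Tamera's slides). The function names `fit_line` and `two_means` and the toy numbers are assumptions made up for illustration. The first function learns from labelled pairs and predicts a new value; the second is given no labels and simply groups the points it sees.

```python
from statistics import mean

def fit_line(xs, ys):
    """Supervised: learn a slope and intercept from labelled (x, y) pairs."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar

def two_means(points, iterations=10):
    """Unsupervised: split unlabelled 1-D points into two clusters (k-means, k=2)."""
    centres = [min(points), max(points)]  # start from the two extremes
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for p in points:
            nearest = 0 if abs(p - centres[0]) <= abs(p - centres[1]) else 1
            groups[nearest].append(p)
        # Recompute each centre as the mean of its group (keep it if the group is empty).
        centres = [mean(g) if g else c for g, c in zip(groups, centres)]
    return centres, groups

# Labelled data: each x comes with a known answer y.
slope, intercept = fit_line([1, 2, 3, 4], [52, 54, 56, 58])
predicted = intercept + slope * 5  # prediction for a new, unseen x

# Unlabelled data: no answers, only the points themselves.
centres, groups = two_means([1, 2, 3, 10, 11, 12])
print(predicted, centres, groups)
```

In the supervised case the labels (`ys`) drive the fit; in the unsupervised case the structure (two groups) is inferred from the points alone.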

Tamera gave an overview of some model types, including tree-based models and neural networks. A main take-away was that we shouldn’t be afraid to try out some ML techniques!

Kaggle competition tutorial

Ran Liu, Women in Kaggle Philly organiser and PhD Candidate in Sociology at Penn, led the Kaggle competition tutorial. Before delving into code, Ran introduced Kaggle and Women in Kaggle Philly:

  • Kaggle is an online data science and machine learning community. Kaggle is famous for hosting machine learning competitions, but it also offers a public data platform, a cloud-based workbench, and educational resources for AI.
  • Women in Kaggle Philly is a meetup group for women interested in exploring and participating in Kaggle competitions within an inclusive and supportive environment. They regularly organize talks, workshops, and social events.

Ran prepared this R notebook, which we worked through during the tutorial.

We covered the following steps:

  1. Data Loading: importing the data
  2. A Very Simple Exploratory Data Analysis (EDA): understanding the data
  3. Data Preprocessing: AKA data cleaning/wrangling
  4. Simple Feature Engineering: creating new predictor variables from the data
  5. Modeling & Evaluation: fitting your model and seeing how well it does (we used Lasso regression and XGBoost)
  6. Creating Submission File: submitting your prediction to the competition
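For readers who want to see the shape of these six steps end to end, here is a rough, dependency-free sketch. It is not Ran's notebook: it is written in Python rather than R, the tiny dataset and column names (`first_floor`, `second_floor`, `price`) are invented for illustration, and a one-variable least-squares fit stands in for the Lasso regression and XGBoost models used in the tutorial.

```python
import csv
import io
from statistics import mean, median

# Hypothetical toy data standing in for a competition's train and test files.
TRAIN_CSV = """Id,first_floor,second_floor,price
1,800,0,160
2,1000,200,240
3,900,,220
4,1200,300,300
5,700,100,160
6,1100,400,300
"""
TEST_CSV = """Id,first_floor,second_floor
7,1000,
8,600,200
"""

# 1. Data loading: read each CSV into a list of dictionaries.
def load(text):
    return list(csv.DictReader(io.StringIO(text)))

train, test = load(TRAIN_CSV), load(TEST_CSV)

# 2. Very simple EDA: how many rows, and how many missing values per column?
missing = {col: sum(row[col] == "" for row in train) for col in train[0]}
print(len(train), "training rows; missing values per column:", missing)

# 3. Preprocessing: convert to numbers, fill gaps with the training median.
fill = median([float(r["second_floor"]) for r in train if r["second_floor"]])

def clean(row):
    out = {"Id": row["Id"],
           "first_floor": float(row["first_floor"]),
           "second_floor": float(row["second_floor"]) if row["second_floor"] else fill}
    if "price" in row:
        out["price"] = float(row["price"])
    return out

train_clean = [clean(r) for r in train]
test_clean = [clean(r) for r in test]

# 4. Feature engineering: build a new predictor from existing columns.
for row in train_clean + test_clean:
    row["total_area"] = row["first_floor"] + row["second_floor"]

# 5. Modeling & evaluation: fit on the first four rows, validate on the rest.
fit_rows, valid_rows = train_clean[:4], train_clean[4:]
xs = [r["total_area"] for r in fit_rows]
ys = [r["price"] for r in fit_rows]
x_bar, y_bar = mean(xs), mean(ys)
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
intercept = y_bar - slope * x_bar

def predict(row):
    return intercept + slope * row["total_area"]

# Root mean squared error on the held-out validation rows.
rmse = mean([(predict(r) - r["price"]) ** 2 for r in valid_rows]) ** 0.5
print("validation RMSE:", rmse)

# 6. Submission file: one prediction per test Id, written as CSV.
submission = io.StringIO()
writer = csv.writer(submission)
writer.writerow(["Id", "price"])
for r in test_clean:
    writer.writerow([r["Id"], f"{predict(r):.1f}"])
print(submission.getvalue())
```

The toy prices were chosen to be exactly linear in total area, so the validation error here is essentially zero; real competition data will not be this tidy, which is why the tutorial compared more capable models.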

We recommend taking a look at Ran’s R notebook. It includes R code for each of these steps interleaved with helpful explanations and links to supplementary material.

Thank you

  • Many thanks to our expert presenters Tamera Lanham and Ran Liu!
  • Orchestrall sponsored this meetup. Orchestrall creates healthcare solutions for a global world. They greatly value diversity and are very supportive of R-Ladies’ mission!
  • We were hosted by WeWork.

Resources

This post was authored by Amy Goodwin Davies. For more information, contact philly@rladies.org.