We collaborated with Women in Kaggle Philly to offer an introduction to Machine Learning (ML).
ML is the science of getting computers to learn from data without being explicitly programmed. ML has led to advances in speech recognition, tumor identification, self-driving cars, and many other areas. You probably interact with ML every day!
This meetup featured a lightning talk introducing some key concepts, followed by a hands-on Kaggle competition tutorial.
A brief introduction to Machine Learning
Tamera discussed the distinction between supervised and unsupervised learning:
- Supervised learning involves labelled data. A model is fitted to this data and then used to generate predictions for new data.
- Unsupervised learning does not involve labelled data. Instead, the model infers structure within the data.
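To make the distinction concrete, here is a minimal Python sketch. It was not part of the talk, and the points and labels are made up for illustration. The supervised half predicts a label for a new point by copying the label of the closest labelled example (1-nearest neighbour). The unsupervised half receives the same points without labels and groups them with a basic two-cluster k-means loop. The sketch assumes Python 3.8+ for `math.dist`.

```python
import math

# Supervised: every training point comes with a label (hypothetical toy data).
labelled = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((8.0, 8.0), "B"), ((9.0, 8.5), "B")]

def predict_1nn(point, training):
    """Return the label of the closest labelled training point (1-nearest neighbour)."""
    _, label = min(training, key=lambda pair: math.dist(pair[0], point))
    return label

# Unsupervised: the same kind of points, but no labels are given.
unlabelled = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5)]

def two_means(points, iterations=10):
    """Split points into two clusters with a basic k-means (k = 2) loop."""
    centres = [points[0], points[-1]]  # naive initialisation: first and last point
    for _ in range(iterations):
        clusters = [[], []]
        # Assign each point to its nearest centre.
        for p in points:
            nearest = min(range(2), key=lambda i: math.dist(p, centres[i]))
            clusters[nearest].append(p)
        # Move each centre to the mean of its cluster (keep it if the cluster is empty).
        centres = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centres[i]
            for i, cluster in enumerate(clusters)
        ]
    return clusters

print(predict_1nn((2.0, 1.5), labelled))  # A
print(two_means(unlabelled))
```

The supervised model can only predict labels it has seen ("A" or "B"), while the clustering step discovers two groups on its own but cannot name them.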
Tamera gave an overview of several model types, including tree-based models and neural networks. A main takeaway was that we shouldn’t be afraid to try out some ML techniques!
Kaggle competition tutorial
- Kaggle is an online community for data science and machine learning. Kaggle is famous for hosting machine learning competitions, but it also offers a public data platform, a cloud-based workbench, and educational resources for AI.
- Women in Kaggle Philly is a meetup group for women interested in participating in and exploring Kaggle competitions within an inclusive and supportive environment. The group regularly organizes talks, workshops, and social events.
Very excited about collaborating with #womeninkaggle for this meetup - @kaggle is a resource to learn and practice machine learning. Why? People like Ran share their knowledge and skills 👍🏻 pic.twitter.com/ecOUh14ik7— R-Ladies Philly (@RLadiesPhilly) December 14, 2018
Ran prepared this R notebook, which we worked through during the tutorial.
We covered the following steps:
- Data Loading: importing the data
- A Very Simple Exploratory Data Analysis (EDA): understanding the data
- Data Preprocessing: also known as data cleaning or wrangling
- Simple Feature Engineering: creating new predictor variables from the data
- Modeling & Evaluation: fitting a model and measuring how well it performs (we used Lasso regression and XGBoost)
- Creating Submission File: writing your predictions in the format the competition requires and submitting them
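The steps above can be sketched end to end in a few dozen lines. Ran’s notebook is written in R. The block below is a hypothetical Python illustration using only the standard library, with a made-up toy dataset (columns `id`, `area`, `rooms`, `price`) rather than the competition’s data. For brevity it swaps in a one-feature least-squares line where the tutorial used Lasso regression and XGBoost, and it reports training error only.

```python
import csv
import io
import math
import statistics

# Hypothetical toy data standing in for a competition's train.csv / test.csv.
TRAIN_CSV = """id,area,rooms,price
1,50,2,150
2,80,3,240
3,,2,160
4,120,4,370
5,65,,200
"""
TEST_CSV = """id,area,rooms
6,90,3
7,,2
"""

# 1. Data loading: parse CSV text into a list of dicts (one per row).
def load(text):
    return list(csv.DictReader(io.StringIO(text)))

# 2. Very simple EDA: count empty (missing) values in each column.
def missing_counts(rows):
    return {col: sum(1 for r in rows if r[col] == "") for col in rows[0]}

# 3. Preprocessing: convert to floats and fill gaps with training-set medians.
def preprocess(rows, medians):
    return [{col: float(v) if v != "" else medians[col] for col, v in r.items()} for r in rows]

# 4. Feature engineering: derive a new predictor (unused by the one-feature model below).
def add_features(rows):
    for r in rows:
        r["area_per_room"] = r["area"] / r["rooms"]
    return rows

# 5. Modeling: one-feature least-squares line (a stand-in for Lasso/XGBoost).
def fit_line(xs, ys):
    mx, my = statistics.mean(xs), statistics.mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

def rmse(actual, predicted):
    return math.sqrt(statistics.mean((a - p) ** 2 for a, p in zip(actual, predicted)))

train_raw, test_raw = load(TRAIN_CSV), load(TEST_CSV)
print(missing_counts(train_raw))  # {'id': 0, 'area': 1, 'rooms': 1, 'price': 0}

# Medians come from the training data only, then are reused for the test data.
medians = {
    col: statistics.median(float(r[col]) for r in train_raw if r[col] != "")
    for col in ("area", "rooms")
}
train = add_features(preprocess(train_raw, medians))
test = add_features(preprocess(test_raw, medians))

slope, intercept = fit_line([r["area"] for r in train], [r["price"] for r in train])
train_pred = [intercept + slope * r["area"] for r in train]
# Training error only; a real workflow would score held-out data or use cross-validation.
print("train RMSE:", round(rmse([r["price"] for r in train], train_pred), 2))

# 6. Creating the submission file: one prediction per test id.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["id", "price"])
for r in test:
    writer.writerow([int(r["id"]), round(intercept + slope * r["area"], 2)])
print(out.getvalue())
```

Median imputation is only one of many possible cleaning choices, and a multi-feature model such as Lasso would also make use of the engineered `area_per_room` column.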
We recommend taking a look at Ran’s R notebook. It includes R code for each of these steps interleaved with helpful explanations and links to supplementary material.
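For readers curious about what Lasso regression does in the modeling step, here is a from-scratch Python sketch using cyclic coordinate descent with soft-thresholding. It is not the code from Ran’s notebook (R users typically fit Lasso with a package such as glmnet), and the toy data are hypothetical. The sketch minimises (1/(2n)) times the residual sum of squares plus `lam` times the sum of absolute coefficients. It assumes standardised feature columns and a centred response, so no intercept is needed. A larger `lam` shrinks coefficients toward zero, and a large enough value sets them exactly to zero, which is how Lasso performs feature selection.

```python
import statistics

def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become exactly 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/(2n)) * sum((y - X b)^2) + lam * sum(|b_j|).

    Assumes each column of X is standardised (mean 0, population variance 1)
    and y is centred, so no intercept term is needed.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            # With standardised columns, (1/n) * sum(X[i][j]^2) == 1, so no division is needed.
            beta[j] = soft_threshold(rho, lam)
    return beta

def standardise(column):
    """Rescale a column to mean 0 and population standard deviation 1."""
    mu, sd = statistics.mean(column), statistics.pstdev(column)
    return [(v - mu) / sd for v in column]

# Hypothetical toy data: y depends on x1; x2 is a noise-like feature.
raw_x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
raw_x2 = [0.5, -0.2, 0.1, 0.3, -0.4]
raw_y = [2.1, 3.9, 6.2, 7.8, 10.0]

X = [list(row) for row in zip(standardise(raw_x1), standardise(raw_x2))]
y_mean = statistics.mean(raw_y)
y = [v - y_mean for v in raw_y]

for lam in (0.0, 0.5, 5.0):
    print(lam, [round(b, 3) for b in lasso_coordinate_descent(X, y, lam)])
```

Printing the coefficients for increasing `lam` shows them shrinking; at `lam = 5.0` both are exactly zero for this toy data.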