June recap: Data Cleaning with R


Our June meetup was about cleaning data using R. Data cleaning is the process of preparing your data for analysis: making sure that it is technically correct and in the desired format. Data cleaning can often be more time-consuming than the actual analysis!

We live-tweeted the event!

Introduction

Darina and Katerina provided an introduction to data cleaning, with a focus on making sure the data is technically correct. Their presentation on data cleaning is posted on GitHub. They referenced Chapters 1-2 of “An Introduction to Data Cleaning with R” by Edwin de Jonge and Mark van der Loo, which we recommend reading to learn more.

We covered topics such as:

  • Importing data into R
  • Best practices for inspecting data
  • Dealing with NAs versus NaNs
  • Variable types
  • Dealing with inconsistent data
  • Reshaping and pivoting data
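
To give a flavor of a few of these topics, here is a minimal base R sketch. It is not the code from the presentation or the live demo: the vectors (x, raw_counts, raw_days) are toy values invented purely for illustration.

```r
# Toy vectors invented for illustration (not the meetup's demo data).

# NA versus NaN: NA marks a missing value, NaN marks "not a number".
x <- c(1, NA, 0 / 0, 4)              # 0 / 0 evaluates to NaN

na_flags  <- is.na(x)                # TRUE for NA and also for NaN
nan_flags <- is.nan(x)               # TRUE for NaN only
clean_mean <- mean(x, na.rm = TRUE)  # na.rm = TRUE drops both NA and NaN

# Variable types: numbers read in as text must be converted explicitly.
raw_counts <- c("12", "7", "n/a")
counts <- suppressWarnings(as.numeric(raw_counts))  # "n/a" becomes NA

# Inconsistent data: normalise whitespace and case before comparing.
raw_days <- c("Saturday", " saturday", "SATURDAY ")
days <- tolower(trimws(raw_days))
unique(days)
```

Note that is.na() flags NaN as well as NA, so is.nan() is the function to reach for when the two need to be told apart. Wrapping as.numeric() in suppressWarnings() hides the coercion warning, which is convenient in a sketch but worth leaving visible when inspecting real data.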

The presentation was supplemented by a live demo on some raw data about Philly’s farmers’ markets. Check out the data and the code on GitHub.

Irem Celen’s lightning talk: “Cleaning up Messy Genetics Data”

After the introduction, Irem Celen delivered a lightning talk about an important application of data cleaning: messy genetics data! Irem is a Ph.D. candidate in Bioinformatics and Systems Biology at the University of Delaware. In her talk, she outlined some common data cleaning issues for genetics data, as well as some great solutions in R.


WeWork

We would like to thank WeWork for hosting our June meetup!

“WeWork is a community for creators. We transform buildings into beautiful, collaborative workspaces and provide the infrastructure, services, events and technology so our members can focus on doing what they love. WeWork currently has 111 locations in 29 cities across the world with over 70,000 members. Book a tour at wework.com now!”

This post was authored by Amy Goodwin Davies. For more information, contact philly@rladies.org.