Data cleaning is the process of preparing your data for analysis: ensuring that it is technically correct and in the desired format. Data cleaning can often be more time-consuming than the actual analysis! This was our second meetup on the topic. Click here for a recap of our first data cleaning meetup in June.
We began with an introduction to reshaping data from Alice. The presentation was based on the DataCamp tutorial Long to Wide Data in R. Data can be in long format (one measurement per row) or wide format (many measurements in one row). It is important to be able to convert between the two because different functions expect different formats. In R, a variety of functions can be used for this task:
|Function|Package|To long format|To wide format|
|---|---|---|---|
|reshape|stats|reshape(direction = "long", …)|reshape(direction = "wide", …)|
Functions for converting to long format and wide format (adapted from Table 34 from Long to Wide Data in R)
Alice provided a few examples of how to use these functions. The script is available on GitHub.
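To give a flavour of what converting between the two formats looks like, here is a minimal sketch using base R's reshape() from the stats package. It is not taken from Alice's script: the toy data frame and the column names (id, score_1, score_2, visit) are made up for illustration.

```r
# Hypothetical toy data in wide format: one row per id, one column per visit
wide <- data.frame(
  id      = 1:3,
  score_1 = c(5.1, 4.8, 6.0),
  score_2 = c(5.6, 5.0, 6.3)
)

# Wide -> long: stack score_1 and score_2 into a single "score" column,
# with a new "visit" column recording which measurement each row holds
long <- reshape(
  wide,
  direction = "long",
  varying   = c("score_1", "score_2"),
  v.names   = "score",
  timevar   = "visit",
  times     = 1:2,
  idvar     = "id"
)

# Long -> wide: spread "score" back out into one column per visit
wide_again <- reshape(
  long,
  direction = "wide",
  idvar     = "id",
  timevar   = "visit",
  v.names   = "score",
  sep       = "_"
)
```

Here long has one row per id and visit (six rows in total), while wide_again has one row per id again, with the scores back in score_1 and score_2.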
The R-Ladies Data Cleaning Gauntlet!
Next up was a series of data cleaning challenges, which we tackled in small groups. The challenges, created by Alice, had us put into practice approaches and techniques from both data cleaning meetups. We cleaned the Philly farmers’ markets data, which was also featured in our June meetup.
The materials are available on GitHub.
We would like to thank WeWork for hosting us!
“WeWork is a community for creators. We transform buildings into beautiful, collaborative workspaces and provide the infrastructure, services, events and technology so our members can focus on doing what they love. WeWork currently has 111 locations in 29 cities across the world with over 70,000 members. Book a tour at wework.com now!”
This post was authored by Amy Goodwin Davies. For more information contact firstname.lastname@example.org