September Recap: Reporting Research with R Markdown


R-Ladies Philadelphia’s September 2019 meetup was on the topic of reporting research using R Markdown. Our speaker was Ramaa Nathan, PhD, a statistican and data scientist with a background in the healthcare and finance industries. Ramaa is a co-organizer of the Data Science Philadelphia (DataPhilly) meetup group.

Ramaa’s materials are available here.

Ramaa’s talk included the following:

  • An introduction to reproducible research
  • A reproducible research case study: Ramaa’s independent research which investigates breast cancer survival
  • An overview of output formats available with R Markdown

What is reproducible research?

Ramaa explained that reproducible research involves making data and analyses available so that others can reproduce the research findings. Literate programming frameworks which interleave documentation with code, such as R Markdown, are a great tool for sharing your analyses in an accessible way! Ramaa highlighted the importance of making the data available (with an accompanying data dictionary) as well as information about the software environment used. Ramaa also discussed the utility of public repositories (such as Github) for sharing these materials and the use of containers (such as Docker) to package code and dependencies together, ensuring the code can be run in any computing environment.

Reproducible research is an essential component of Open Science, the topic of R-Ladies Philly’s April 2019 meetup and Technical.ly guest post.

Breast cancer survival analysis

Ramaa introduced Survival Analysis, the branch of statistics used to analyze the expected duration of time until one or more events occur. She introduced the following key terms:

  • Time to Event: The expected duration of time until one or more events occur. Examples include survival time from onset of diagnosis, time until progression from one stage of disease to another, and time from surgery until hospital discharge.
  • Survival Function: The probability of surviving beyond time t.
  • Hazard Rate: The probability that if a person survives to time t, they will experience the event in the next instant.
  • Cumulative hazard: The accumulation of hazard rate over time since time of first diagnosis.
  • Hazard Ratio: The ratio of two hazard functions and is used to compare the risk of death between two treatment groups.

For her research investigating breast cancer survival, Ramaa used several survival analysis techniques to analyze publicly-available data from The Cancer Genome Atlas (TCGA) Program. Ramaa’s report on her research is available here and all of her code is available here.

Output formats using R Markdown

Ramaa overviewed the three basic components of R Markdown which work together to produce the output.

  • Markdown, for formatted text
  • KnitR, for embedded R code
  • The YAML (“Yet Ain’t Markup Language”) header, for rendering

The output format can be specified the the YAML header. Ramaa discussed the following (non-exhaustive) list of available output formats:

  • Documents: HTML, PDF, Markdown, Word Document, Notebook, OpenDocument Text Document, Rich Text Format Notebook
  • Presentations: IOSlides, Slidy, Beamer, Powerpoint

The audience was super-impressed that Ramaa’s powerpoint presentation was generated using R Markdown! Powerpoint presentations are a fairly recent addition (available for RStudio Version 1.2.x with Pandoc version 2.x).

For a general overview of R Markdown, Ramaa recommended RStudio’s R Markdown materials, especially the cheat sheet.

Thank yous

  • We would like to thank Ramaa Nathan for delivering such an engaging and informative talk! If you are interested in volunteering as a speaker or workshop leader for a future R-Ladies Philly event please get in touch with the organising team at philly@rladies.org.

  • Thanks to Comcast Enterprise Business Intelligence for providing food and drinks! Comcast EBI has multiple open roles on the Strategic Analytics team ranging from Analyst to Sr. Manager. If interested email jessie_pluto@comcast.com!

  • Thank you to WeWork for hosting us. “WeWork is a community for creators. We transform buildings into beautiful, collaborative workspaces and provide the infrastructure, services, events and technology so our members can focus on doing what they love. WeWork currently has 425 locations in 100 cities and 27 countries across the world with over 400,000 members. Book a tour at wework.com now!”

This post was authored by Amy Goodwin Davies. For more information contact philly@rladies.org