Chapter 9 Intermission: data wrangling

NHANES datasets are “curated”: they are created following standard practice, which yields tabular data formatted in a way well suited to R.

This section is an “intermission” in the form of a lecture by Garrett Grolemund, Data Scientist and Master Instructor at RStudio, split into four YouTube videos. All four parts are listed here, but the most important one for working with NHANES data is Part 3, about the dplyr Tidyverse package. Part 1 reviews what was covered in the previous chapter (8). Part 2 is about the tidyr package, which helps reshape data; it is a very useful tool but not really necessary for NHANES data.

Description of the RStudio videos:

Data wrangling is too often the most time-consuming part of data science and applied statistics. Two tidyverse packages, tidyr and dplyr, help make data manipulation tasks easier. These videos introduce you to these tools. Keep your R code clean and clear and reduce the cognitive load required for common but often complex data science tasks.
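As a rough illustration of the kind of dplyr code the videos cover, here is a minimal sketch on a small made-up table. The data frame `toy` and its columns (`id`, `gender`, `age`, `bmi`) are hypothetical and only mimic NHANES-style variables; they are not taken from an actual NHANES file.

```r
library(dplyr)

# Hypothetical toy data; names and values are invented for illustration.
toy <- tibble(
  id     = 1:6,
  gender = c("male", "female", "female", "male", "female", "male"),
  age    = c(34, 51, 17, 62, 45, 29),
  bmi    = c(27.1, 31.4, 22.0, 29.8, 24.5, 26.3)
)

adult_bmi <- toy %>%
  filter(age >= 18) %>%                       # keep adults only
  select(gender, age, bmi) %>%                # keep the columns of interest
  group_by(gender) %>%                        # one group per gender
  summarise(n = n(), mean_bmi = mean(bmi)) %>% # count and average per group
  arrange(desc(mean_bmi))                     # highest mean BMI first

print(adult_bmi)
```

Each verb does one small job, and the pipe (`%>%`) chains them so the code reads top to bottom in the order the steps are applied.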

Lectures on data wrangling: Tidyverse tidyr and dplyr packages.

| Title | Link | Time |
|---|---|---|
| Part 1: What is data wrangling? Intro, Motivation, Outline, Setup | https://youtu.be/jOd65mR1zfw | 8:26 |
| Part 2: Tidy Data and tidyr | https://youtu.be/1ELALQlO-yM | 17:36 |
| Part 3: Data manipulation tools: dplyr | https://youtu.be/Zc_ufg4uW4U | 19:34 |
| Part 4: Working with Two Datasets: Binds, Set Operations, and Joins | https://youtu.be/AuBgYDCg1Cg | 7:23 |
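Part 4 deals with combining two tables. The sketch below shows a bind and a few joins from dplyr on two tiny made-up tables. The tables `demo` and `exam` and the key column `id` are hypothetical stand-ins, not real NHANES files.

```r
library(dplyr)

# Two hypothetical tables that share a key column `id`.
demo <- tibble(id = c(1, 2, 3), age = c(34, 51, 17))
exam <- tibble(id = c(2, 3, 4), bmi = c(31.4, 22.0, 29.8))

# Keep only the ids present in both tables.
both <- inner_join(demo, exam, by = "id")

# Keep every row of `demo`; bmi is NA where `exam` has no match.
all_demo <- left_join(demo, exam, by = "id")

# Rows of `demo` that have no match in `exam`.
unmatched <- anti_join(demo, exam, by = "id")

# Stack rows of tables that have the same columns.
stacked <- bind_rows(demo, tibble(id = 5, age = 40))

print(all_demo)
```

In NHANES, the different data files can be matched on the respondent sequence number (`SEQN`), which would play the role of `id` in this sketch.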

9.1 Part 3 here

The HTML version of this chapter has Part 3 embedded here:

Pt. 3: Data manipulation tools: dplyr https://youtu.be/Zc_ufg4uW4U