Chapter 9 Intermission: data wrangling

NHANES datasets are “curated”: they are created following standard practices, which results in tabular datasets formatted in a way that is well suited for R.

This section is an “intermission” in the form of a lecture by Garrett Grolemund, Data Scientist and Master Instructor at RStudio, split into four YouTube videos. All four parts are listed below, but the most relevant for working with NHANES data is Part 3, on the dplyr Tidyverse package. Part 1 reviews what was covered in the previous chapter (Chapter 8), and Part 2 covers the tidyr package, which helps reshape data: a very useful tool, but not really necessary for NHANES data.

Description of the RStudio videos:

Data wrangling is too often the most time-consuming part of data science and applied statistics. Two tidyverse packages, tidyr and dplyr, help make data manipulation tasks easier. These videos introduce you to these tools. Keep your R code clean and clear and reduce the cognitive load required for common but often complex data science tasks.

Lectures on data wrangling: Tidyverse `tidyr` and `dplyr` packages.

| Title | Link | Duration |
|---|---|---|
| Part 1: What is data wrangling? Intro, Motivation, Outline, Setup | https://youtu.be/jOd65mR1zfw | 8:26 |
| Part 2: Tidy Data and `tidyr` | https://youtu.be/1ELALQlO-yM | 17:36 |
| Part 3: Data manipulation tools: `dplyr` | https://youtu.be/Zc_ufg4uW4U | 19:34 |
| Part 4: Working with Two Datasets: Binds, Set Operations, and Joins | https://youtu.be/AuBgYDCg1Cg | 7:23 |

9.1 Part 3: Data manipulation tools with dplyr

The HTML version of this book has the Part 3 video embedded here:

Pt. 3: Data manipulation tools: dplyr https://youtu.be/Zc_ufg4uW4U

00:40 - Setup
02:00 - dplyr::select
03:40 - dplyr::filter
05:05 - dplyr::mutate
07:05 - dplyr::summarise
08:30 - dplyr::arrange
09:55 - Combining these tools with the pipe (Setup for the Grammar of Data Manipulation)
11:45 - dplyr::group_by
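For quick reference, here is a minimal sketch of the verbs listed above. It was not taken from the video. Each verb is applied to a small, hypothetical NHANES-like data frame. The data frame `nhanes_toy` and its columns (`gender`, `age`, `bmi`) and values are made up for illustration. `SEQN` mimics the respondent identifier used in NHANES files. The sketch assumes the dplyr package is installed and uses the `%>%` pipe that dplyr re-exports.

```r
# Sketch only: the dplyr verbs from Part 3 applied to a made-up, NHANES-like table.
library(dplyr)

# Hypothetical toy data (not real NHANES values):
# respondent ID, gender, age in years, BMI in kg/m^2 (one BMI is missing)
nhanes_toy <- data.frame(
  SEQN   = c(1, 2, 3, 4, 5, 6),
  gender = c("male", "female", "female", "male", "female", "male"),
  age    = c(34, 51, 17, 62, 45, 29),
  bmi    = c(24.1, 31.5, 20.3, 28.7, NA, 26.0)
)

# select(): keep only some columns
nhanes_toy %>% select(SEQN, age, bmi)

# filter(): keep rows for adults (18 or older) with a recorded BMI
adults <- nhanes_toy %>% filter(age >= 18, !is.na(bmi))

# mutate(): add a derived logical column (BMI of 30 or more)
adults <- adults %>% mutate(obese = bmi >= 30)

# summarise(): collapse all rows into one summary row
adults %>% summarise(n = n(), mean_bmi = mean(bmi))

# arrange(): sort rows, here by decreasing BMI
adults %>% arrange(desc(bmi))

# The pipe chains the verbs; group_by() makes summarise() work per group
bmi_by_gender <- nhanes_toy %>%
  filter(age >= 18, !is.na(bmi)) %>%
  group_by(gender) %>%
  summarise(n = n(), mean_bmi = mean(bmi)) %>%
  arrange(gender)
bmi_by_gender
```

The last pipeline reads from top to bottom: take the data, keep adults with a recorded BMI, group them by gender, then summarise each group. Chaining verbs this way with the pipe is the idea the video builds up to before introducing `group_by`.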