Problem set questions?

Artwork by @allison_horst


Bad Viz Examples

To the slack!

Color

Scales

Color used to distinguish groups requires a qualitative color scale that is

  • finite and unordered
  • readily distinguished
  • approximately equivalent (no one color stands out from the rest)
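A minimal sketch of a qualitative scale in ggplot2, assuming the `mpg` dataset that ships with ggplot2. The variable `drv` (drive train) has three unordered levels, so each gets a distinct hue of similar weight. The `"Dark2"` ColorBrewer palette is an illustrative choice, not the only option.

```r
library(ggplot2)

# drv has three unordered groups: "4", "f", "r"
p_qual <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  # qualitative ColorBrewer palette: distinct hues, none dominating
  scale_color_brewer(palette = "Dark2")
```

Print `p_qual` to draw the plot.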

Color used to represent values or compare magnitudes requires a sequential color scale that

  • uses a many-valued gradient to distinguish larger/smaller values
  • represents the distance between values
  • may be single-hued, multi-hued, or diverging
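A sketch of sequential and diverging scales in ggplot2, again assuming the built-in `mpg` data. City mileage (`cty`) is numeric, so it gets a continuous gradient. The diverging version centers the gradient on the median city mileage; that midpoint and the three colors are illustrative assumptions, and in practice the midpoint should be a meaningful reference value.

```r
library(ggplot2)

# Sequential: viridis is a perceptually uniform, multi-hued gradient
p_seq <- ggplot(mpg, aes(x = displ, y = hwy, color = cty)) +
  geom_point() +
  scale_color_viridis_c()

# Diverging: two hues run away from a neutral midpoint
p_div <- ggplot(mpg, aes(x = displ, y = hwy, color = cty)) +
  geom_point() +
  scale_color_gradient2(low = "#2166AC", mid = "grey90", high = "#B2182B",
                        midpoint = median(mpg$cty))
```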

Color to highlight a group or threshold value requires accent colors that

  • stand out (pop) relative to the rest of the colors
  • may be a single color against a grey backdrop
  • may be based on the intensity of colors in a color scale
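A sketch of an accent color against a grey backdrop, assuming ggplot2's `mpg` data. Highlighting SUVs is a hypothetical example; the `highlight` column and the two color values are made up for illustration.

```r
library(ggplot2)

# Hypothetical grouping: SUVs vs. everything else
mpg_hl <- mpg
mpg_hl$highlight <- ifelse(mpg_hl$class == "suv", "SUV", "Other")
# Draw highlighted points last so they sit on top of the grey ones
mpg_hl <- mpg_hl[order(mpg_hl$highlight == "SUV"), ]

p_accent <- ggplot(mpg_hl, aes(x = displ, y = hwy, color = highlight)) +
  geom_point() +
  # one accent color; the rest recede into grey
  scale_color_manual(values = c(SUV = "#D55E00", Other = "grey70"))
```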

Pitfalls

  1. Encoding too much information (e.g., too many groups)
    • Wilke suggests qualitative scales work best with 3 to 5 groups and work poorly beyond 8 groups
    • Labeling points directly is an alternative
  2. Coloring for the sake of coloring
    • Using oversaturated colors is a related problem
  3. Using non-monotonic scales for values (e.g., the rainbow scale)
  4. Ignoring accessibility (e.g., color-vision deficiency)
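One way to address pitfalls 1 and 4 together is to keep the number of groups small and use a palette designed for color-vision deficiency. A sketch, assuming R 4.0 or later (where `grDevices::palette.colors()` provides the Okabe-Ito palette) and ggplot2's `mpg` data:

```r
library(ggplot2)

# Okabe-Ito: a qualitative palette chosen to remain distinguishable
# under common forms of color-vision deficiency
okabe_ito <- unname(palette.colors(palette = "Okabe-Ito"))

# Three groups; skip the first entry (black) so every group gets a hue
p_cvd <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  scale_color_manual(values = okabe_ito[2:4])
```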

Wrangling

ggplot2 expects tidy data, i.e., data structured such that

  • Each variable has its own column
  • Each observation has its own row
  • Each value has its own cell
Wickham and Grolemund Ch 12


Pivot

pivot_longer: Convert wide data to long, or move variable values out of the column names and into the cells.

pivot_longer(df, cols = -country, names_to = "year", values_to = "cases")
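To try the call above on real data, a sketch using `table4a`, one of the example tables bundled with tidyr (its year columns `1999` and `2000` hold case counts):

```r
library(tidyr)

# table4a is wide: one column per year
long <- pivot_longer(table4a, cols = -country,
                     names_to = "year", values_to = "cases")
# long now has one row per country-year: 3 countries x 2 years = 6 rows
```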

pivot_wider: Convert long data to wide, or move variable names out of the cells and into the column names.

pivot_wider(df, id_cols = country, names_from = type, values_from = count)
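A sketch with tidyr's bundled `table2`, where each country-year appears twice (once per `type`). Leaving `id_cols` at its default uses every column not named in `names_from`/`values_from` (here `country` and `year`) to identify rows. With `id_cols = country` alone, each country would match two counts per type, and `pivot_wider()` would warn and return list-columns.

```r
library(tidyr)

# Default id_cols: country and year together identify each row
wide <- pivot_wider(table2, names_from = type, values_from = count)
# wide has columns country, year, cases, population
```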

Separate/Unite

separate: Split a single column into multiple columns by separating each cell in the column into a row of cells.

separate(df, col = rate, into = c("cases", "pop"), sep = "/")
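A sketch with tidyr's bundled `table3`, whose `rate` column stores values like `"745/19987071"`. By default the new columns are character; `convert = TRUE` asks `separate()` to convert them to numbers. In tidyr 1.3.0 and later, `separate()` is superseded by `separate_wider_delim()`, but it still works.

```r
library(tidyr)

sep_df <- separate(table3, col = rate, into = c("cases", "pop"), sep = "/",
                   convert = TRUE)  # convert the pieces from character to numeric
```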

unite: Combine several columns into a single column by uniting their values across rows.

unite(df, col = year, century:year, sep = "")
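A sketch with tidyr's bundled `table5`, which splits each year into `century` (`"19"`) and a two-digit `year` (`"99"`). `sep = ""` matters: the default separator is `"_"`, which would produce `"19_99"`.

```r
library(tidyr)

# The new column `year` replaces century and the old year
united <- unite(table5, col = year, century:year, sep = "")
```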

Joins

Joins merge data sets based on key variables. The general syntax is <type>_join(x, y, by = "key"), where <type> is full, left, right, or inner.

Animated visuals created by Garrick Aden-Buie

  • full_join(): keeps all observations in x and y

  • left_join(): keeps all observations in x

  • right_join(): keeps all observations in y

  • inner_join(): keeps only observations whose key appears in both x and y
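The four joins can be compared side by side on two small toy tables (the values are made up for illustration): keys 1 and 2 appear in both tables, key 3 only in `x`, and key 4 only in `y`. Unmatched columns are filled with `NA`.

```r
library(dplyr)

x <- tibble(key = c(1, 2, 3), val_x = c("x1", "x2", "x3"))
y <- tibble(key = c(1, 2, 4), val_y = c("y1", "y2", "y4"))

full  <- full_join(x, y, by = "key")   # keys 1, 2, 3, 4
left  <- left_join(x, y, by = "key")   # keys 1, 2, 3 (val_y is NA for key 3)
right <- right_join(x, y, by = "key")  # keys 1, 2, 4 (val_x is NA for key 4)
inner <- inner_join(x, y, by = "key")  # keys 1, 2
```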

ggplot2

To the Script!

Patchwork

A ggplot composer that makes it “ridiculously easy” to arrange multiple plots into a single figure!
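A minimal sketch of patchwork's operators, assuming the patchwork package is installed; the two plots are arbitrary examples built from ggplot2's `mpg` data.

```r
library(ggplot2)
library(patchwork)

p1 <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()
p2 <- ggplot(mpg, aes(x = class)) + geom_bar()

side_by_side <- p1 + p2  # `+` (or `|`) places plots next to each other
stacked <- p1 / p2       # `/` stacks plots vertically
```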

R Markdown

R Markdown creates dynamic documents by combining markdown (an easy to write plain text format) with embedded R code chunks. When compiled, the code can be evaluated so that the code, its output, and your prose can be included in the final document to make reports reproducible.

  • R Markdown documents (.Rmd files) can be rendered to multiple formats including HTML and PDF.
  • The R code in an .Rmd document is processed by knitr, while the resulting .md file is rendered by pandoc to the final output formats (e.g. HTML or PDF).

R Markdown files contain

  • A YAML header (YAML Ain't Markup Language), offset by ---
  • Text with markdown formatting
  • Chunks of R code, offset by ```{r} and ``` (RStudio shortcut to insert a chunk: Cmd + Option + I on Mac, Ctrl + Alt + I on Windows/Linux)
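A sketch of the first half of the pipeline described above: it writes a minimal .Rmd (YAML header, markdown text, one R chunk) to a temporary file and runs only `knitr::knit()`, which evaluates the chunk and produces a .md file. The file contents are illustrative. The pandoc step (e.g. via `rmarkdown::render()`) is left out so the sketch runs without pandoc installed.

```r
library(knitr)

fence <- strrep("`", 3)  # chunk delimiter, built here to keep the example readable
rmd_lines <- c(
  "---",
  "title: \"Minimal example\"",
  "output: html_document",
  "---",
  "",
  "Some *markdown* text.",
  "",
  paste0(fence, "{r}"),
  "1 + 1",
  fence
)

rmd_file <- tempfile(fileext = ".Rmd")
writeLines(rmd_lines, rmd_file)

# knitr evaluates the chunk and writes markdown with the code and its output
md_file <- knit(rmd_file, output = tempfile(fileext = ".md"), quiet = TRUE)
md_text <- readLines(md_file)
```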

Additional Resources

XKCD Inspiration

XKCD, Randall Munroe, https://xkcd.com/2048/