--- title: "Problem Set 3" author: "Michele Claibourn" date: "11/02/2021" output: html_document: toc: true toc_float: true code_folding: hide theme: sandstone --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE) ``` # World Bank Development Indicators Read in the WDI data we downloaded in ps3_prep.R and have a peek. ```{r load} library(tidyverse) library(GGally) library(wesanderson) library(viridis) library(patchwork) # add other needed libraries here wdi <- read_csv("data/wdi_clean.csv") glimpse(wdi) ``` Region and income are characters, let's have a look. ```{r check} wdi %>% count(region) wdi %>% count(income) ``` ## 1. Data Prep Make both region and income factors and set the levels of income in a reasonable way. Then make a bar graph of each (individually), choosing colors with intention (e.g., not the defaults). ```{r prep} wdi <- wdi %>% mutate(region = factor(region), income = factor(income, levels = c("Low income", "Lower middle income", "Upper middle income", "High income"))) ggplot(wdi, aes(x = fct_infreq(region))) + geom_bar(fill = wes_palette("FantasticFox1")[3]) + labs(x = "", y = "Number of Observations") + coord_flip() ggplot(wdi, aes(x = income, fill = income)) + geom_bar() + scale_fill_manual(values = wes_palette("FantasticFox1"), name = "Income Group") + labs(x = "", y = "Number of Observations", title = "Country Observations by Income Group") ``` ## 2. Overview of Relationships Make EITHER a scatterplot matrix or a correlogram to visualize the relationships between the key quantitative variables in this data set (that is, the three life expectancy variables, access to electricity, gdp per capita, CO2 emissions, and the percent in extreme poverty). What stands out? Correlogram version ```{r corr} wdi %>% filter(year == 2018) %>% select(-c(iso2c, country, year, region, income)) %>% ggcorr() ``` Scatterplot matrix version ```{r matrix} wdi %>% filter(year == 2018) %>% select(-c(iso2c, country, year, region, income)) %>% ggpairs() ``` ## 3. Scatterplots! 1. Pick any year (between 2008 and 2018) and any two quantitative variables from above to focus on. Using just the data for your chosen year, make an initial scatterplot of your selected variables. 2. Try multiple additional versions, including (separately or simultaneously, it's up to you) - color the points by income - facet by region the figure by region - add a size aesthetic mapped to another quantitative variable ```{r scatter} wdi %>% filter(year == 2018) %>% ggplot(aes(x = co2_emissions, y = life_expectancy)) + geom_point() # color by income; wdi %>% filter(year == 2018) %>% ggplot(aes(x = co2_emissions, y = life_expectancy, color = income)) + geom_point() # facet by region; wdi %>% filter(year == 2018) %>% ggplot(aes(x = co2_emissions, y = life_expectancy, color = income)) + geom_point() + facet_wrap(~ region) # bubble by another quant wdi %>% filter(year == 2018) %>% ggplot(aes(x = co2_emissions, y = life_expectancy, color = income, size = gdp_per_cap)) + geom_point(pch = 21) ``` 3. Based on the experimentation above, generate one final scatterplot you find the most useful; choose the colors for this final figure intentionally (e.g., not the defaults). ```{r finalscatter} # bubble by another quant wdi %>% filter(year == 2018) %>% ggplot(aes(x = co2_emissions, y = life_expectancy, color = income, size = gdp_per_cap)) + geom_point(pch = 21) + scale_color_viridis(option = "turbo", discrete = TRUE, name = "Income Groupe") + guides(size = "none") + labs(x = "CO2 Emissions", y = "Life Expectancy") ``` ## 4. Change! 1. Using just the county-level data in 2008 and 2018, create EITHER a slope graph or a dumbbell plot to visualize the change/difference in overall life expectancy across these two periods. ```{r slope} # slope graph wdi %>% filter(year %in% c(2008, 2018)) %>% ggplot(aes(x = year, y = life_expectancy)) + geom_line(aes(group = country), color = "gray60") + geom_point(color = "#0072B2") ``` 2. Recreate the above graph for female life expectancy and for male life expectancy. For these, choose the colors with intention. Save each in a named object and use `patchwork` to plot them together. ```{r dumbbell, fig.height = 10, fig.width = 5} # dumbbell plot male <- wdi %>% filter(year %in% c(2008, 2018), !is.na(life_expectancy_male)) %>% ggplot(aes(x= life_expectancy_male, y = fct_reorder(country, life_expectancy_male))) + geom_line(aes(group = country), color = "grey")+ geom_point(aes(color = as.factor(year))) + scale_color_manual(values = c("navy", "orange")) + guides(color = guide_legend("Year")) + labs(y = "", x = "Life Expectancy among Males") + theme_minimal() + theme(axis.text.y = element_text(size = 7), legend.position = "bottom") male female <- wdi %>% filter(year %in% c(2008, 2018), !is.na(life_expectancy_female)) %>% ggplot(aes(x= life_expectancy_female, y = fct_reorder(country, life_expectancy_female))) + geom_line(aes(group = country), color = "grey")+ geom_point(aes(color = as.factor(year))) + scale_color_manual(values = c("navy", "orange")) + guides(color = guide_legend("Year")) + labs(y = "", x = "Life Expectancy among Females") + theme_minimal() + theme(axis.text.y = element_text(size = 7), legend.position = "bottom") female ``` ```{r patch, fig.height = 10, fig.width = 10} male + female ```