El Salvador’s EHPM (Encuesta de Hogares de Propositos Multiples)

library(tidyverse)
library(wesanderson)
library(ghibli)
library(GGally)
library(patchwork) 
library(RColorBrewer)
library(ggthemes)
library(ggplot2)
library(haven)
library(rnaturalearth)
# devtools::install_github("ropensci/rnaturalearthhires")
library(stringi)
library(sf)
library(dplyr)
library(waffle)

df <- read_csv("df_cleaned.csv")

# Adding poverty levels for later graphs
poverty_levels <- c("No poverty", "Relative poverty", "Extreme poverty")
df <- df %>%
  mutate(poverty = fct_relevel(poverty, poverty_levels))

El Salvador’s Encuesta de Hogares de Propositos Multiples (Multipurpose Household Survey) is carried out by the government’s Ministry of Economy and is used to provide information on the socioeconomic situation of Salvadoran households. The information captured by this survey informs planning and public policy actions intended to shape the country’s development.

The data set used in this project pulls data from the 2010-2019 EHPMs and was created by Katie Cox, Katie Mulder, and David Leblang, Ph.D. to assess the impact of climate change and socioeconomic variables on migration. As extreme weather-related hazards and pervasive drought occur, agricultural workers are hit particularly hard by climate shocks, especially in Central American countries such as El Salvador, which is predicted to lose more than 35% of its suitable growing area by 2050 (Ovalle-Rivera, 2015). The resulting economic hardship pushes rural farmers to migrate to countries such as the United States in search of work.

The data used in this report is organized at the department level. Departments are similar to states in the United States but lack certain features, such as independent laws, legislatures, or government. There are 14 departments in El Salvador. This data visualization report primarily focuses on illustrating some of the descriptive relationships between certain categorical variables.

Visualizing Descriptive Variables

To begin with, let’s investigate how the number of households in El Salvador varies by department. Below you can find the number of households per department that appear in the data set. This data set is intended to be representative of general demographics in El Salvador. San Salvador, for example, is the most populated department.

s <- ggplot(df, aes(x = fct_rev(fct_infreq(department))))
s + geom_bar(fill = wes_palette("Chevalier1", n = 1, type = "discrete")) +
  labs(x = "Department Name",
       y = "Number of Households",
       title = "Number of Households From Each Department") +
  coord_flip()
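The ordering used in the bar chart above can be checked on a small example. The sketch below uses a made-up `toy` data frame (hypothetical rows, not EHPM values) to show the household counts that `geom_bar()` draws and the level order produced by `fct_rev(fct_infreq(...))`.

```r
library(dplyr)
library(forcats)

# Toy stand-in for df: one row per household (made-up values)
toy <- data.frame(
  department = c("San Salvador", "San Salvador", "San Salvador",
                 "La Libertad", "La Libertad", "Morazan")
)

# Household counts per department, largest first
dept_counts <- toy %>% count(department, sort = TRUE)

# fct_infreq() orders levels from most to least frequent; fct_rev() reverses
# that order, so the most frequent department is the last level and is drawn
# at the top of the chart after coord_flip()
dept_levels <- levels(fct_rev(fct_infreq(toy$department)))

dept_counts
dept_levels
```

With the real data, the same `count()` call on `df` would give the numbers behind each bar.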

In addition, the data set captures other variables, such as poverty. The EHPM survey classifies poverty levels according to the following scale: no poverty, relative poverty, and extreme poverty. The EHPM defines households in relative poverty as households that are unable to cover the cost of twice the value of the Canasta Básica Alimentaria (Basic Food Basket) with their per capita income. Households in extreme poverty are unable to cover the cost of the Basic Food Basket.

s <- ggplot(df, aes(x = fct_infreq(poverty)))
s + geom_bar(fill = wes_palette("Moonrise2", n = 1, type = "discrete")) +
  labs(x = "Poverty Level",
       y = "Count",
       title = "Number of Households According to Poverty Classification")
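The poverty classification described above can be written as a simple rule. The sketch below is an illustration only: `classify_poverty()`, `income_pc`, and `cba_cost` are hypothetical names, the numbers are made up, and the treatment of incomes exactly equal to a threshold is an assumption. The `poverty` variable in the data set comes from the survey itself, not from this function.

```r
library(dplyr)

# Hypothetical helper: classify households from per capita income and the
# cost of the Basic Food Basket (both in the same currency and period)
classify_poverty <- function(income_pc, cba_cost) {
  case_when(
    is.na(income_pc)         ~ NA_character_,      # keep missing incomes missing
    income_pc < cba_cost     ~ "Extreme poverty",  # cannot cover one basket
    income_pc < 2 * cba_cost ~ "Relative poverty", # cannot cover two baskets
    TRUE                     ~ "No poverty"
  )
}

# Made-up incomes against a made-up basket cost of 50
example_classes <- classify_poverty(c(30, 80, 150, NA), cba_cost = 50)
example_classes
```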

When breaking down poverty levels by department, we gain better insights into which regions in El Salvador experience higher poverty rates on average. We can also visualize departments with higher rates of households that are involved in agriculture. Households in departments such as Morazan and Cabanas are more heavily involved in agriculture.

poverty_by_dept_graph <- df %>%
  ggplot(aes(x = department, fill = fct_rev(fct_infreq(poverty)))) +
  geom_bar(position = "fill") +
  scale_fill_manual(values = wes_palette("Cavalcanti1", n = 3),
                    breaks = c("Extreme poverty", "Relative poverty", "No poverty"),
                    labels = c("Extreme", "Relative", "None")) +
  # position = "fill" plots proportions (0-1), so label the axis as percentages
  scale_y_continuous(labels = scales::percent) +
  coord_flip() +
  labs(title = "Poverty Levels",
       y = "% of Households in Department",
       x = "",
       fill = "") +
  theme(legend.position = "bottom",
        axis.title = element_text(size = 9))

# Drop households with missing agriculture status before plotting
ag_graph <- df %>%
  filter(!is.na(agriculture)) %>%
  mutate(agriculture = factor(agriculture))

ag_by_dept_graph <- ag_graph %>%
  ggplot(aes(x = department, fill = agriculture)) +
  geom_bar(position = "fill") +
  scale_fill_manual(values = wes_palette("Cavalcanti1", n = 2),
                    breaks = c("TRUE", "FALSE"),
                    labels = c("Agricultural", "Non-Agricultural")) +
  # position = "fill" plots proportions (0-1), so label the axis as percentages
  scale_y_continuous(labels = scales::percent) +
  coord_flip() +
  labs(title = "Agricultural Involvement",
       y = "% of Households Involved in Agriculture",
       x = "",
       fill = "") +
  theme(legend.position = "bottom",
        axis.title = element_text(size = 9))

poverty_by_dept_graph + ag_by_dept_graph +
  plot_annotation(title = "Poverty Levels and Agricultural Involvement by Department") 
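The bars above show shares rather than counts because of `position = "fill"`. A minimal sketch of the same calculation done by hand, on a made-up `toy` data frame (hypothetical values, with placeholder department names "A" and "B"), may help when exact rates are needed alongside the chart.

```r
library(dplyr)

# Toy stand-in for df: one row per household (made-up values)
toy <- data.frame(
  department = c("A", "A", "A", "A", "B", "B"),
  poverty    = c("No poverty", "No poverty", "Relative poverty",
                 "Extreme poverty", "No poverty", "Extreme poverty")
)

# Share of households at each poverty level within each department;
# these are the proportions that position = "fill" stacks in each bar
poverty_shares <- toy %>%
  count(department, poverty) %>%
  group_by(department) %>%
  mutate(share = n / sum(n)) %>%
  ungroup()

poverty_shares
```

Replacing `poverty` with `agriculture` in the `count()` call would give the shares shown in the agricultural involvement panel.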

A graph that breaks down poverty levels by department and also separates agricultural from non-agricultural households shows the relationship between agricultural involvement and poverty in a different way. In each department, agricultural households experience higher rates of both relative and extreme poverty than non-agricultural households.
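One way to draw a graph like the one described above is to facet the stacked bars by agricultural involvement. The sketch below is not the original plotting code: it uses made-up counts, and the column names `household_type` and `households` are hypothetical.

```r
library(ggplot2)

poverty_levels <- c("No poverty", "Relative poverty", "Extreme poverty")

# Made-up household counts for two departments (not EHPM values)
toy <- data.frame(
  department     = rep(c("Cabanas", "San Salvador"), each = 6),
  household_type = rep(rep(c("Agricultural", "Non-Agricultural"), each = 3),
                       times = 2),
  poverty        = factor(rep(poverty_levels, times = 4),
                          levels = poverty_levels),
  households     = c(40, 35, 25, 60, 30, 10, 50, 30, 20, 75, 20, 5)
)

# One panel per household type; each bar is scaled to 100% of its department
facet_graph <- ggplot(toy, aes(x = department, y = households, fill = poverty)) +
  geom_col(position = "fill") +
  facet_wrap(~ household_type) +
  scale_y_continuous(labels = scales::percent) +
  coord_flip() +
  labs(title = "Poverty Levels by Department and Agricultural Involvement",
       x = "",
       y = "% of Households",
       fill = "") +
  theme(legend.position = "bottom")

facet_graph
```

With household-level rows such as those in `df`, `geom_bar(position = "fill")` would replace `geom_col()` and the `y` aesthetic would be dropped.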

One last way to visualize the relationship between these categorical variables is with a waffle plot. The graphs below show how the distribution of poverty levels differs between households that are and are not involved in agriculture.

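# Assumed reconstruction: `ag` and `not_ag` are not defined earlier in this
# report; here they are taken to be household counts per poverty level.
ag <- df %>% filter(agriculture == TRUE) %>% count(poverty) %>% deframe()
not_ag <- df %>% filter(agriculture == FALSE) %>% count(poverty) %>% deframe()

ag_waffle <- waffle(ag / 200,
                    rows = 20,
                    legend_pos = "right",
                    xlab = "1 square = 200 households",
                    title = "Agricultural",
                    colors = c("gold3", "darkgreen", "darkseagreen"))
not_ag_waffle <- waffle(not_ag / 450,
                        rows = 20,
                        legend_pos = "none",  # same legend as the first plot
                        xlab = "1 square = 450 households",
                        title = "Non-Agricultural",
                        colors = c("gold3", "darkgreen", "darkseagreen"))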

ag_waffle + not_ag_waffle

Note that in the agricultural waffle plot, 1 square is equal to 200 households, while in the non-agricultural waffle plot, 1 square is equal to 450 households. This is because there are roughly 52,000 agricultural households and roughly 118,000 non-agricultural households, or about 2.26 times as many. Since 200 multiplied by 2.26 is roughly 450, these scales give the two plots a similar number of squares (about 260 each), so the proportions within each plot can be compared side by side.
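The scaling in the note above can be verified with a few lines of arithmetic. The sketch below uses the rounded household totals quoted in the note, so the ratio comes out slightly different from the 2.26 figure, which presumably reflects the exact counts.

```r
# Rounded household totals quoted in the note (approximate)
n_ag     <- 52000
n_not_ag <- 118000

# How many times larger the non-agricultural group is
ratio <- n_not_ag / n_ag            # about 2.27 with these rounded totals

# Number of squares each waffle plot needs at the chosen scales
squares_ag     <- n_ag / 200        # 260
squares_not_ag <- n_not_ag / 450    # about 262

c(ratio = ratio, squares_ag = squares_ag, squares_not_ag = squares_not_ag)
```

Because both plots end up with close to 260 squares, they occupy a similar area when placed side by side.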

Conclusion and Key Takeaways

This document is intended to visualize some of the key relationships among categorical variables in the 2010-2019 EHPM survey data. In doing so, I hope to convey to the reader that rural agricultural households in El Salvador experience disproportionately high rates of extreme poverty. This knowledge fits into the broader context of climate migration from El Salvador, as farming households that are unable to cope with extreme weather events face production losses that generate poverty and force them to search for work elsewhere.

While this report does not analyze migration data, it paves the way for the visualization and exploration of the relationships between agricultural activity, poverty, and the rate of migration out of the country in search of other economic opportunities.