Introduction

The Milk with Dignity program is a workers' rights operation that works with dairy farm workers in Vermont. The program has several mechanisms in place to ensure that workers' human and labor rights are not violated by their employers. One of these mechanisms is an anonymous, bilingual hotline that workers can call at any time to report complaints, violations, or concerns. After a call, the complaint is investigated to determine its validity and the source of harm.

Complaints can fall within several different categories:

  • Health and Safety
  • Schedules, Rest, and Leisure
  • Wages and Related Issues
  • Housing
  • Harassment, Discrimination, and Bad Treatment
  • Just Cause
  • Milk with Dignity Program Premium
  • Complaint Mechanism and Protection from Retaliation
  • Transparency, Cooperation, and Third Party Auditing
  • Other

Five categories have been identified as being of particular interest, because the labor violations they cover are among the most concerning: Health and Safety; Schedules, Rest, and Leisure; Wages and Related Issues; Housing; and Harassment, Discrimination, and Bad Treatment.

This project examines basic summary statistics for the complaint types, how the complaint types relate to one another, and how they relate to the amount of time an investigation takes.

library(readxl)
library(tidyverse)
library(GGally)
library(stringr)
library(forcats)
library(skimr)
#install.packages("palettetown")
library(palettetown)
#install.packages("ComplexUpset")
library(ComplexUpset)
hotline<-read_excel("Hotline_Data_Clean.xls")
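
The chunks below use a long-format table, hotline_long, whose construction is not shown in this report. The following is only a sketch of how such a table could be built, not the code actually used: it assumes one row per call and one 0/1 indicator column per complaint type. The comp_* column names are borrowed from commented-out code later in the report, and toy_hotline, call_id, and all values are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for `hotline` (hypothetical values): one row per call,
# one 0/1 indicator column per complaint type.
toy_hotline <- tibble(
  call_id     = 1:4,
  invest_days = c(0, 1, 7, NA),
  comp_wage   = c(1, 0, 1, 0),
  comp_hlth   = c(0, 1, 1, 0),
  comp_house  = c(0, 0, 0, 0)
)

# Stack the indicator columns, then keep one row per (call, complaint type)
# that was actually reported.
toy_long <- toy_hotline %>%
  pivot_longer(starts_with("comp_"),
               names_to = "comptype", values_to = "reported") %>%
  filter(reported == 1) %>%
  select(-reported)
```

In this layout a call with two complaint types contributes two rows, so counts by comptype are counts of complaints rather than counts of calls.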

Complaint Type Frequencies

#hotline_long %>% 
 # count(comptype)

hotline_long$comptype<- str_wrap(hotline_long$comptype, width = 30)

comp_count<- hotline_long %>% 
  group_by(comptype) %>% 
  summarize(num=n())

ggplot(hotline_long, aes(x=fct_infreq(comptype), fill=comptype))+
  geom_bar()+
  geom_text(data=comp_count, aes(x=comptype, y=num+5, label = num), color="black", hjust=0, size=3)+
  coord_flip()+
  guides(fill="none")+  # hide the fill legend; fill=FALSE is deprecated in recent ggplot2
  scale_fill_poke(pokemon="farfetch'd")+
  labs(title = "Complaint Type Frequency",x="", y="")+
  expand_limits(y=c(0,130))+
  theme_minimal()

Health and Safety is the most common complaint category, followed closely by Wages and Related Issues, then by Schedules, Rest, and Leisure, and then by Housing.

Investigation Length Summary Statistics

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    0.00    0.00    1.00   17.84    6.00  869.00      25

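The median investigation takes 1 day, and at least a quarter of investigations are closed the same day (the first quartile is 0 days). The mean, 17.84 days, is much higher because it is pulled upward by outliers, the longest of which is 869 days. Investigation length is missing for 25 calls.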
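
To illustrate why the mean and median diverge like this, here is a small sketch on made-up numbers (invest_days_toy is hypothetical and is not the hotline data). A single very long investigation moves the mean a great deal but leaves the median unchanged.

```r
# Hypothetical investigation lengths in days, with one extreme value
# and one missing value.
invest_days_toy <- c(0, 0, 0, 1, 1, 2, 6, 400, NA)

# Same kind of six-number summary (plus NA count) shown above.
summary(invest_days_toy)

# The median depends only on the middle of the sorted values;
# the mean is sensitive to the single 400-day case.
median_days <- median(invest_days_toy, na.rm = TRUE)
mean_days   <- mean(invest_days_toy, na.rm = TRUE)
```

For a right-skewed variable like this one, the median is usually the more representative "typical" value, which is why the plots below use it.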

colnames(hotline)[85:89]<-c("Wages", "Health", "Scheduling", "Discrimination","Housing")
colnames(hotline)[84]<-"Invest"

hotline %>% 
  select(Wages, Health, Scheduling, Discrimination,Housing, Invest) %>% 
  ggcorr(label = TRUE, label_alpha = TRUE)

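# Median investigation length (in days) for each complaint type
hotline_long<-hotline_long %>% 
  group_by(comptype) %>% 
  mutate(median_inv_comp=median(invest_days, na.rm=TRUE)) %>% 
  ungroup()  # drop the grouping so later steps are not grouped by accident

ggplot(hotline_long, aes(x=comptype, y=median_inv_comp))+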
  geom_point(size=3, color="darkseagreen4")+
  geom_hline(yintercept=1, color="red", alpha=0.4)+
  coord_flip()+
  labs(y="Median Days an Investigation Takes", x="")

None of the complaint types is strongly correlated with the number of days an investigation takes.

Calls about Complaint Mechanism and Protection from Retaliation take the longest to investigate, with a median of about 7 days. They are followed by calls about Harassment, Discrimination, and Bad Treatment, with a median of just over 3.5 days.
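
The medians quoted above can also be read from a table rather than from the plot. The sketch below shows one way to do that with summarize(); the toy_long data frame and its values are made up, and only the comptype and invest_days column names follow the report.

```r
library(dplyr)

# Hypothetical long-format data: one row per (call, complaint type).
toy_long <- tibble(
  comptype    = c("Housing", "Housing", "Wages", "Wages", "Wages"),
  invest_days = c(0, 2, 1, NA, 9)
)

# One row per complaint type: median length, ignoring missing values,
# plus the number of rows so small groups are easy to spot.
median_by_type <- toy_long %>%
  group_by(comptype) %>%
  summarize(median_days = median(invest_days, na.rm = TRUE),
            n_calls     = n(),
            .groups     = "drop") %>%
  arrange(desc(median_days))
```

Reporting the group size next to each median is a deliberate choice here: a median based on only a handful of calls is much less stable than one based on a hundred.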

Co-occurrence of Complaint Types

colnames(hotline)[85:89]<-c("Wages", "Health", "Scheduling", "Discrimination","Housing")

short_comptypes<-c("Wages", "Health", "Scheduling", "Discrimination","Housing")

# colnames(hotline[85:89]) <- c("Wages", "Health", "Scheduling", "Treatment", "Housing")
# 
# rename(hotline, Wages=comp_wage)
# rename(hotline, Health=comp_hlth)
# rename(hotline, Scheduling=comp_sched)
# rename(hotline, Treatment=comp_disc)
# rename(hotline, Housing=comp_house)
# 
# complaint_types <- c("Wages", "Scheduling", "Housing", "Health", "Treatment")


upset(data = hotline, intersect = short_comptypes, 
      min_size = 0,
      width_ratio = 0.125) +
    labs(title = "Co-Occurrence of Complaint Types")

The graph above has several features. The bar chart at the top shows how many calls contain exactly the combination of complaints marked by the dots immediately below each bar. Dots that are connected by a line mark complaints that co-occur. For example, there are 100 calls in which none of the five listed complaints is mentioned.

Co-occurrence is relatively rare: each complaint type appears most often as the only one of the five in a call. The most common co-occurrence is Scheduling together with Wages, at 17 observations.

The chart labeled “Set Size” on the left connects to the complaint immediately adjacent to it and shows how often those complaints occur throughout the entire dataset, alone or in conjunction with other complaints.
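
The two kinds of counts described above can be reproduced as plain tables. This is a minimal sketch on made-up data (toy_hotline and its values are hypothetical; the 0/1 indicator layout is an assumption about how the complaint columns are coded), not the computation ComplexUpset performs internally.

```r
library(dplyr)

# Hypothetical calls: one row per call, one 0/1 column per complaint type.
toy_hotline <- tibble(
  Wages      = c(1, 1, 0, 0, 1),
  Scheduling = c(1, 0, 0, 0, 1),
  Housing    = c(0, 0, 1, 0, 0)
)

# Top bar chart: number of calls with each exact combination of complaints.
combo_counts <- toy_hotline %>%
  count(Wages, Scheduling, Housing, name = "calls", sort = TRUE)

# "Set Size" chart: how often each complaint occurs at all,
# alone or together with others.
set_sizes <- colSums(toy_hotline)
```

Here the row of combo_counts with all zeros corresponds to calls in which none of the listed complaints is mentioned, and set_sizes counts a call once for every complaint type it contains.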