About the Data and this Analysis

The dataset used in this analysis was collected by the DeepSolar Project at Stanford. DeepSolar is a deep learning framework that analyzes satellite imagery to identify the GPS locations and sizes of solar photovoltaic (PV) panels. Essentially, a machine-learning algorithm was trained to classify satellite imagery and identify solar PV systems (i.e., solar panels) in both residential and commercial contexts. The data were collected at the census-tract level for the 48 contiguous U.S. states.

The DeepSolar data were merged with other census-tract information to provide a rich dataset containing not only the number and area of solar panels per tract, but also demographic, income, policy, and weather variables, among others. The project website can be found at: http://web.stanford.edu/group/deepsolar/home

This analysis examines spatial data on solar panel area in California and Virginia. I also explore broad correlates of solar panel take-up.

Variables Used in this Analysis

  • total panel area: total solar panel area (commercial + residential) in square meters (m^2)
  • total panel area per capita: total solar panel area (commercial + residential) in m^2 divided by population
  • share of 2016 presidential vote democratic: share of the 2016 presidential vote that went to the Democratic candidate
  • daily solar radiation: average daily solar radiation in kWh/m^2/day
  • total number of state incentives: total number of state-level solar incentives (commercial + residential), determined from the DSIRE database, found at: https://www.dsireusa.org/

Load Libraries, Download Data, and Perform Data Cleaning

This section of code creates a state-level dataset, as well as California and Virginia county-level datasets.

library(tidyverse)
library(sf) 
library(rnaturalearth) 
library(rnaturalearthdata) 
library(tigris)
library(rgeos)
library(plotly)
library(scales)
library(patchwork)

# Pull CSV from Project Website (uncomment if this is first time running code)
# download <- read_csv("http://web.stanford.edu/group/deepsolar/deepsolar_tract.csv")
# write_csv(download, "deepsolar_tract.csv")

#Load csv as solar dataframe
solar <- read_csv("deepsolar_tract.csv")
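As an alternative to commenting and uncommenting the download lines, the download can be guarded by a check for the local file. The following is a minimal sketch in base R; `fetch_if_missing` is a hypothetical helper name, not part of the original analysis.

```r
# Hypothetical helper: download a file only when no local copy exists yet.
fetch_if_missing <- function(url, path) {
  if (!file.exists(path)) {
    download.file(url, destfile = path, mode = "wb")
  }
  invisible(path)  # return the local path so it can be passed on to a reader
}

# Possible usage with the file from this analysis (left commented to avoid a network call):
# fetch_if_missing("http://web.stanford.edu/group/deepsolar/deepsolar_tract.csv",
#                  "deepsolar_tract.csv")
```

With this pattern the script can be re-run without editing it, and the network is only touched on the first run.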

# CALIFORNIA ----
# County Data
solar_ca <- solar %>% 
  filter(state=="ca") %>% 
  mutate(county = factor(county))

solar_ca_cnty <- solar_ca %>% 
  mutate(fips_county = str_sub(fips, 1,4), 
         fips_county = paste0("0", fips_county)) %>% 
  group_by(county,fips_county) %>% # to keep both the name and the fips present in grouped data
  summarize (total_panel_area = sum(total_panel_area), # in m^2
             solar_system_count_nonresid = sum(solar_system_count_nonresidential), 
             solar_system_count_resid = sum(solar_system_count_residential), 
             total_panel_area_residential = sum(total_panel_area_residential), # in m^2
             solar_panel_area_per_capita = mean(solar_panel_area_per_capita), # in m^2/capita
             electricity_price_residential = mean(electricity_price_residential), #cents/kWh
             electricity_consume_residential = mean(electricity_consume_residential), #kWh
             household_income = mean(average_household_income, na.rm = TRUE),
             population_density = mean(population_density),
             high_school_grad = mean(education_high_school_graduate_rate),
             share_white = mean(race_white_rate),
             gini = mean(gini_index),
             population = sum(population), # county population is the sum of its tract populations
             land_area = sum(land_area),
             daily_solar_radiation = mean(daily_solar_radiation),
             voting_2016_dem_percentage = mean(voting_2016_dem_percentage))

# Determine the share of land area covered by panels, with both areas in meters-squared
solar_ca_cnty <- solar_ca_cnty %>% 
  mutate(land_area_m2 = land_area*2589988.11) %>% # land area in meters-squared (was in miles^2)
  mutate(land_share_panel_area_resid = total_panel_area_residential/land_area_m2,
         land_share_panel_area_total = total_panel_area/land_area_m2)

# FIPS Data
solar_ca_fips <- solar_ca %>%
  mutate(fips = as.character(fips),
         fips= paste0("0", fips))

# Counties (state FIPS passed as a string, since a bare 06 is parsed as the number 6)
ca_counties <- counties(state = "06", cb = TRUE)

# FIPS Tracts
ca_tracts <- tracts(state = "06", cb = TRUE)

# Join Data -- County Level
county_join_ca <- ca_counties %>% 
  left_join(solar_ca_cnty, by = c("GEOID" = "fips_county"))

# Join Data -- FIPS Tract Level
fips_join_ca <- ca_tracts %>% 
  left_join(solar_ca_fips, by = c("GEOID" = "fips"))

# VIRGINIA ----
# County Data
solar_va <- solar %>% 
  filter(state=="va") %>% 
  mutate(county = factor(county))

solar_va_cnty <- solar_va %>% 
  mutate(fips_county = str_sub(fips, 1,5)) %>% 
  group_by(county,fips_county) %>% # to keep both the name and the fips present in grouped data
  summarize (total_panel_area = sum(total_panel_area), # in m^2
             solar_system_count_nonresid = sum(solar_system_count_nonresidential), 
             solar_system_count_resid = sum(solar_system_count_residential), 
             total_panel_area_residential = sum(total_panel_area_residential), # in m^2
             solar_panel_area_per_capita = mean(solar_panel_area_per_capita), # in m^2/capita
             electricity_price_residential = mean(electricity_price_residential, na.rm = TRUE), #cents/kWh
             electricity_consume_residential = mean(electricity_consume_residential, na.rm = TRUE), #kWh
             household_income = mean(average_household_income),
             population_density = mean(population_density),
             high_school_grad = mean(education_high_school_graduate_rate),
             share_white = mean(race_white_rate),
             gini = mean(gini_index),
             population = sum(population), # county population is the sum of its tract populations
             land_area = sum(land_area),
             daily_solar_radiation = mean(daily_solar_radiation))

# Determine the share of land area covered by panels, with both areas in meters-squared
solar_va_cnty <- solar_va_cnty %>% 
  mutate(land_area_m2 = land_area*2589988.11) %>% # land area in meters-squared (was in miles^2)
  mutate(land_share_panel_area_resid = total_panel_area_residential/land_area_m2,
         land_share_panel_area_total = total_panel_area/land_area_m2)

# FIPS Data
solar_va_fips <- solar_va %>%
  mutate(fips = as.character(fips))

# County Data
va_counties <- counties(state = 51, cb = TRUE)

# FIPS Tracts
va_tracts <- tracts(state = 51, cb = TRUE)

# Join Data -- County Level
county_join_va <- va_counties %>% 
  left_join(solar_va_cnty, by = c("GEOID" = "fips_county"))

# Join Data -- FIPS Tract Level
fips_join_va <- va_tracts %>% 
  left_join(solar_va_fips, by = c("GEOID" = "fips"))

# State Level Cleaning: U.S. Data ----
solar_state <- solar %>% 
  mutate(state = factor(state)) %>%
  group_by(state) %>% 
  summarize (incentive_nonresidential_state_level = mean(incentive_nonresidential_state_level),
             incentive_residential_state_level = mean(incentive_residential_state_level),
             feedin_tariff = mean(feedin_tariff),
             daily_solar_radiation = mean(daily_solar_radiation, na.rm = TRUE),
             avg_electricity_retail_rate = mean(avg_electricity_retail_rate),
             total_panel_area_residential = sum(total_panel_area_residential),
             total_panel_area = sum(total_panel_area),
             population = sum(population),
             total_area = sum(total_area))

solar_state <- solar_state %>%
  mutate(total_incentives = incentive_nonresidential_state_level 
         + incentive_residential_state_level) %>% # commercial + residential incentives
  mutate(total_panel_area_per_capita = total_panel_area/population) %>% 
  mutate(panel_area_divided_by_total_area = total_panel_area/total_area) %>% 
  mutate(higlight_ca_va = ifelse(state == "ca" | state == "va", 1, 0))
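The California cleaning code above prepends "0" to the FIPS codes, while the Virginia code does not. This suggests the codes are read in as numbers, so California's leading zero (state FIPS 06) is dropped. The sketch below shows one way to handle both states with the same logic by zero-padding to the 11-character tract GEOID. The two FIPS values are illustrative toy inputs, and this is an assumption about how the column is parsed, not the author's method.

```r
# Toy tract FIPS codes as they might look after being parsed as numbers
# (illustrative values; the California code has lost its leading zero)
fips_numeric <- c(6001400100, 51001090100)

# Zero-pad to the 11-character tract GEOID (2 state + 3 county + 6 tract digits)
fips_tract <- sprintf("%011.0f", fips_numeric)

# The first five characters are the county GEOID (state + county)
fips_county <- substr(fips_tract, 1, 5)

print(data.frame(fips_numeric, fips_tract, fips_county))
```

Padding to a fixed width avoids state-specific branches, so the same join keys could be built for any state before merging with the tigris shapefiles.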

Solar Panel Proliferation Throughout the U.S.

The figure below shows the total solar panel area and total panel area per capita for each state in the U.S. California is clearly the national leader in total solar panel area. It also ranks third in solar panel area per capita, with nearly 1 square meter of solar panels for every California resident. Virginia’s solar panel development lags far behind California’s, and the state ranks poorly in solar panel area per capita.

# Create Two Solar State geom_col() Plots
s1 <- solar_state %>% 
  ggplot(aes(x = fct_reorder(state, total_panel_area), y = total_panel_area, 
             fill = higlight_ca_va,
             text = paste(state, "\n Panel Area:", round(total_panel_area, digits = 0)))) + 
  geom_col() + # a single bar layer; a second geom_col() would draw every bar twice
  coord_flip() +
  guides(fill = "none") +
  labs(title = "Solar Panel Proliferation Across U.S. States",
       x = "",
       y = "Total Solar Panel Area (m^2)") +
  theme_minimal() + # complete theme first, so the tweaks below are not overwritten
  theme(plot.title = element_text(hjust = 0.5), legend.position = "none")
    
# x is ordered by total_panel_area here too, so both plots list states in the same order
s2 <- solar_state %>% 
  ggplot(aes(x = fct_reorder(state, total_panel_area), y = total_panel_area_per_capita, 
             fill = higlight_ca_va,
             text = paste(state, "\n Panel Area per Capita:", round(total_panel_area_per_capita, digits = 3)))) + 
  geom_col() + # a single bar layer; a second geom_col() would draw every bar twice
  coord_flip() +
  guides(fill = "none") +
  labs(x = "",
       y = "Total Solar Panel Area per Capita (m^2/person)") +
  theme_minimal() + # complete theme first, so the tweak below is not overwritten
  theme(legend.position = "none")
fig1 <- ggplotly(s1, tooltip = "text")
fig2 <- ggplotly(s2, tooltip = "text")

subplot(fig1, fig2, titleX = TRUE, shareY = TRUE)

Mapping Solar Panel Area in California

Given California’s high level of solar panel area, we first focus on this state. The map below shows the distribution of solar panel area across counties in California.

ggplot(county_join_ca) +
  geom_sf(aes(fill = total_panel_area)) +
  scale_fill_viridis_c(option = "viridis", 
                       name = "Total Solar Panel Area (m^2)") +
  theme_void()

Correlates with Solar Panel Development in California

Here we explore several correlates with solar panel development in California.

solar_ca_cnty %>% 
  ggplot(aes(x = voting_2016_dem_percentage, y = solar_panel_area_per_capita)) + 
  geom_point(size = 2, color = "#0072B2") + 
  labs(title = "Does Solar Panel Proliferation Vary by Political Affiliation in California?",
       x = "Share of 2016 Presidential Vote Democratic",
       y = "Solar Panel Area per Capita") +
  geom_smooth(color="light grey", method = "lm", se=FALSE) +
  theme_minimal() + # complete theme first, so the centered title is not overwritten
  theme(plot.title = element_text(hjust = 0.5))

The plot above shows county-level data in California. There is a broad, positive relationship between how Democratic-leaning a county is (measured by the share of the county's vote that went to the Democratic candidate in the 2016 presidential election) and its solar panel development. While this is not a causal relationship, as many other factors are correlated with both variables, it is consistent with a body of research indicating that political and pro-environmental views affect the likelihood of installing solar panels.
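To put a number on a relationship like the one in the scatterplot, a correlation coefficient and the slope of the fitted line can be computed directly. The sketch below uses made-up toy values rather than the DeepSolar data, and the column names in `toy` are hypothetical.

```r
# Toy county-level values (illustrative only, not taken from the DeepSolar data)
toy <- data.frame(
  dem_share             = c(0.30, 0.40, 0.50, 0.60, 0.70),
  panel_area_per_capita = c(0.20, 0.35, 0.30, 0.50, 0.55)
)

# Pearson correlation between vote share and panel area per capita
r <- cor(toy$dem_share, toy$panel_area_per_capita)

# Slope of the same straight line that geom_smooth(method = "lm") draws
fit   <- lm(panel_area_per_capita ~ dem_share, data = toy)
slope <- coef(fit)[["dem_share"]]

print(c(correlation = r, slope = slope))

# With the real data, one might run (left commented, since it needs solar_ca_cnty):
# cor(solar_ca_cnty$voting_2016_dem_percentage,
#     solar_ca_cnty$solar_panel_area_per_capita, use = "complete.obs")
```

Like the plot, a correlation coefficient describes association only and says nothing about causation.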

solar_ca_cnty %>% 
  ggplot(aes(x = daily_solar_radiation, y = solar_panel_area_per_capita)) + 
  geom_point(color = "navy") + 
  labs(title = "Is Solar Panel Development Correlated with \n Solar Radiation in California?",
       x = "Daily Average Solar Radiation (kWh/m^2/d) per County",
       y = "Solar Panel Area per Capita") +
  theme_minimal() +
  geom_smooth(color="light grey", method = "lm", se = FALSE) +
  theme(plot.title = element_text(hjust = 0.5)) 

The plot above shows that solar panel proliferation generally increases with a county's average daily solar radiation. It makes sense that weather conditions help determine the level of solar development in a given county: more sun, more solar panels!

Zooming Out: Policy Incentives and Level of Solar Development

The following figure shows that states with higher average daily solar radiation tend to have higher per capita concentrations of solar panels, which is consistent with the county-level pattern seen in California in the scatterplot above. However, it also shows that many states with more solar development than their solar radiation alone would suggest tend to have more solar incentive policies: the outliers tend to be the larger points (point size represents the number of state-level policy incentives). In essence, sunniness matters, but so does policy!
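One way to make "more solar development than expected given solar radiation" concrete is to fit a line of panel area per capita on radiation and look at the residuals. The sketch below does this on made-up toy data. The data frame, its column names, and the choice of a simple linear fit are all assumptions for illustration, not part of the original analysis.

```r
# Toy state-level values (illustrative only, not taken from the DeepSolar data)
toy_states <- data.frame(
  state      = c("a", "b", "c", "d", "e", "f"),
  radiation  = c(3.5, 4.0, 4.5, 5.0, 5.5, 4.2),        # kWh/m^2/day
  area_pc    = c(0.02, 0.05, 0.08, 0.11, 0.14, 0.30),  # m^2 per person
  incentives = c(5, 8, 10, 12, 15, 40)
)

# Expected panel area per capita given radiation alone
fit <- lm(area_pc ~ radiation, data = toy_states)

# Positive residual = more solar than radiation alone would predict
toy_states$residual <- resid(fit)

# Rank states from most above the line to most below it
ranked <- toy_states[order(-toy_states$residual), ]
print(ranked[, c("state", "residual", "incentives")])

# State furthest above the fitted line
top <- toy_states$state[which.max(toy_states$residual)]
```

Comparing the residuals with the incentive counts would then be a rough check of whether the states above the line are also the ones with more incentives.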

fig_state <- solar_state %>%
  ggplot(aes(x = daily_solar_radiation, y = total_panel_area_per_capita, size = total_incentives,
             text = paste(state, "\n # Incentives:", total_incentives))) + 
  geom_point(color = "dark green", alpha = 1/3) + # constant alpha is set outside aes()
  labs(title = "Do Solar Panel Incentives or Sunshine Matter More?",
       x = "Daily Solar Radiation (kWh/m^2/d)",
       y = "Total Panel Area per Capita (m^2)") +
  guides(size=guide_legend("Total # of State Incentives")) +
  annotate("text", x= 5, y = 0.65, label = "Size of points reflects number\n of state-level solar incentives") +
  theme_minimal() + # complete theme first, so the centered title is not overwritten
  theme(plot.title = element_text(hjust = 0.5))

ggplotly(fig_state, tooltip = c("text"))

Where is Virginia in its Solar Development?

The map below shows county-level development of solar panels in Virginia. While Virginia lags behind other states in solar development, there is high potential for this energy source to succeed in the state.

ggplot(county_join_va) +
  geom_sf(aes(fill = total_panel_area)) +
  scale_fill_viridis_c(option = "viridis", 
                       name = "Total Solar Panel Area (m^2)") +
  theme_void()