04 - Sad Plot Made Better
Jun 10, 2018
Saundra Schlesinger
5 minute read
library(gapminder)
library(tidyverse)
## -- Attaching packages ------------------------------------------------------------------------------------------------ tidyverse 1.2.1 --
## v ggplot2 2.2.1     v purrr   0.2.4
## v tibble  1.4.2     v dplyr   0.7.5
## v tidyr   0.8.1     v stringr 1.3.1
## v readr   1.1.1     v forcats 0.3.0
## -- Conflicts --------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(dplyr)
library(ggplot2)
library(readr)
  1. Gapminder Challenge: Pick at least two of the tasks below from the task menu and approach each with a table and figure.
  2. Get the maximum and minimum of GDP per capita for all continents.
#Table
max_min_gdpPercap_byCont <- gapminder %>%
  group_by(continent) %>% 
  summarize(min_gdpPercap_byCont = min(gdpPercap), max_gdpPercap_byCont = max(gdpPercap))
max_min_gdpPercap_byCont
## # A tibble: 5 x 3
##   continent min_gdpPercap_byCont max_gdpPercap_byCont
##   <fct>                    <dbl>                <dbl>
## 1 Africa                    241.               21951.
## 2 Americas                 1202.               42952.
## 3 Asia                      331               113523.
## 4 Europe                    974.               49357.
## 5 Oceania                 10040.               34435.

Here I show the overall minimum and maximum GDP per capita that any country in each continent has recorded across all years in the data.

#reshape to long format: one row per continent and statistic (min or max)
max_min_gdpPercap_byCont_tidy <- max_min_gdpPercap_byCont %>%
  gather(minORmax, minmax_gdpPerCap, min_gdpPercap_byCont:max_gdpPercap_byCont, factor_key = TRUE) %>%
  arrange(continent)
max_min_gdpPercap_byCont_tidy
## # A tibble: 10 x 3
##    continent minORmax             minmax_gdpPerCap
##    <fct>     <fct>                           <dbl>
##  1 Africa    min_gdpPercap_byCont             241.
##  2 Africa    max_gdpPercap_byCont           21951.
##  3 Americas  min_gdpPercap_byCont            1202.
##  4 Americas  max_gdpPercap_byCont           42952.
##  5 Asia      min_gdpPercap_byCont             331 
##  6 Asia      max_gdpPercap_byCont          113523.
##  7 Europe    min_gdpPercap_byCont             974.
##  8 Europe    max_gdpPercap_byCont           49357.
##  9 Oceania   min_gdpPercap_byCont           10040.
## 10 Oceania   max_gdpPercap_byCont           34435.
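In newer versions of tidyr, `gather()` is superseded by `pivot_longer()`. The following is a minimal sketch of the same reshape, assuming tidyr 1.0.0 or later is installed (the session above uses 0.8.1, where `pivot_longer()` is not available). The object names `max_min_wide` and `max_min_long` are made up for this illustration.

```r
library(gapminder)
library(dplyr)
library(tidyr)  # assumption: tidyr >= 1.0.0, which provides pivot_longer()

# Rebuild the wide summary table: one row per continent
max_min_wide <- gapminder %>%
  group_by(continent) %>%
  summarize(min_gdpPercap_byCont = min(gdpPercap),
            max_gdpPercap_byCont = max(gdpPercap))

# Long format: one row per continent and statistic
max_min_long <- max_min_wide %>%
  pivot_longer(
    cols = c(min_gdpPercap_byCont, max_gdpPercap_byCont),
    names_to = "minORmax",
    values_to = "minmax_gdpPerCap"
  ) %>%
  # mirror factor_key = TRUE from gather(): keep min before max
  mutate(minORmax = factor(minORmax,
                           levels = c("min_gdpPercap_byCont",
                                      "max_gdpPercap_byCont"))) %>%
  arrange(continent)

max_min_long
```

The column names match the `gather()` version, so either table could feed the same downstream code.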
  1. Look at the spread of GDP per capita across countries within the continents.
#Table
max_min_gdpPercap_countriesInCont <- gapminder %>%
  group_by(continent, year) %>%
  filter(gdpPercap==min(gdpPercap) | gdpPercap==max(gdpPercap)) %>%
  arrange(year, continent) %>%
  mutate(minmax = rank(gdpPercap)) %>% # within each continent/year: 1 = minimum, 2 = maximum
  mutate(minmax = as.factor(minmax)) %>%
  select(-country, -pop, -lifeExp) # drop columns not needed for the plot
max_min_gdpPercap_countriesInCont
## # A tibble: 120 x 4
## # Groups:   continent, year [60]
##    continent  year gdpPercap minmax
##    <fct>     <int>     <dbl> <fct> 
##  1 Africa     1952      299. 1     
##  2 Africa     1952     4725. 2     
##  3 Americas   1952     1398. 1     
##  4 Americas   1952    13990. 2     
##  5 Asia       1952   108382. 2     
##  6 Asia       1952      331  1     
##  7 Europe     1952      974. 1     
##  8 Europe     1952    14734. 2     
##  9 Oceania    1952    10040. 1     
## 10 Oceania    1952    10557. 2     
## # ... with 110 more rows

Here I show the minimum and maximum GDP per capita within each continent for each year.

#reshape to wide format: one row per continent/year with min and max GDP per capita as columns
max_min_gdpPercap_byCont <- max_min_gdpPercap_countriesInCont %>%
  spread(minmax, gdpPercap, sep = "_") #sep prefixes the new columns with the key name (minmax_1, minmax_2)
max_min_gdpPercap_byCont
## # A tibble: 60 x 4
## # Groups:   continent, year [60]
##    continent  year minmax_1 minmax_2
##    <fct>     <int>    <dbl>    <dbl>
##  1 Africa     1952     299.    4725.
##  2 Africa     1957     336.    5487.
##  3 Africa     1962     355.    6757.
##  4 Africa     1967     413.   18773.
##  5 Africa     1972     464.   21011.
##  6 Africa     1977     502.   21951.
##  7 Africa     1982     462.   17364.
##  8 Africa     1987     390.   11864.
##  9 Africa     1992     411.   13522.
## 10 Africa     1997     312.   14723.
## # ... with 50 more rows
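The same continent-by-year table can also be built in one step with `summarize()`, skipping the filter, rank, and spread sequence. This is only a sketch of an alternative route, not the code used for the plot; the names `gdp_range_by_cont_year`, `min_gdpPercap`, and `max_gdpPercap` are made up for this illustration.

```r
library(gapminder)
library(dplyr)

# One row per continent and year, with the lowest and highest
# GDP per capita among that continent's countries
gdp_range_by_cont_year <- gapminder %>%
  group_by(continent, year) %>%
  summarize(
    min_gdpPercap = min(gdpPercap),
    max_gdpPercap = max(gdpPercap)
  ) %>%
  ungroup()  # drop the leftover grouping so later steps see a plain tibble

gdp_range_by_cont_year
```

Descriptive column names also make the plotting code easier to read than `minmax_1` and `minmax_2`.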
#Plot
ggplot(max_min_gdpPercap_byCont) +
  geom_ribbon(aes(x = year, ymin = minmax_1, ymax = minmax_2, fill = continent), alpha = 0.05) + #creates shading between min and max
  geom_line(aes(x = year, y = minmax_1, color = continent)) + #min line
  geom_line(aes(x = year, y = minmax_2, color = continent)) + #max line
  scale_y_log10() +
  labs(title = "Minimum and Maximum GDP per Capita Over Time by Continent", x = "Year", y = "GDP per capita (log10 scale)") +
  theme(
    plot.background = element_rect(fill = "white"),
    panel.background = element_rect(fill = "white"),
    axis.title = element_text(face="bold"), 
    title = element_text(face="bold"), 
    axis.text = element_text(size = 10), 
    panel.grid.major.y = element_blank(), 
    panel.grid.major.x = element_blank(), 
    panel.grid.minor.y = element_blank(), 
    panel.grid.minor.x = element_blank(), 
    axis.line.x = element_line(color = "gray30", size = .5), 
    axis.line.y = element_line(color = "gray30", size = .5), 
    axis.ticks = element_line(color = "gray30", size = 1)
    )

So while I achieved my goal (learning to use ribbon plotting), the visualization turned out quite poor! I expected much more disparity between continents, so that the ribbons would separate easily. Because they overlap instead, I would not call this an effective visualization, and another method would be more useful.
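A quick numeric check may help explain the overlap: if the ratio of maximum to minimum GDP per capita within a continent is large, that continent's ribbon spans much of the log axis and is likely to cover its neighbors. The sketch below computes that ratio per continent and year; `gdp_spread` and `max_to_min_ratio` are names made up for this illustration.

```r
library(gapminder)
library(dplyr)

# Ratio of the richest to the poorest country's GDP per capita,
# within each continent and year
gdp_spread <- gapminder %>%
  group_by(continent, year) %>%
  summarize(max_to_min_ratio = max(gdpPercap) / min(gdpPercap)) %>%
  ungroup() %>%
  arrange(year, desc(max_to_min_ratio))

gdp_spread
```

A ratio close to 1 means a narrow ribbon; a ratio in the hundreds means a ribbon more than two orders of magnitude tall on the log10 axis.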

  1. Type of graph: This is a variation of a stacked area graph that is sometimes called a shaded line graph.

  2. Description of data: These data are from gapminder, a resource that tracks a huge number of global socioeconomic variables over time. For my purposes I focused on GDP per capita, continent, and year.

  3. Description of audience: The audience may be laypeople casually interested in global socioeconomic changes.

  4. Representation Description: I was trying to show the disparity between the max and min within each continent over time, as well as the absolute min and max across continents. While the graph technically shows that, these data apparently do not suit this kind of graph, and the visualization isn’t very useful!

  5. How to read it & What to look for: I would first look for the biggest disparities, and then compare them to other continents. If you can see it, I think it’s quite interesting that Asia has an enormous spread between max and min that persists over time. Until relatively recently it also had the largest max and was on par with Africa for the lowest min. Oceania, on the other hand, has had a tight relationship between its max and min, although the gap has widened over time (and that is likely due in part to the small number of countries represented).

  6. Presentation tips: These are categorical data, so a generic, non-hierarchical palette was used. A log10 scale was chosen because, on a linear scale, making room for the highest performers squished all the other values together at the bottom.

  7. Variations and alternatives: I would definitely use an alternative. These data have too much overlap to make a meaningful statement using this methodology. A better alternative would likely be to split the questions apart and present each one on its own. A bar graph would probably do well.

  8. Methods: For this graph I plotted the max and min GDP per capita per continent by year, and then shaded the region between the max and min. The hope was to make each continent's max and min easier to see (as two discrete, unlinked lines in a mass of lines they were difficult to pick out), as well as to highlight the disparity between max and min over time.
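One possible way to separate the overlapping ribbons is to draw each continent in its own panel. The sketch below uses small multiples via `facet_wrap()` rather than the bar graph suggested above, so treat it as one option among several and not as the intended fix. The names `gdp_range`, `min_gdpPercap`, `max_gdpPercap`, and `p_facets` are made up for this illustration.

```r
library(gapminder)
library(dplyr)
library(ggplot2)

# Min and max GDP per capita per continent and year
gdp_range <- gapminder %>%
  group_by(continent, year) %>%
  summarize(min_gdpPercap = min(gdpPercap),
            max_gdpPercap = max(gdpPercap)) %>%
  ungroup()

# One panel per continent; facet_wrap() keeps a shared y axis by default,
# so ribbon heights stay comparable across panels
p_facets <- ggplot(gdp_range, aes(x = year)) +
  geom_ribbon(aes(ymin = min_gdpPercap, ymax = max_gdpPercap, fill = continent),
              alpha = 0.3) +
  geom_line(aes(y = min_gdpPercap, color = continent)) +  # min line
  geom_line(aes(y = max_gdpPercap, color = continent)) +  # max line
  scale_y_log10() +
  facet_wrap(~ continent, nrow = 1) +
  labs(title = "Range of GDP per Capita Over Time, by Continent",
       x = "Year", y = "GDP per capita (log10 scale)") +
  theme(legend.position = "none")  # panel titles already name the continents

p_facets
```

Because the panels no longer overlap, the ribbon fill can be more opaque than the `alpha = 0.05` used in the single-panel version.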