1 Data origins

The data was found from a research article by Haakenstad et al., (2022) https://doi.org/10.1016/S0140-6736(22)00936-9. The authors state that the data is a collection from 1162 surveys measuring womens current contraception use ages 15-49, between 1970 to 2019 in 204 countries. For this project, contracpetive use in Europe between 1991 and 2019 was investigated. 15 contraceptive methods were measured if more than one method was used then the most effective method was counted.

The data is shared via Global Burden of Diease Collaboration Network.

Reference: Global Burden of Disease Collaborative Network. Global Burden of Disease Study 2019 (GBD 2019) Contraceptive Prevalence Estimates 1970-2019. Seattle, United States of America: Institute for Health Metrics and Evaluation (IHME), 2022.

1.0.0.1 Variables important to this project include

  • Measure –> this is everything that was measured including contraceptive types and demand satisfied with modern methods
  • Mean –> this specifies the mean value for each measure for each year and country. More is detailed in the code book.
  • age_group_name –> this divides the data into different age groups, this visualisation uses All age (15-49)
  • Location_name –> the country for the estimate, this includes 204 countries
  • Continent –> continent in which the country belongs to. Values include Asia, Europe, Oceania, Africa, Americas
  • Mean value –> This variable combines the original mean values for each year, each contraception, and each country. To create an overall mean for each year and contraception which is then represented as a percentage.

1.0.0.2 Here is the first few lines of the data

measure location_id location_name sex_id sex_name age_group_id age_group_name metric year_id mean upper lower
Any contraceptive prevalence 6 China 2 Female 8 15-19 years Proportion of all women 1990 0.009782 0.01619 0.005603
Any contraceptive prevalence 6 China 2 Female 9 20-24 years Proportion of all women 1990 0.274000 0.34940 0.198900
Any contraceptive prevalence 6 China 2 Female 10 25-29 years Proportion of all women 1990 0.749900 0.83490 0.632400
Any contraceptive prevalence 6 China 2 Female 11 30-34 years Proportion of all women 1990 0.881200 0.93290 0.801500
Any contraceptive prevalence 6 China 2 Female 12 35-39 years Proportion of all women 1990 0.898400 0.94930 0.816100
Any contraceptive prevalence 6 China 2 Female 13 40-44 years Proportion of all women 1990 0.871400 0.93190 0.776600

2 Research Question

3 Data Preperation

3.0.1 First, remove the age categories not needed, so only the all ages category is left

#cleaning up the data
#removing the rows that break down age groups as we only need to see combined age groups
#to remove certain rows based on number in a column ie: group id use the dplyr function 
#make a new data frame called filtered data
# %in% checks if the value of the given values in the vector are in the column name
# the ! symbol means 'not in' so removes the rows according values specified in the column specified... 
#It returns true for values not in the vector and false for those in the vector
#using a pipe to make the code more readable, if not it would look like...
# filtered_data <-  filter(womendata, !age_gorup_id %in% c(8, 9, 10, 11, 12, 13, 14, 27)) (27 is getting rid of age standardised)
filtered_data <- womendata %>% filter(!age_group_id %in% c(8, 9, 10, 11, 12, 13, 14, 27))
view(filtered_data)

# currently, the mean column is in a decimal format.
#for the end graph to look nicer, it should be as a percentage.
# giving the values the % sign now would convert the numbers into character data, so you can add the % when visualising the graph

Lets check that worked

print(paste ("now we have", nrow(filtered_data), "cases"))
## [1] "now we have 110160 cases"

3.0.2 Second, Categorising the countries into continents

#to filter the data so that only certain countries are included in the graph, in order to make it cleaner
# first, turn the countries into continents to make filtering quicker
#install required packages
if(!require(countrycode)){install.packages("countrycode"); library(countrycode)}
filtered_data$Continent <- countrycode(sourcevar = filtered_data$location_name,
                                    origin = "country.name",
                                    destination = "continent")
view(filtered_data)

Lets check that worked by printing out the unique words in the continent column

continent_words <- unique(filtered_data$Continent)
print(continent_words)
## [1] "Asia"     "Oceania"  "Europe"   "Americas" "Africa"   NA

3.0.3 Third, remove the continents not needed for the visualisation

#filter the dataset by continent, in this case, just europe
filtered_data_europe <- filtered_data %>%
  filter(Continent %in% c("Europe"))
view(filtered_data_europe)

Lets check that has worked by seeing how many rows include Europe vs the other Continents

#data check 
print(paste ("now we have", nrow(filtered_data_europe), "cases"))
## [1] "now we have 22680 cases"
#to check if europe is there
europe_rows <- filtered_data_europe[filtered_data_europe$Continent %in% c("Europe"), ]
fit_table(head(europe_rows))
measure location_id location_name sex_id sex_name age_group_id age_group_name metric year_id mean upper lower Continent
Any contraceptive prevalence 43 Albania 2 Female 24 All age (15-49 years) Proportion of all women 1990 0.3686 0.4257 0.3166 Europe
Any contraceptive prevalence 43 Albania 2 Female 24 All age (15-49 years) Proportion of all women 1991 0.3736 0.4275 0.3231 Europe
Any contraceptive prevalence 43 Albania 2 Female 24 All age (15-49 years) Proportion of all women 1992 0.3778 0.4296 0.3284 Europe
Any contraceptive prevalence 43 Albania 2 Female 24 All age (15-49 years) Proportion of all women 1993 0.3824 0.4316 0.3345 Europe
Any contraceptive prevalence 43 Albania 2 Female 24 All age (15-49 years) Proportion of all women 1994 0.3872 0.4343 0.3392 Europe
Any contraceptive prevalence 43 Albania 2 Female 24 All age (15-49 years) Proportion of all women 1995 0.3925 0.4381 0.3468 Europe
#to check if other continents are removed from the dataset
other_rows <- filtered_data_europe[filtered_data_europe$Continent %in% c("Asia", "Africa", "Americas", "Oceania"), ]
print(other_rows)
## # A tibble: 0 × 13
## # ℹ 13 variables: measure <chr>, location_id <dbl>, location_name <chr>,
## #   sex_id <dbl>, sex_name <chr>, age_group_id <dbl>, age_group_name <chr>,
## #   metric <chr>, year_id <dbl>, mean <dbl>, upper <dbl>, lower <dbl>,
## #   Continent <chr>

3.0.4 Now means need to be made for each measure and year regardless of the country

#  Filter the dataset to include only the relevant columns
#the select function selects the relevant columns to retain
filtered_data_2 <- filtered_data_europe %>%
  select(year_id, measure, `mean`,location_name)
view(filtered_data_2)
#  Group by Year and Contraception_Type and calculate the mean Value
#group by groups the year_id and measure (contraceptive type) which creates a row with the same year and type of contraception
#summarise calculates the mean of the mean column for each group as created by the group_by() function
mean_data <- filtered_data_2 %>%
  group_by(year_id, measure) %>%
  summarise(mean_value = mean(`mean`, na.rm = TRUE))  
view(mean_data)

3.0.5 Heres the first few lines of the filtered data

fit_table(head(mean_data))
year_id measure mean_value
1990 Any contraceptive prevalence 0.5535810
1990 Condom prevalence 0.0998419
1990 Demand satisfied with modern methods 0.6870286
1990 Diaphragm prevalence 0.0048849
1990 Emergency contraception prevalence 0.0020933
1990 Female sterilization prevalence 0.0359986
#install openxlsx package
#save the filtered data set
if(!require(openxlsx)){install.packages("openxlsx"); library(openxlsx)}
write.xlsx(mean_data, file.path("filtered_data", "mean_data.xlsx"))

4 Visualisations

4.1 Inital visualisation

To get a sense of the data, all variables in the measure column are included

first_p <- ggplot(mean_data, aes(x = year_id, y = mean_value, color = measure)) +
  geom_line() +
  labs(title = "Mean Contraceptive Use Over Time",
       x = "Year",
       y = "Mean Contraceptive Use") +
  theme_minimal()
print(first_p)

#specify the file 
plot_file <- here("plots", "first_plot.png")

# Saveing the plot to the specified file
ggsave(filename = plot_file, plot = first_p, device = "png", bg = "white", width = 8, height = 6, units = "in")
# bg white is to set the background white as its default is black

4.1.1 Thats messy! And doesn’t tell much about the data

In order for the plot to be meaningful, choosing specific measures would be best.

4.2 Final visualisation

Measures chosen are:

  • Condom prevalence
  • Pill prevalence
  • Implants prevalence
  • Injections prevalence
  • IUD prevalence

4.2.1 This Visualisation should be creative and therefore needs some customising.

4.2.1.1 In line with academia a serif font would go nicely, defining the text theme now will keep the plot code less messy later on

#lets define the theme for the different texts in the graph, choosing serif
#by coding this as 'theme_text' the code for the graph will be cleaner and easier to modify
#element_text() specifies font families
theme_text <- theme(
  #most text will me Arial regular
  text = element_text(family = "serif"),
  #main title should be bold
  plot.title = element_text(family = "serif" , face = "bold", size = 14),
  #subtitle should be italics
  plot.subtitle = element_text(family = "serif" , face = "italic"),
  #the legend title should also be bold
  legend.title = element_text(family = "serif", face = "bold", size = 14),
 
  legend.text = element_text(size = 14),
)

4.2.1.2 The colours should be nice too!

For fun, the blue chosen here is the RGB value of IKB 79 by Yves Klein

ikb <- "#002FA7" 

IKB_79

Again, for fun the pink chosen here is from the Rigevidon (the pill) packaging The webiste used to calculate the rgba value is:https://imagecolorpicker.com/en

# for the pink used is from this website https://imagecolorpicker.com/en to determine the RBG values 
# turn rgba(189,35,87,255) into something R can read
# this is the pink that is used on the Rigevidon (contraceptive pill) packaging
pill_colour <- rgb(239,44,86, maxColorValue = 255) # maxColorValue sets the maximum value for each component
the_pill_pic.jpg
the_pill_pic.jpg
custom_colors <- c(
  "Condom prevalence" = ikb, 
  "Pill prevalence" = pill_colour,
  "Implants prevalence" = "green", 
  "Injections prevalence" = "orange", 
  "IUD prevalence" = "pink"
  
)

4.2.1.3 The legend should be clear

#this changes the size of the legend to make it clearer
#the legend may be a bit repetitive with the word prevalence reoccurring
custom_labels <- c(
  "Condom prevalence" = "Condoms",
  "Pill prevalence" = "The Pill",
  "Implants prevalence" = "Implants",
  "Injections prevalence" = "Injections",
  "IUD prevalence" = "IUD"
)

4.2.2 The plot

# the scales package here converts the y axis into a percentage 
# by changing the range between 1991 and 2019 the graph does not travel off of its limits
#unfortunately between 1990 and 2019 is 29, which is a prime number 
p <- ggplot(filter(mean_data, measure %in% c("Condom prevalence", "Pill prevalence", "Implants prevalence", "Injections prevalence", "IUD prevalence")),
       mapping= aes(x = year_id, y = mean_value, color = measure), frame = year_id) + #frame = is so that the graph can be a gif
  geom_line() +
  geom_point(size = 3, shape = 20) +
  labs(title = "Percentage of Contraceptive Use ",
       subtitle= "Between 1991 and 2019 in Europe, women aged between 15 and 49. ",
       x = "Year",
       y = "Mean Contraceptive Use",
       color = "Contraceptive Type") + # this is to add a title to the legend
  scale_y_continuous(labels = scales::percent)+
  scale_x_continuous(limits = c(1991, 2019), breaks = seq(1991, 2019, by = 4)) +  # Set x-axis limits and specify breaks every 4 years
  scale_color_manual(values = custom_colors,
                     labels = custom_labels) +
                     
  theme_minimal() +
  theme_text
print(p)

#specify the file 
plot_file <- here("plots", "viz_230170335.png")

# Saveing the plot to the specified file
ggsave(filename = plot_file, plot = p, device = "png", bg = "white", width = 8, height = 6, units = "in")
# bg white is to set the background white as its default is black

5 The plot could be better by animating it- Plot for assessment

By animating the plot the time aspect is accentuated

#install the relevant package
if(!require(gganimate)){install.packages("gganimate"); library(gganimate)}
if(!require(gifski)){install.packages("gifski"); library(gifski)}
#animating the plot
animated <- p+
  geom_point() +
  transition_manual(year_id, cumulative = TRUE) +
  ggtitle(expression('Percentage of Contraceptive Use in the year: {ifelse(frame + 1990 <= 2019, frame + 1990, 2019)}'))
#the next lines of code will allow this markdown script to be knit into both PDF and html 
#the use of 'if' will determine what code is run based on whether the knit is to html or pdf
if(knitr::is_html_output()){
  animated
}

if(knitr::is_html_output()){
  anim_save(here("plots" , "gif_230170335.gif"), animated, renderer = gifski_renderer())

}

if(!knitr::is_latex_output()){
  anim_save(here("plots" , "gif_230170335.gif"), animated, renderer = gifski_renderer())
}

6 Summary

6.1 Interpretation

This graph may suggest that hormonal contraception use has not gone down drastically despite possible dissatisfaction with it. It does however show that condom use is increasing a lot. This could mean a lot of things, perhaps people are trying to be safer, it could mean that younger age populations, who are starting to engage in sexual activity more, need a form of contraception but don’t want to use hormonal ones. Of course, a limitation of this plot/the data used, is that it does not investigate whether people are happy with their contraception. In wider context, this data could not inform researchers/medical professionals if women are unhappy with their hormonal contraception. But it can show that, despite side effects, many women stay on hormonal contraception, with the pill being a favourable option.

6.2 Follow ups

This data set can lend itself to many visualisations. For example, plotting the different age categories can give insight into favourable contraceptive methods for different age groups, could younger generations be using less or more hormonal contraception? and if so, which one appears to be popular? Looking into different continents or specific countries can also be done as a follow-up, it is possible different countries favour different methods. There are a wide range of measures in the data set and thus, many other measures can be investigated, not just the 5 chosen out of pure curiosity.

6.3 Reflection

If given more time, I would have liked to merge both of the available data sets to investigate since the 70’s. I did not do this because the one data set used was large, and frankly was scary! However, I have learnt a lot doing this project, I am very proud of my visualisation and the fun things that I incorporated like the colour of the pill packaging. I thought it was ambitious to make a moving graph at the start of the module and so I am proud to have completed one.

7 References:

Haakenstad, A., Angelino, O., Irvine, C. M. S., Bhutta, Z. A., Bienhoff, K., Bintz, C., Causey, K., Dirac, M. A., Fullman, N., Gakidou, E., Glucksman, T., Hay, S. I., Henry, N. J., Martopullo, I., Mokdad, A. H., Mumford, J. E., Lim, S. S., Murray, C. J. L., & Lozano, R. (2022). Measuring contraceptive method mix, prevalence, and demand satisfied by age and marital status in 204 countries and territories, 1970–2019: a systematic analysis for the Global Burden of Disease Study 2019. The Lancet, 400(10348), 295–327. https://doi.org/10.1016/S0140-6736(22)00936-9

The data, code and original code book can be found via this website: https://ghdx.healthdata.org/record/ihme-data/contraceptive-prevalence-estimates-1970-2019 Global Burden of Disease Collaborative Network. Global Burden of Disease Study 2019 (GBD 2019) Contraceptive Prevalence Estimates 1970-2019. Seattle, United States of America: Institute for Health Metrics and Evaluation (IHME), 2022.