The data was found from a research article by Haakenstad et al., (2022) https://doi.org/10.1016/S0140-6736(22)00936-9. The authors state that the data is a collection from 1162 surveys measuring womens current contraception use ages 15-49, between 1970 to 2019 in 204 countries. For this project, contracpetive use in Europe between 1991 and 2019 was investigated. 15 contraceptive methods were measured if more than one method was used then the most effective method was counted.
The data is shared via Global Burden of Diease Collaboration Network.
Reference: Global Burden of Disease Collaborative Network. Global Burden of Disease Study 2019 (GBD 2019) Contraceptive Prevalence Estimates 1970-2019. Seattle, United States of America: Institute for Health Metrics and Evaluation (IHME), 2022.
measure | location_id | location_name | sex_id | sex_name | age_group_id | age_group_name | metric | year_id | mean | upper | lower |
---|---|---|---|---|---|---|---|---|---|---|---|
Any contraceptive prevalence | 6 | China | 2 | Female | 8 | 15-19 years | Proportion of all women | 1990 | 0.009782 | 0.01619 | 0.005603 |
Any contraceptive prevalence | 6 | China | 2 | Female | 9 | 20-24 years | Proportion of all women | 1990 | 0.274000 | 0.34940 | 0.198900 |
Any contraceptive prevalence | 6 | China | 2 | Female | 10 | 25-29 years | Proportion of all women | 1990 | 0.749900 | 0.83490 | 0.632400 |
Any contraceptive prevalence | 6 | China | 2 | Female | 11 | 30-34 years | Proportion of all women | 1990 | 0.881200 | 0.93290 | 0.801500 |
Any contraceptive prevalence | 6 | China | 2 | Female | 12 | 35-39 years | Proportion of all women | 1990 | 0.898400 | 0.94930 | 0.816100 |
Any contraceptive prevalence | 6 | China | 2 | Female | 13 | 40-44 years | Proportion of all women | 1990 | 0.871400 | 0.93190 | 0.776600 |
This question comes from the apparent dissatisfaction of hormonal contraception among women, so it could be interesting to see if trends in use have changed. For this question all ages in the dataset are included in the visualisation. However, only European countries are included.
#cleaning up the data
#removing the rows that break down age groups as we only need to see combined age groups
#to remove certain rows based on number in a column ie: group id use the dplyr function
#make a new data frame called filtered data
# %in% checks if the value of the given values in the vector are in the column name
# the ! symbol means 'not in' so removes the rows according values specified in the column specified...
#It returns true for values not in the vector and false for those in the vector
#using a pipe to make the code more readable, if not it would look like...
# filtered_data <- filter(womendata, !age_gorup_id %in% c(8, 9, 10, 11, 12, 13, 14, 27)) (27 is getting rid of age standardised)
filtered_data <- womendata %>% filter(!age_group_id %in% c(8, 9, 10, 11, 12, 13, 14, 27))
view(filtered_data)
# currently, the mean column is in a decimal format.
#for the end graph to look nicer, it should be as a percentage.
# giving the values the % sign now would convert the numbers into character data, so you can add the % when visualising the graph
Lets check that worked
## [1] "now we have 110160 cases"
#to filter the data so that only certain countries are included in the graph, in order to make it cleaner
# first, turn the countries into continents to make filtering quicker
#install required packages
if(!require(countrycode)){install.packages("countrycode"); library(countrycode)}
filtered_data$Continent <- countrycode(sourcevar = filtered_data$location_name,
origin = "country.name",
destination = "continent")
view(filtered_data)
Lets check that worked by printing out the unique words in the continent column
## [1] "Asia" "Oceania" "Europe" "Americas" "Africa" NA
#filter the dataset by continent, in this case, just europe
filtered_data_europe <- filtered_data %>%
filter(Continent %in% c("Europe"))
view(filtered_data_europe)
Lets check that has worked by seeing how many rows include Europe vs the other Continents
## [1] "now we have 22680 cases"
#to check if europe is there
europe_rows <- filtered_data_europe[filtered_data_europe$Continent %in% c("Europe"), ]
fit_table(head(europe_rows))
measure | location_id | location_name | sex_id | sex_name | age_group_id | age_group_name | metric | year_id | mean | upper | lower | Continent |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Any contraceptive prevalence | 43 | Albania | 2 | Female | 24 | All age (15-49 years) | Proportion of all women | 1990 | 0.3686 | 0.4257 | 0.3166 | Europe |
Any contraceptive prevalence | 43 | Albania | 2 | Female | 24 | All age (15-49 years) | Proportion of all women | 1991 | 0.3736 | 0.4275 | 0.3231 | Europe |
Any contraceptive prevalence | 43 | Albania | 2 | Female | 24 | All age (15-49 years) | Proportion of all women | 1992 | 0.3778 | 0.4296 | 0.3284 | Europe |
Any contraceptive prevalence | 43 | Albania | 2 | Female | 24 | All age (15-49 years) | Proportion of all women | 1993 | 0.3824 | 0.4316 | 0.3345 | Europe |
Any contraceptive prevalence | 43 | Albania | 2 | Female | 24 | All age (15-49 years) | Proportion of all women | 1994 | 0.3872 | 0.4343 | 0.3392 | Europe |
Any contraceptive prevalence | 43 | Albania | 2 | Female | 24 | All age (15-49 years) | Proportion of all women | 1995 | 0.3925 | 0.4381 | 0.3468 | Europe |
#to check if other continents are removed from the dataset
other_rows <- filtered_data_europe[filtered_data_europe$Continent %in% c("Asia", "Africa", "Americas", "Oceania"), ]
print(other_rows)
## # A tibble: 0 × 13
## # ℹ 13 variables: measure <chr>, location_id <dbl>, location_name <chr>,
## # sex_id <dbl>, sex_name <chr>, age_group_id <dbl>, age_group_name <chr>,
## # metric <chr>, year_id <dbl>, mean <dbl>, upper <dbl>, lower <dbl>,
## # Continent <chr>
# Filter the dataset to include only the relevant columns
#the select function selects the relevant columns to retain
filtered_data_2 <- filtered_data_europe %>%
select(year_id, measure, `mean`,location_name)
view(filtered_data_2)
# Group by Year and Contraception_Type and calculate the mean Value
#group by groups the year_id and measure (contraceptive type) which creates a row with the same year and type of contraception
#summarise calculates the mean of the mean column for each group as created by the group_by() function
mean_data <- filtered_data_2 %>%
group_by(year_id, measure) %>%
summarise(mean_value = mean(`mean`, na.rm = TRUE))
view(mean_data)
year_id | measure | mean_value |
---|---|---|
1990 | Any contraceptive prevalence | 0.5535810 |
1990 | Condom prevalence | 0.0998419 |
1990 | Demand satisfied with modern methods | 0.6870286 |
1990 | Diaphragm prevalence | 0.0048849 |
1990 | Emergency contraception prevalence | 0.0020933 |
1990 | Female sterilization prevalence | 0.0359986 |
To get a sense of the data, all variables in the measure column are included
first_p <- ggplot(mean_data, aes(x = year_id, y = mean_value, color = measure)) +
geom_line() +
labs(title = "Mean Contraceptive Use Over Time",
x = "Year",
y = "Mean Contraceptive Use") +
theme_minimal()
print(first_p)
#specify the file
plot_file <- here("plots", "first_plot.png")
# Saveing the plot to the specified file
ggsave(filename = plot_file, plot = first_p, device = "png", bg = "white", width = 8, height = 6, units = "in")
# bg white is to set the background white as its default is black
In order for the plot to be meaningful, choosing specific measures would be best.
Measures chosen are:
#lets define the theme for the different texts in the graph, choosing serif
#by coding this as 'theme_text' the code for the graph will be cleaner and easier to modify
#element_text() specifies font families
theme_text <- theme(
#most text will me Arial regular
text = element_text(family = "serif"),
#main title should be bold
plot.title = element_text(family = "serif" , face = "bold", size = 14),
#subtitle should be italics
plot.subtitle = element_text(family = "serif" , face = "italic"),
#the legend title should also be bold
legend.title = element_text(family = "serif", face = "bold", size = 14),
legend.text = element_text(size = 14),
)
For fun, the blue chosen here is the RGB value of IKB 79 by Yves Klein
Again, for fun the pink chosen here is from the Rigevidon (the pill) packaging The webiste used to calculate the rgba value is:https://imagecolorpicker.com/en
# for the pink used is from this website https://imagecolorpicker.com/en to determine the RBG values
# turn rgba(189,35,87,255) into something R can read
# this is the pink that is used on the Rigevidon (contraceptive pill) packaging
pill_colour <- rgb(239,44,86, maxColorValue = 255) # maxColorValue sets the maximum value for each component
#this changes the size of the legend to make it clearer
#the legend may be a bit repetitive with the word prevalence reoccurring
custom_labels <- c(
"Condom prevalence" = "Condoms",
"Pill prevalence" = "The Pill",
"Implants prevalence" = "Implants",
"Injections prevalence" = "Injections",
"IUD prevalence" = "IUD"
)
# the scales package here converts the y axis into a percentage
# by changing the range between 1991 and 2019 the graph does not travel off of its limits
#unfortunately between 1990 and 2019 is 29, which is a prime number
p <- ggplot(filter(mean_data, measure %in% c("Condom prevalence", "Pill prevalence", "Implants prevalence", "Injections prevalence", "IUD prevalence")),
mapping= aes(x = year_id, y = mean_value, color = measure), frame = year_id) + #frame = is so that the graph can be a gif
geom_line() +
geom_point(size = 3, shape = 20) +
labs(title = "Percentage of Contraceptive Use ",
subtitle= "Between 1991 and 2019 in Europe, women aged between 15 and 49. ",
x = "Year",
y = "Mean Contraceptive Use",
color = "Contraceptive Type") + # this is to add a title to the legend
scale_y_continuous(labels = scales::percent)+
scale_x_continuous(limits = c(1991, 2019), breaks = seq(1991, 2019, by = 4)) + # Set x-axis limits and specify breaks every 4 years
scale_color_manual(values = custom_colors,
labels = custom_labels) +
theme_minimal() +
theme_text
print(p)
By animating the plot the time aspect is accentuated
#install the relevant package
if(!require(gganimate)){install.packages("gganimate"); library(gganimate)}
if(!require(gifski)){install.packages("gifski"); library(gifski)}
#animating the plot
animated <- p+
geom_point() +
transition_manual(year_id, cumulative = TRUE) +
ggtitle(expression('Percentage of Contraceptive Use in the year: {ifelse(frame + 1990 <= 2019, frame + 1990, 2019)}'))
#the next lines of code will allow this markdown script to be knit into both PDF and html
#the use of 'if' will determine what code is run based on whether the knit is to html or pdf
if(knitr::is_html_output()){
animated
}
This graph may suggest that hormonal contraception use has not gone down drastically despite possible dissatisfaction with it. It does however show that condom use is increasing a lot. This could mean a lot of things, perhaps people are trying to be safer, it could mean that younger age populations, who are starting to engage in sexual activity more, need a form of contraception but don’t want to use hormonal ones. Of course, a limitation of this plot/the data used, is that it does not investigate whether people are happy with their contraception. In wider context, this data could not inform researchers/medical professionals if women are unhappy with their hormonal contraception. But it can show that, despite side effects, many women stay on hormonal contraception, with the pill being a favourable option.
This data set can lend itself to many visualisations. For example, plotting the different age categories can give insight into favourable contraceptive methods for different age groups, could younger generations be using less or more hormonal contraception? and if so, which one appears to be popular? Looking into different continents or specific countries can also be done as a follow-up, it is possible different countries favour different methods. There are a wide range of measures in the data set and thus, many other measures can be investigated, not just the 5 chosen out of pure curiosity.
If given more time, I would have liked to merge both of the available data sets to investigate since the 70’s. I did not do this because the one data set used was large, and frankly was scary! However, I have learnt a lot doing this project, I am very proud of my visualisation and the fun things that I incorporated like the colour of the pill packaging. I thought it was ambitious to make a moving graph at the start of the module and so I am proud to have completed one.
Haakenstad, A., Angelino, O., Irvine, C. M. S., Bhutta, Z. A., Bienhoff, K., Bintz, C., Causey, K., Dirac, M. A., Fullman, N., Gakidou, E., Glucksman, T., Hay, S. I., Henry, N. J., Martopullo, I., Mokdad, A. H., Mumford, J. E., Lim, S. S., Murray, C. J. L., & Lozano, R. (2022). Measuring contraceptive method mix, prevalence, and demand satisfied by age and marital status in 204 countries and territories, 1970–2019: a systematic analysis for the Global Burden of Disease Study 2019. The Lancet, 400(10348), 295–327. https://doi.org/10.1016/S0140-6736(22)00936-9
The data, code and original code book can be found via this website: https://ghdx.healthdata.org/record/ihme-data/contraceptive-prevalence-estimates-1970-2019 Global Burden of Disease Collaborative Network. Global Burden of Disease Study 2019 (GBD 2019) Contraceptive Prevalence Estimates 1970-2019. Seattle, United States of America: Institute for Health Metrics and Evaluation (IHME), 2022.