I have the following sample data with three different cost types and a year column:

library(tidyverse)

# Sample data
costsA <- sample(100:200, 30, replace = TRUE)
costsB <- sample(100:140, 30, replace = TRUE)
# Note: 20:20 is the single number 20, so sample() draws from 1:20 here
costsC <- sample(20:20, 30, replace = TRUE)
year <- sample(c("2000", "2010", "2030"), 30, replace = TRUE)
df <- data.frame(costsA, costsB, costsC, year)

My goal is to plot these costs in a stacked bar plot so that I can compare the mean costs across the three year categories. To do so, I aggregated the values:

df %>% group_by(year) %>%
  summarise(n=n(),
            meanA = mean(costsA),
            meanB = mean(costsB),
            meanC = mean(costsC)) %>%
ggplot( ... ) + geom_bar()

But how can I plot the graph now? The x-axis should show the years and the y-axis the stacked costs.

example

What you want to do is not quite clear to me, but something like this? df %>% group_by(year) %>% summarise(n = n(), meanA = mean(costsA), meanB = mean(costsB), meanC = mean(costsC)) %>% gather("key", "value", -c(year, n)) %>% ggplot(aes(x = year, y = value, group = key, fill = key)) + geom_bar(stat = "identity")
Commented Feb 25, 2019 at 16:15

1 Answer

You have to reshape the summarised data into a tidy(-ish) format to generate a plot like the one you posted. In the tidyverse, you can do that with the gather() function, which converts multiple columns into two columns of key-value pairs. For instance, the following code generates such a plot.

df %>% group_by(year) %>%
  summarise(n=n(),
            meanA = mean(costsA),
            meanB = mean(costsB),
            meanC = mean(costsC)) %>% 
  gather("key", "value", - c(year, n)) %>%
  ggplot(aes(x = year, y = value, group = key, fill = key)) + geom_col()

With gather("key", "value", -c(year, n)), the three summary columns (meanA, meanB, meanC) are converted into key-value pairs, while year and n are left as they are.
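
To make the reshaping step more concrete, here is a minimal sketch on a small hand-made summary table. The table summary_df and its numbers are invented purely for illustration and are not the output of the code above. The sketch also shows pivot_longer(), which tidyr has provided since version 1.0.0 as the successor to gather(); it should give the same key-value pairs, only in a different row order.

```r
library(tidyr)

# Hypothetical summary table; the numbers are made up for illustration
summary_df <- data.frame(
  year  = c("2000", "2010", "2030"),
  n     = c(10L, 12L, 8L),
  meanA = c(150, 148, 155),
  meanB = c(120, 118, 122),
  meanC = c(10, 11, 9)
)

# gather(): every column except year and n becomes a key-value pair
long_gather <- gather(summary_df, "key", "value", -c(year, n))

# pivot_longer(): the same reshaping with the newer tidyr interface
long_pivot <- pivot_longer(summary_df, cols = -c(year, n),
                           names_to = "key", values_to = "value")

print(long_gather)
print(long_pivot)
```

Either long table can then be passed to ggplot(aes(x = year, y = value, fill = key)) + geom_col() as in the answer.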


2 Comments

You may want to use geom_col here: with geom_bar the bar heights are proportional to the number of cases, while with geom_col they are proportional to the values. The original question had $ on the y-axis, not a count.
Thanks @Dave2e. I believe that geom_col() is identical to geom_bar(stat = 'identity') but I tend to forget geom_col() exists.
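
The equivalence mentioned above can be checked with a small sketch. The data frame toy and its values are hypothetical and chosen only for illustration. The idea is to build the same plot once with geom_col() and once with geom_bar(stat = "identity") and compare the bar positions that ggplot2 computes, using layer_data().

```r
library(ggplot2)

# Hypothetical long-format data; the values are made up for illustration
toy <- data.frame(
  year  = c("2000", "2000", "2010", "2010"),
  key   = c("meanA", "meanB", "meanA", "meanB"),
  value = c(150, 120, 148, 118)
)

p_col <- ggplot(toy, aes(x = year, y = value, fill = key)) +
  geom_col()
p_bar <- ggplot(toy, aes(x = year, y = value, fill = key)) +
  geom_bar(stat = "identity")

# Extract the computed bar coordinates of the first layer of each plot
d_col <- layer_data(p_col)
d_bar <- layer_data(p_bar)

# TRUE if both variants stack the bars to the same heights
print(isTRUE(all.equal(d_col$ymax, d_bar$ymax)))
```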
