
I am new to ggplot2 and I am having trouble plotting a graph. I have looked around on SO, but the solutions I found did not work with my data. Here is an example of my data frame:

Count1     Count2      Color 
  3         4          Red
  3         6          Green 
  5         2          Red
  2         0          Blue 

I would like to plot this as a bar graph. The x axis should consist of the colors, and both the Count1 and Count2 variables should be plotted on the y axis. For example, the two bars for Green would go up to 3 (for Count1) and 6 (for Count2). Similarly, the bars for Red would go up to 8 (for Count1, since 3 + 5 = 8) and 6 (for Count2, since 4 + 2 = 6). Does anyone know how to go about doing this? Thanks!

  • Aggregate and reshape your data first, e.g. library(tidyverse); df %>% group_by(Color) %>% summarise_all(sum) %>% gather(var, val, -Color) %>% ggplot(aes(x = Color, y = val, fill = var)) + geom_col(position = 'dodge') Commented Apr 19, 2017 at 0:51
  • @alistaire When I tried the summarise_all part of your code on my real data, I got this error message and I'm not sure what to make of it: Error: invalid 'type' (character) of argument Commented Apr 19, 2017 at 18:15
  • summarise_all tries to apply its function (sum here) to all non-grouping columns, which for the sample data is fine, but may not be for your real data if there are non-numeric columns. (Also use str to make sure your numeric columns are actually stored as doubles or integers.) summarise_at lets you specify columns, or just use summarise and explicitly specify what to do with each column you want. Commented Apr 19, 2017 at 21:28
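
A minimal sketch of the advice in the last comment, assuming the real data has an extra character column. The data frame real_df and its Note column are hypothetical, made up here only to reproduce the situation; summarise_at and vars are the dplyr functions the comment refers to.

```r
library(dplyr)

# Hypothetical data: the sample counts plus an extra character column (Note)
real_df <- data.frame(
  Count1 = c(3, 3, 5, 2),
  Count2 = c(4, 6, 2, 0),
  Color  = c("Red", "Green", "Red", "Blue"),
  Note   = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)

# Check how each column is stored (num/int versus chr)
str(real_df)

# Sum only the named count columns, so Note is never passed to sum()
sum_df <- real_df %>%
  group_by(Color) %>%
  summarise_at(vars(Count1, Count2), sum)
sum_df
```

With summarise_all in place of summarise_at, sum would also be applied to Note, which is the kind of call that fails with the "invalid 'type' (character) of argument" error.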

1 Answer


Breaking up the code from @alistaire's comment above into steps, so you can follow what's going on.

Your data

color_df <- data.frame(Count1 = c(3,3,5,2), Count2 = c(4,6,2,0), Color = c("Red", "Green", "Red", "Blue"))

Adding up counts for each color

library(dplyr)
sum_df <- color_df %>%
    group_by(Color) %>%
    summarise_all(sum)
sum_df

ggplot needs both counts in one column, with another column describing which is which. Compare sum_df and tidy_df to see the difference.

library(tidyr)
tidy_df <- sum_df %>%
    gather(CountName, Count, -Color)
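
As a side note, tidyr 1.0.0 and later mark gather as superseded in favour of pivot_longer. A sketch of the same reshaping step with pivot_longer, assuming a recent tidyr is installed; sum_df is rebuilt here by hand from the summed values so the snippet runs on its own.

```r
library(tidyr)

# The summed counts from the previous step, typed in directly
sum_df <- data.frame(
  Color  = c("Blue", "Green", "Red"),
  Count1 = c(2, 3, 8),
  Count2 = c(0, 6, 6)
)

# Stack Count1 and Count2 into one Count column, labelled by CountName
tidy_df <- sum_df %>%
  pivot_longer(cols = -Color, names_to = "CountName", values_to = "Count")
tidy_df
```

The rows come out in a different order than with gather, but the columns are the same, so the plotting code works unchanged.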

Finally, the plot. position = "dodge" puts the bars side by side. geom_col uses the values in the Count column as bar heights, whereas geom_bar would by default count the number of observations itself.

library(ggplot2)
ggplot(tidy_df, aes(x = Color, fill = CountName, y = Count)) +
    geom_col(position = "dodge")

3 Comments

Thanks Jeremy. However, when I tried the group_by and summarise_all part of your code on my real data, I got this error message and I'm not sure what to make of it: Error: invalid 'type' (character) of argument
Check your counts are stored as numbers.
The likely problem is the sum. Do you have other columns apart from the counts? If so, trying to sum a character column would give this error. Try replacing summarise_all(sum) with summarise(Count1 = sum(Count1), Count2 = sum(Count2))
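
A sketch covering both suggestions above, assuming the cause is a count column that was read in as text. The data frame chr_df is hypothetical and only illustrates that case.

```r
library(dplyr)

# Hypothetical data: Count1 was read in as text rather than as a number
chr_df <- data.frame(
  Count1 = c("3", "3", "5", "2"),
  Count2 = c(4, 6, 2, 0),
  Color  = c("Red", "Green", "Red", "Blue"),
  stringsAsFactors = FALSE
)

fixed_df <- chr_df %>%
  mutate(Count1 = as.numeric(Count1)) %>%   # convert text to numbers before summing
  group_by(Color) %>%
  summarise(Count1 = sum(Count1), Count2 = sum(Count2))  # name each column explicitly
fixed_df
```

If a count column is stored as a factor instead of character, converting with as.numeric(as.character(x)) avoids getting the internal level codes rather than the values.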
