
I have a data frame with an id, an observation number (3 observations per id), height, weight and fev. It looks like this (example values only):

id      obs     height  weight       fev
1         1        160     80         90
1         2        150     70         85
1         3        155     76         87
2         1        140     67         91
2         2        189     78         71
2         3        178     86         89

I need to plot this data using ggplot2 so that the x-axis shows the 3 variables height, weight and fev, and the observation numbers appear as 3 color-coded vertical lines for each variable. Each line should show the median as a solid circle and the 25th and 75th percentiles as caps at its lower and upper ends (no minimum or maximum needed). So far I have tried many variations of box plots, but I am not even getting close. Any suggestions on how to approach or solve this?

Thanks

2 Answers

1

OK, instead I made three graphs, one per observation, and pieced them together with gridExtra. Read more about the package here: http://www.sthda.com/english/wiki/wiki.php?id_contents=7930

I took the common-legend code from that site to produce the following, starting from our existing longdf2. Because the graphs are pieced together, the observation each panel corresponds to is given in its title.

library(reshape2) # melt()
library(dplyr)    # %>% and filter()
library(ggplot2)  # median_hilow() also needs the Hmisc package installed

id <- rep(1:12, each = 3)
obs <- rep(1:3, 12)
height <- seq(140, 189, length.out = 36)
weight <- seq(67, 86, length.out = 36)
fev <- seq(71, 91, length.out = 36)

df <- as.data.frame(cbind(id, obs, height, weight, fev))

obsonly <- melt(df, id.vars = c('id'), measure.vars = 'obs')

obsonly <- rbind(obsonly, obsonly, obsonly) #making rows equal

newvars <- melt(df[-2], id.vars = 'id') #dropping obs from melt

longdf2 <- cbind(obsonly, newvars)

longdf2 <- longdf2[-4] #dropping second id column

colnames(longdf2)[c(2:5)] <- c('obs', 'obsnum', 'variable', 'value')

#Make one graph per observation
#conf.int = 0.5 gives the 25th and 75th percentiles
#(the median_hilow default of 0.95 gives the 2.5th and 97.5th)

g1 <- longdf2 %>%
  dplyr::filter(obsnum == 1) %>%
  ggplot(aes(x = variable, y = value, color = variable)) +
  stat_summary(fun.data = median_hilow, fun.args = list(conf.int = 0.5)) +
  labs(title = "Observation 1") +
  theme(plot.title = element_text(hjust = 0.5)) #has a legend

g2 <- longdf2 %>%
  dplyr::filter(obsnum == 2) %>%
  ggplot(aes(x = variable, y = value, color = variable)) +
  stat_summary(fun.data = median_hilow, fun.args = list(conf.int = 0.5)) +
  labs(title = "Observation 2") +
  #specified as none to make common legend at end
  theme(plot.title = element_text(hjust = 0.5), legend.position = 'none')

g3 <- longdf2 %>%
  dplyr::filter(obsnum == 3) %>%
  ggplot(aes(x = variable, y = value, color = variable)) +
  stat_summary(fun.data = median_hilow, fun.args = list(conf.int = 0.5)) +
  labs(title = "Observation 3") +
  theme(plot.title = element_text(hjust = 0.5), legend.position = 'none')


library(gridExtra)

#Pull the legend grob ("guide-box") out of a built ggplot
get_legend <- function(myggplot) {
  tmp <- ggplot_gtable(ggplot_build(myggplot))
  leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
  legend <- tmp$grobs[[leg]]
  return(legend)
}


# Save legend

legend <- get_legend(g1)


# Remove legend from 1st graph

g1 <- g1 + theme(legend.position = 'none')

# Combine graphs

grid.arrange(g1, g2, g3, legend, ncol=4, widths=c(2.3, 2.3, 2.3, 0.8))

There are plenty of other little tweaks you could make along the way.


1 Comment

Thanks for going the extra mile to make sure this was well solved. Really appreciate it.
0

Try putting the data into long format prior to graphing. I generated some more data, 12 subjects, each with 3 observations.

id <- rep(1:12, each = 3)
obs <- rep(1:3, 12)
height <- seq(140,189, length.out =  36)
weight <- seq(67,86, length.out = 36)
fev <- seq(71,91, length.out = 36)

df <- as.data.frame(cbind(id,obs,height, weight, fev))


library(reshape2) #use to melt data from wide to long format

longdf <- melt(df,id.vars = c('id', 'obs')) 

You don't need to define measure variables here: since the id.vars are defined, the remaining columns default to measure variables. If you have more variables in your data set, you'll want to define the measure variables in that same line with measure.vars = c("height", "weight", "fev"):

longdf <- melt(df,id.vars = c('id', 'obs'), measure.vars = c("height", "weight", "fev"))
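For comparison, the same reshaping can be sketched with tidyr, on the assumption that you would rather not depend on reshape2 (which is retired). This is not part of the original answer; the name longdf_tidy is made up here, and the row order differs from melt() (pivot_longer() goes row by row rather than variable by variable), which does not matter for plotting.

```r
library(tidyr)

# Same simulated data as above: 12 subjects, 3 observations each
df <- data.frame(
  id     = rep(1:12, each = 3),
  obs    = rep(1:3, 12),
  height = seq(140, 189, length.out = 36),
  weight = seq(67, 86, length.out = 36),
  fev    = seq(71, 91, length.out = 36)
)

# One row per id/obs/variable combination; id and obs are kept as they are
longdf_tidy <- pivot_longer(
  df,
  cols      = c(height, weight, fev),
  names_to  = "variable",
  values_to = "value"
)

head(longdf_tidy)
```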

Apologies, haven't earned enough votes to put figures into my responses

library(ggplot2)

ggplot(data = longdf, aes(x = variable, y = value, fill = factor(obs))) +
  geom_boxplot(notch = TRUE, notchwidth = .25, width = .25, position = position_dodge(.5))

This does not produce the exact graph you described, which sounded like geom_linerange or something similar; those geoms require x, ymin, and ymax to draw. Otherwise a regular ol' boxplot marks the median and the 1st and 3rd quartiles (the 25th and 75th percentiles). I made the boxes thinner and notched using the width and notch arguments, and separated them slightly with position_dodge(.5).
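As a rough sketch of the plot described in the question (not code from the original answer), stat_summary can compute ymin and ymax itself and draw them with the errorbar geom, which has caps. The helper median_iqr below is a made-up name, not a ggplot2 function, and the dodge width of 0.5 and cap width of 0.2 are arbitrary choices.

```r
library(ggplot2)
library(reshape2)

# Same simulated data as above: 12 subjects, 3 observations each
df <- data.frame(
  id     = rep(1:12, each = 3),
  obs    = rep(1:3, 12),
  height = seq(140, 189, length.out = 36),
  weight = seq(67, 86, length.out = 36),
  fev    = seq(71, 91, length.out = 36)
)
longdf <- melt(df, id.vars = c("id", "obs"))

# Hypothetical helper (not part of ggplot2): median plus 25th/75th percentiles
median_iqr <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
  data.frame(y = q[2], ymin = q[1], ymax = q[3])
}

# Shared dodge so the caps and the median points line up
dodge <- position_dodge(width = 0.5)

p <- ggplot(longdf, aes(x = variable, y = value, color = factor(obs))) +
  # capped vertical line from the 25th to the 75th percentile
  stat_summary(fun.data = median_iqr, geom = "errorbar",
               width = 0.2, position = dodge) +
  # solid circle at the median
  stat_summary(fun = median, geom = "point", size = 3, position = dodge) +
  labs(color = "obs")

print(p)
```

Because color = factor(obs) is discrete, each variable gets three separately summarized groups, and position_dodge places them side by side instead of on top of each other.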

After reading your response, I edited my original answer.

You could try facet_wrap, and watch the difference between "fill" and "color" in ggplot. A boxplot or a density has an interior that can be "filled" with a color; a geom without one, like the point-and-line drawn by stat_summary, has to be "colored" instead. So use color in the original aes():

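#conf.int = 0.5 gives the 25th and 75th percentiles (the default 0.95 gives the 2.5th and 97.5th)
ggplot(data = longdf, aes(x = variable, y = value, color = factor(obs))) +
  stat_summary(fun.data = median_hilow, fun.args = list(conf.int = 0.5)) +
  facet_wrap(. ~ obs)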

This gives you observation 1 with height, weight and fev side by side, then observation 2 with height, weight and fev, and so on.

If that still isn't what you want, and you'd rather see height for observations 1, 2, 3, then weight for observations 1, 2, 3, and so on, you'll need to modify your melting to have two variable columns and two value columns. Essentially, make two melted data frames, then cbind them. And because each observation has three variables, you'll need to rbind the observation-only frame three times so that both data frames have the same number of rows:

obsonly <- melt(df, id.vars = c('id'), measure.vars = 'obs')

obsonly <- rbind(obsonly, obsonly, obsonly) #making rows equal

longvars <- melt(df[-2], id.vars = 'id') #dropping obs from melt

longdf2 <- cbind(obsonly, longvars)

longdf2 <- longdf2[-4] #dropping second id column

colnames(longdf2)[c(2:5)] <- c('obs', 'obsnum', 'variable', 'value')

#conf.int = 0.5 gives the 25th and 75th percentiles (the default 0.95 gives the 2.5th and 97.5th)
ggplot(data = longdf2, aes(x = obsnum, y = value,
                           color = factor(variable))) +
  stat_summary(fun.data = median_hilow, fun.args = list(conf.int = 0.5)) +
  facet_wrap(. ~ variable)

From here you can play around with the x-axis marks (it probably isn't useful to have a tick at observation 1.5) and the spacing of the lines from each other.
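One way to tidy the x-axis marks, sketched on a toy stand-in for longdf2 (the values are random and purely illustrative; only the column names match the data above). It assumes ggplot2 3.3.0 or later for the fun, fun.min and fun.max arguments, and uses quantile() directly so that Hmisc is not needed.

```r
library(ggplot2)

# Toy stand-in for longdf2 (hypothetical values; same column names as above)
set.seed(1)
toy <- data.frame(
  obsnum   = rep(1:3, times = 24),
  variable = rep(c("height", "weight", "fev"), each = 24),
  value    = runif(72, 60, 190)
)

p <- ggplot(toy, aes(x = obsnum, y = value, color = variable)) +
  # median with 25th/75th percentiles, drawn as a point range
  stat_summary(fun = median,
               fun.min = function(v) quantile(v, 0.25),
               fun.max = function(v) quantile(v, 0.75)) +
  # ticks only at the observation numbers 1, 2 and 3
  scale_x_continuous(breaks = 1:3) +
  facet_wrap(~ variable)

print(p)
```

Mapping x = factor(obsnum) instead would also remove the fractional ticks, at the cost of treating the observation number as a category rather than a number.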

2 Comments

Thanks much for taking the time to explain at great length and to give enough to think in the right direction. What you generated was absolutely in the right direction. And building on your explanation, I was able to take it one step closer to what I want using the command: ggplot(data = longdf, aes(x = variable, y = value, fill = factor(obs))) + stat_summary(fun.data=median_hilow); however, doing this is overlapping all 3 plots for obs 1,2 and 3 on the same line. What I am hoping to do from here is to display all 3 lines corresponding to each obs for each variable.
Thanks for your edited response, and in prompt manner. The first option you suggested gets me even closer to what I want. I banged my head around it to come up with what I exactly wanted, but to no avail. Could there be a possible way that the 3 lines (obs 1, 2, and 3) are displayed next to each other for the same variable? Like the one displayed with boxplot option you suggested, but with median hilow lines instead of the box plot? With double melts option, I still want the x axis to display variable name so I didn't pursue it further. Much thanks once again.
