
I am using the diamonds data frame from ggplot2.

I would like to draw a boxplot for each numerical column, grouped by a category. In this case the category is defined by the "cut" column.

I am using a for-loop to accomplish this task.

Here's the code I am using:


##################################################################################
#                              Data                                              #
#                                                                                #
##################################################################################

data("diamonds")
basePlot <- diamonds[ names(diamonds)[!names(diamonds) %in% c("color", "clarity")] ]

##################################################################################

## set Plot view to 4 boxplots ##
par(mfrow = c(2,2))

## for-loop to boxplot all numerical columns ##

for (i in 1:(ncol(basePlot)-1)){
  print(ggplot(basePlot, aes(as.factor(cut), 
  basePlot[c(i)],color=as.factor(cut)))
        + geom_boxplot(outlier.colour="black",outlier.shape=16,outlier.size=1,notch=FALSE)
        + xlab("Diamond Cut")
        + ylab(colnames(basePlot)[i])
  )
}


Console output:

Don't know how to automatically pick scale for object of type data.frame. Defaulting to continuous.
Error in is.finite(x) : default method not implemented for type 'list'

Is there any other way to accomplish this task?

2 Comments

  • Since we don't have your data, would you mind reframing your question in terms of a public dataset, such as mtcars or diamonds? Also, have you considered facets or some other grouping mechanism as an alternative to producing multiple plots? Commented Nov 3, 2020 at 0:08
  • Hey @r2evans, I have edited the question so that it is a reproducible example. I haven't used facets. Commented Nov 3, 2020 at 0:27

1 Answer


Instead of multiple plots, I suggest facets. To do this, though, we need to convert the data from "wide" format to "longer" format, and the canonical way in the tidyverse is with tidyr::pivot_longer.

> basePlot
# A tibble: 53,940 x 8
   carat cut       depth table price     x     y     z
   <dbl> <ord>     <dbl> <dbl> <int> <dbl> <dbl> <dbl>
 1 0.23  Ideal      61.5    55   326  3.95  3.98  2.43
 2 0.21  Premium    59.8    61   326  3.89  3.84  2.31
 3 0.23  Good       56.9    65   327  4.05  4.07  2.31
 4 0.290 Premium    62.4    58   334  4.2   4.23  2.63
 5 0.31  Good       63.3    58   335  4.34  4.35  2.75
 6 0.24  Very Good  62.8    57   336  3.94  3.96  2.48
 7 0.24  Very Good  62.3    57   336  3.95  3.98  2.47
 8 0.26  Very Good  61.9    55   337  4.07  4.11  2.53
 9 0.22  Fair       65.1    61   337  3.87  3.78  2.49
10 0.23  Very Good  59.4    61   338  4     4.05  2.39
# ... with 53,930 more rows
> pivot_longer(basePlot, -cut, names_to="var", values_to="val")
# A tibble: 377,580 x 3
   cut     var      val
   <ord>   <chr>  <dbl>
 1 Ideal   carat   0.23
 2 Ideal   depth  61.5 
 3 Ideal   table  55   
 4 Ideal   price 326   
 5 Ideal   x       3.95
 6 Ideal   y       3.98
 7 Ideal   z       2.43
 8 Premium carat   0.21
 9 Premium depth  59.8 
10 Premium table  61   
# ... with 377,570 more rows

With this, we only have to tell ggplot2 to use val for the values and var for the facets; cut stays on the x-axis.

library(ggplot2)
library(tidyr) # pivot_longer

ggplot(pivot_longer(basePlot, -cut, names_to="var", values_to="val"),
       aes(cut, val, color=cut)) +
  geom_boxplot(outlier.colour="black", outlier.shape=16, outlier.size=1, notch=FALSE) +
  xlab("Diamond Cut") +
  facet_wrap(~var, nrow=2, scales="free") +
  scale_x_discrete(guide=guide_axis(n.dodge=2))

[Figure: ggplot2, faceted boxplots]

cut appears both on the x-axis and in the legend because the color= aesthetic adds a legend. Since the legend is redundant, we could either remove the color aesthetic (which also removes the legend) or just suppress the legend by adding + scale_color_discrete(guide="none"). Older ggplot2 versions also accepted guide=FALSE, which has since been deprecated.
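As a minimal sketch of the legend suppression, the faceted plot can be built once and the legend hidden through the theme. Note that theme(legend.position = "none") is a swapped-in alternative to the scale guide argument mentioned above, and longPlot and p are illustrative names that do not appear in the original code.

```r
library(ggplot2)
library(tidyr) # pivot_longer

# Same subset as in the question: drop the two other categorical columns.
basePlot <- diamonds[, !names(diamonds) %in% c("color", "clarity")]

# Long format: one row per (cut, variable, value). "longPlot" is an assumed name.
longPlot <- pivot_longer(basePlot, -cut, names_to = "var", values_to = "val")

p <- ggplot(longPlot, aes(cut, val, color = cut)) +
  geom_boxplot(outlier.colour = "black", outlier.shape = 16, outlier.size = 1) +
  xlab("Diamond Cut") +
  facet_wrap(~var, nrow = 2, scales = "free") +
  scale_x_discrete(guide = guide_axis(n.dodge = 2)) +
  theme(legend.position = "none") # keep the colours, hide the redundant legend

p
```

This keeps the per-cut colours in the boxes while freeing the space the legend would take.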

There are two ways of faceting: facet_wrap and facet_grid. The latter is well tuned for multiple variables (one facet variable on the x, one on the y) and many other configurations. Granted, you can use facet_grid with just one variable (which is similar to facet_wrap(nrow=1) or ncol=1), but there are some styling distinctions between them.
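If one plot per column is still preferred over facets, the original loop can be adapted. The error in the question arises because basePlot[c(i)] returns a one-column data frame rather than a vector, so ggplot2 cannot pick a scale for it. In addition, par(mfrow = ...) only affects base graphics, not ggplot2 output, and the index range 1:(ncol(basePlot)-1) includes the cut column while skipping z. The sketch below is one possible fix, not the only one: it refers to each column by name through the .data pronoun, and the helper name boxplotBy is hypothetical.

```r
library(ggplot2)

# Same subset as in the question: drop the two other categorical columns.
basePlot <- diamonds[, !names(diamonds) %in% c("color", "clarity")]

# Work with column names and leave out the grouping column itself.
numCols <- setdiff(names(basePlot), "cut")

# Hypothetical helper: one boxplot of a single column, grouped by cut.
# Wrapping this in a function gives each plot its own binding of `col`.
boxplotBy <- function(col) {
  ggplot(basePlot, aes(cut, .data[[col]], color = cut)) +
    geom_boxplot(outlier.colour = "black", outlier.shape = 16, outlier.size = 1) +
    xlab("Diamond Cut") +
    ylab(col)
}

# Build one plot per numeric column, named after the column.
plots <- lapply(setNames(numCols, numCols), boxplotBy)

# Draw the plots one after another.
for (p in plots) print(p)
```

To place several of these plots on one page, a layout package such as gridExtra or patchwork would be needed, since par() has no effect here.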


4 Comments

This is great. However, how could I limit each plot to just 2 "var" values (say, carat and price in one view)? In the original dataset I have 82 columns...
For a view of "so many", limit with something like basePlot[basePlot$cut %in% c("Fair","Good"),].
Is it possible to use that kind of filter but for carat, depth, price, etc., maybe inside facet_wrap?
Do you mean something like pivot_longer(...) %>% filter(var %in% c("carat", "depth", "price")) %>% ggplot(...) + geom_boxplot(...) + facet_wrap(~var, ...)?
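
The pipeline outlined in the last comment could look roughly like the following. This is a sketch under assumptions: it uses dplyr for filter() and %>%, and the names keep, longSub and p are illustrative rather than taken from the thread.

```r
library(ggplot2)
library(tidyr) # pivot_longer
library(dplyr) # filter, %>%

# Same subset as in the question: drop the two other categorical columns.
basePlot <- diamonds[, !names(diamonds) %in% c("color", "clarity")]

# Variables to show in this view (assumed selection for illustration).
keep <- c("carat", "price")

# Reshape to long format, then keep only the rows for the chosen variables.
longSub <- basePlot %>%
  pivot_longer(-cut, names_to = "var", values_to = "val") %>%
  filter(var %in% keep)

p <- ggplot(longSub, aes(cut, val, color = cut)) +
  geom_boxplot(outlier.colour = "black", outlier.shape = 16, outlier.size = 1) +
  xlab("Diamond Cut") +
  facet_wrap(~var, nrow = 1, scales = "free") +
  scale_x_discrete(guide = guide_axis(n.dodge = 2))

p
```

For a dataset with many columns, changing keep and re-running the pipeline gives one view per group of variables.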
