I'm looking to create a set of box plots, one for each variable in sampledf1 against the single variable in sampledf2.

The actual use case: I've created a set of clusters with k-means and now want to see, for each variable in the data frame I used for clustering, its distribution across the clusters found.

sampledf1 <- as.data.frame(replicate(6, sample(c(1:10,NA))))
sampledf2 <- as.data.frame(replicate(1, sample(c(21:30,NA))))

Then I want to see a box plot with each of the variables in sampledf1 paired with the only variable in sampledf2.

I would like to use something like:

sapply(boxplot(sampledf1~sampledf2$V1))

but this gives me this error:

Error in match.fun(FUN) : argument "FUN" is missing, with no default

Any way I could do this with dplyr would be great, but I didn't see any functions that I could chain together to do this.

4 Answers

3

Here's a way using lapply and seq_along. We iterate over the column indices of sampledf1 with seq_along, and use each index i with the names function to pull out the column name for the panel title.

# 2 x 3 grid: one panel per column of sampledf1
par(mfrow = c(2,3))
lapply(seq_along(sampledf1), 
       FUN  = function(i) 
           # plot column i grouped by sampledf2$V1, titled with the column name
           boxplot(sampledf1[,i] ~ sampledf2$V1, main = names(sampledf1)[i])
       )
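
For the k-means use case mentioned in the question, the same pattern might look like the sketch below. This is only an illustration under assumptions: the built-in iris data stands in for the real data frame, and num_df, km and cluster are hypothetical names that do not appear in the question.

```r
# Hypothetical stand-in data: the four numeric columns of the built-in iris set
set.seed(42)
num_df <- iris[, 1:4]

# Cluster on the scaled variables; 3 centers is an arbitrary choice for this sketch
km <- kmeans(scale(num_df), centers = 3)
cluster <- factor(km$cluster)

# One panel per clustered variable, boxes grouped by cluster
op <- par(mfrow = c(2, 2))
invisible(lapply(names(num_df), function(v) {
  boxplot(num_df[[v]] ~ cluster, main = v, xlab = "cluster", ylab = v)
}))
par(op)  # restore the previous plotting layout
```

Wrapping the call in invisible() only suppresses the list of boxplot statistics that lapply would otherwise print to the console.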

2

You can use ggplot and facets if you first reshape your data into long format:

library(reshape2)
library(ggplot2)
s.all = cbind(sampledf1, f2=sampledf2$V1)
s.long = melt(s.all, id = 'f2')
ggplot(s.long) +
  geom_boxplot(aes(x=f2, group=f2, y=value)) +
  facet_wrap(~variable) +
  scale_x_continuous(breaks=unique(s.long$f2))
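
Since reshape2 is retired, the same reshape can probably also be done with tidyr. The following is a sketch rather than a drop-in replacement: it assumes tidyr (version 1.0.0 or later) and ggplot2 are installed, recreates the sample data with an arbitrary seed, and treats f2 as a factor on the x axis instead of setting continuous breaks.

```r
library(tidyr)
library(ggplot2)

# Recreate the sample data from the question (the seed is an arbitrary choice)
set.seed(1)
sampledf1 <- as.data.frame(replicate(6, sample(c(1:10, NA))))
sampledf2 <- as.data.frame(replicate(1, sample(c(21:30, NA))))

# Attach the grouping variable, then stack V1..V6 into long format
s.all <- cbind(sampledf1, f2 = sampledf2$V1)
s.long <- pivot_longer(s.all, cols = -f2,
                       names_to = "variable", values_to = "value")

# One facet per original column, one box per value of f2
p <- ggplot(s.long, aes(x = factor(f2), y = value)) +
  geom_boxplot() +
  facet_wrap(~variable)
print(p)
```

Rows with missing values stay in s.long here, so ggplot2 will warn about them when drawing.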

1 Comment

Thanks for sharing this as it's good to have more than one way to solve a problem and I'm learning plot.
1

purrr's walk() works nicely when you start passing formulas like this. walk() iterates over the elements of an object much like sapply, but it is called for its side effects (here, drawing each plot) and accepts a compact formula syntax for the function. The . refers to the current element of names(sampledf1).

This will work to get each panel named by the column in sampledf1 it represents:

library(purrr)
par(mfrow = c(2,3))
purrr::walk(names(sampledf1), ~boxplot(sampledf1[,.]~sampledf2$V1, main = .))

2 Comments

Thanks Nathan, is there any way to add labels so I know which data each box plot shows? Or is it in the order of the variables in sampledf1, as given by names(sampledf1)?
You're correct about the order it processes in. You don't really need map here, but it is the same function with different output; each is useful depending on the scenario. I will edit in a labeled version.
0

ggplot2 variant:

library(reshape2)
library(ggplot2)

# note: this adds the grouping column X to sampledf1 itself
sampledf1$X <- sampledf2$V1
ggplot(melt(sampledf1, id.vars="X", na.rm=TRUE), aes(factor(X),value)) + 
  geom_boxplot() + facet_wrap( ~ variable, nrow=2)
