I'm trying to create new data frames from dplyr 0.4.3 functions using R 3.2.2.

What I want to do is create some new dataframes using dplyr::filter to separate out data from one ginormous dataframe into a bunch of smaller dataframes.

For a bog-simple reproducible example, I used this:

filter(mtcars, cyl == 4)

I know I need to assign that to a dataframe of its own, so I started with:

paste("Cylinders:", x, sep = "") <- filter(mtcars, cyl == 4)

That didn't work -- it gave me the error found here: Assignment Expands to Non-Language Object

From there, I found this: Create A Variable Name with Paste in R

(also, big ups to the authors of the above)

And that led me to this, which works:

assign(paste("gears_cars_cylinders", 4, sep = "_"), filter(mtcars, cyl == 4)) %>% 
    group_by(gear) %>% 
    summarise(number_of_cars = n())

and by "works," I mean I get a dataframe named gears_cars_cylinders_4 with all the goodies from

filter(mtcars, cyl == 4) %>% 
    group_by(gear) %>% 
    summarise(number_of_cars = n())

But ultimately, I think I need to wrap this whole thing in a function and be able to feed it the cylinder numbers from mtcars$cyl. I'm thinking something like plyr::ldply(mtcars$cyl, function_name)?

In my real-life data, I have about 70 different classes I need to split out into separate dataframes to drop into DT::datatable tabs in Shiny, which is a whole nuther mess. Anyway.

When I try this:

function_name <- function(x) {
    assign(paste("gears_cars_cylinders", x, sep = "_"), filter(mtcars, cyl == x)) %>% 
        group_by(gear) %>% 
        summarise(number_of_cars = n())
}

and then function_name(6),

I get the output of the dataframe to the screen, but not a dataframe with the name.

Am I looking right over the answer here?

1 Answer

You need to assign the new data frames into the environment from which you're calling function_name(). Try something like this:

library(dplyr)

foo <- function(x) {
  assign(paste("gears_cars_cylinders", x, sep = "_"),
         envir = parent.frame(),
         value = mtcars %>% 
           filter(cyl == x) %>% 
           count(gear))
}

for(cyl in sort(unique(mtcars$cyl))) foo(cyl)
ls()
#> [1] "cyl"                    "foo"                   
#> [3] "gears_cars_cylinders_4" "gears_cars_cylinders_6"
#> [5] "gears_cars_cylinders_8"
gears_cars_cylinders_4
#> Source: local data frame [3 x 2]
#> 
#>    gear     n
#>   (dbl) (int)
#> 1     3     1
#> 2     4     8
#> 3     5     2
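
To see why the original function_name() printed a result but left no named data frame behind, it may help to look at where assign() writes by default. The sketch below uses only base R; the names make_local, make_in_caller, tmp_local, and tmp_caller are made up for illustration and do not appear in the question.

```r
# assign() without `envir` writes into the environment it is called from.
# Inside a function, that is the function's own temporary frame.
make_local <- function() {
  assign("tmp_local", 1)
  exists("tmp_local", inherits = FALSE)  # the object is visible in here
}

# With envir = parent.frame(), the object is created in the caller's
# environment instead, so it survives after the function returns.
make_in_caller <- function() {
  assign("tmp_caller", 2, envir = parent.frame())
  invisible(NULL)
}

created_inside <- make_local()
make_in_caller()

created_inside         # TRUE: it existed while make_local() was running
exists("tmp_local")    # FALSE: it was discarded with the function's frame
exists("tmp_caller")   # TRUE: it was written into the calling environment
```

The same reasoning applies to foo() above: without envir = parent.frame(), each gears_cars_cylinders_* object would be created and then discarded when the function exits.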
3 Comments

I can't help but feel this goes against everything I've been taught in R, in terms of grouping similar data in structures like lists. And if the point of dplyr is to simplify things, mashing it together with assign and environment manipulation seems like overkill. gears <- by(mtcars, mtcars$cyl, FUN=function(x) data.frame(table(x$gear)) ) and then accessing like gears[["4"]] seems so much less error-prone.
Yes that would normally be my thinking too. But maybe there are circumstances where you actually need these data frames as separate objects?
I honestly can't think of an occasion when that would be necessary. If you have the data.frames cyl4, cyl6, and cyl8 floating about in the GlobalEnv, then you need to loop over a pasted-together vector of "cyl" and i in c(4,6,8) and use get() to retrieve them, when you could just do gears[[i]].
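
The list-based approach suggested in these comments can be sketched roughly as follows. This version swaps the dplyr pipeline for base R (split(), lapply(), table()) so that it runs without extra packages; the names by_cyl, count_gears, and gear_counts are assumptions for illustration only.

```r
# Split mtcars into a named list of data frames, one per cylinder count.
by_cyl <- split(mtcars, mtcars$cyl)

# Hypothetical helper: count the cars per gear within one subset.
count_gears <- function(d) {
  counts <- table(d$gear)
  data.frame(gear = as.numeric(names(counts)),
             n = as.integer(counts))
}

# One summary data frame per cylinder class, all kept in a single list.
gear_counts <- lapply(by_cyl, count_gears)

names(gear_counts)    # "4" "6" "8"
gear_counts[["4"]]    # look up by name; no paste() or get() required

# Looping over every class is a plain iteration over the list names.
for (k in names(gear_counts)) {
  cat("cyl", k, ":", nrow(gear_counts[[k]]), "gear levels\n")
}
```

With around 70 classes, a single named list like this may be easier to iterate over than 70 separately named objects, for example when building one table per class.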