
I would like some advice on this problem in R. I have a data frame "my_fruits_data" with many columns, including the index columns listed below in name_cols. I want to filter on those index columns one by one in a for loop and store each filtered result in its own data frame, named as listed in df_fruits, for post-processing. It doesn't work, apparently because the elements of df_fruits are strings rather than actual data frame names. I've searched and found a few hints, but none of them helped.

# column names
name_cols <- c("Index_apple",
               "Index_pear",
               "Index_orange",
               "Index_watermelon",
               "Index_strawberry")
# data frame names for the filtered results
df_fruits <- c("df_apple",
               "df_pear",
               "df_orange",
               "df_watermelon",
               "df_strawberry")

for (i in name_cols) {
    df_fruits[i] <- my_fruits_data %>%
        filter(.data[[name_cols[i]]] == 1)
    ......
}

Thanks, chase77

    It helps to have usable data for questions, making it a complete "minimal working example"; please include sample data (reprex) that we can use, preferably with dput(x); see stackoverflow.com/q/5963269, minimal reproducible example, and stackoverflow.com/tags/r/info. Ultimately, I feel a for loop is unlikely to be the preferred method for this. Can you show what you intend to have at the end of all this processing? R likely has a more efficient way to approach what you need. Commented Dec 20, 2021 at 6:24
    This is simply data splitting/data grouping. You do not need for loops. Give an example of your data and the expected output. Also, what do you mean by further processing? If you are going to do similar post-processing for each fruit dataset, you should group the whole dataset rather than split it into separate fruit datasets. Commented Dec 20, 2021 at 6:29

1 Answer


I understand that you want to split your data by type of fruit, which is encoded in separate index columns. Here is how to do that with an example dataset.

library(tidyverse)
my_fruits_data = tribble(
  ~index_apple, ~index_pear, ~index_banana, ~x1,
  1, 0, 0, 10,
  1, 0, 0, 11,
  0, 1, 0, 12,
  0, 0, 1, 13,
  0, 0, 1, 14,
  0, 0, 1, 15
)

The example data:

> my_fruits_data
# A tibble: 6 x 4
  index_apple index_pear index_banana    x1
        <dbl>      <dbl>        <dbl> <dbl>
1           1          0            0    10
2           1          0            0    11
3           0          1            0    12
4           0          0            1    13
5           0          0            1    14
6           0          0            1    15

First you can transform the data to have a single fruit column that mentions the type of fruit:

fruit_data = my_fruits_data %>% 
  pivot_longer(
    cols = starts_with("index_"), 
    names_prefix = "index_", 
    names_to = "fruit",
    values_to = "fruit_ind"
  ) %>% 
  filter(fruit_ind == 1) %>% 
  select(-fruit_ind)

The result:

> fruit_data
# A tibble: 6 x 2
     x1 fruit 
  <dbl> <chr> 
1    10 apple 
2    11 apple 
3    12 pear  
4    13 banana
5    14 banana
6    15 banana

Finally, as @Onyambu mentioned, you could consider grouping this data by the new variable fruit. If you want to do different processing for different fruits, you could split() the data to get a list of separate data frames, one per fruit:

> split(fruit_data, fruit_data$fruit)
$apple
# A tibble: 2 x 2
     x1 fruit
  <dbl> <chr>
1    10 apple
2    11 apple

$banana
# A tibble: 3 x 2
     x1 fruit 
  <dbl> <chr> 
1    13 banana
2    14 banana
3    15 banana

$pear
# A tibble: 1 x 2
     x1 fruit
  <dbl> <chr>
1    12 pear 
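Once the data is split, the same post-processing can be applied to every fruit with lapply() instead of copying code per fruit. This is a minimal sketch: it rebuilds fruit_data directly as the result of the pivot_longer() step above, and the summary (n, mean_x1) and the names fruit_list / fruit_summaries are hypothetical placeholders for whatever processing is really needed.

```r
library(tidyverse)

# fruit_data as produced by the pivot_longer() step above
fruit_data <- tribble(
  ~x1, ~fruit,
  10, "apple",
  11, "apple",
  12, "pear",
  13, "banana",
  14, "banana",
  15, "banana"
)

# One tibble per fruit, in a list named by fruit
fruit_list <- split(fruit_data, fruit_data$fruit)

# Run the same (hypothetical) post-processing on every list element
fruit_summaries <- lapply(fruit_list, function(d) {
  d %>% summarise(n = n(), mean_x1 = mean(x1))
})

# Results are retrieved by name, e.g. fruit_summaries$banana or fruit_summaries[["pear"]]
fruit_summaries$banana
```

The list keeps all per-fruit results together, so with 50+ fruits nothing changes except the length of the list.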
Thank you so much Kybazzi for the detailed demo of how to get around the problem, and also Onyambu and r2evens for the ideas. I'll try it; it should work. But this problem prompted me to search for a way to turn a string into a data frame name, and the only idea I found was the function assign(): assign(string, df_apple %>% filter(.data[[Index_fruits[1]]] ==1)). But this method isn't convenient for my case. I would like some general ideas for assigning a data frame to a name given as a string.
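A minimal sketch of the loop from the question that stores each filtered result in a named list instead of in separate variables. It assumes the answer's example data, so the index columns are lowercase index_apple etc. rather than Index_apple; fruit_dfs and fruit_dfs2 are hypothetical names. If separate variables are truly required, list2env() copies every list element into an environment in one call, which avoids calling assign() by hand.

```r
library(tidyverse)

# Example data from the answer above
my_fruits_data <- tribble(
  ~index_apple, ~index_pear, ~index_banana, ~x1,
  1, 0, 0, 10,
  1, 0, 0, 11,
  0, 1, 0, 12,
  0, 0, 1, 13,
  0, 0, 1, 14,
  0, 0, 1, 15
)

name_cols <- c("index_apple", "index_pear", "index_banana")
df_fruits <- c("df_apple", "df_pear", "df_banana")

# Loop over positions so name_cols[j] and df_fruits[j] stay paired;
# each filtered result becomes a list element named after df_fruits[j]
fruit_dfs <- list()
for (j in seq_along(name_cols)) {
  fruit_dfs[[df_fruits[j]]] <- my_fruits_data %>%
    filter(.data[[name_cols[j]]] == 1)
}

# Same result without an explicit loop
fruit_dfs2 <- setNames(
  lapply(name_cols, function(col) filter(my_fruits_data, .data[[col]] == 1)),
  df_fruits
)

# Only if separate variables are really needed: creates df_apple, df_pear
# and df_banana in the global environment from the list elements
invisible(list2env(fruit_dfs, envir = globalenv()))
nrow(df_banana)
```

Downstream code can then iterate over fruit_dfs (or use lapply()) instead of referring to dozens of separate variables.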
I don't think using assign() this way is a recommended approach. Why do you want to do that instead of something similar to the solution I've shown here?
Because there is further analysis to follow, e.g. using summarise(). I don't want to copy the same code multiple times for different fruits (over 50 types in my actual case). That's why I'm trying to use a loop.
With my code, you can summarize results directly on fruit_data, such as fruit_data %>% group_by(fruit) %>% summarise(x = mean(x1)). I still don't understand why you want to create a large number of variables using assign().
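To make the grouped approach concrete, here is a small sketch that runs that exact summarise() call on fruit_data (rebuilt directly as the output of the answer's pivot_longer() step; fruit_summary is a hypothetical name). It returns one row per fruit, which for the example data is apple 10.5, banana 14 and pear 12, and the same code covers 3 fruits or 50+ without any loop.

```r
library(tidyverse)

# fruit_data as produced by the pivot_longer() step in the answer
fruit_data <- tribble(
  ~x1, ~fruit,
  10, "apple",
  11, "apple",
  12, "pear",
  13, "banana",
  14, "banana",
  15, "banana"
)

# One summary row per fruit, computed in a single grouped pass
fruit_summary <- fruit_data %>%
  group_by(fruit) %>%
  summarise(x = mean(x1))

fruit_summary
```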
