
I was wondering if there was a way to create multiple columns from a list in R using the mutate() function within a for loop.

Here is an example of what I mean:

The Problem:

I have a data frame df with 2 columns: category and rating. I want to add one column for every element of df$category, and in each new column I want a 1 if the row's category matches that element and a 0 otherwise.

library(dplyr)

df <- tibble(
  category = c("Art","Technology","Finance"),
  rating = c(100,95,50)
)

Doing it manually, I could do:

df <-
  df %>% 
  mutate(art = ifelse(category == "Art", 1,0))

However, what happens when I have 50 categories? (Which is close to what I have in my original problem. That would take a lot of time!)

What I tried:

category_names <- df$category

for(name in category_names){

  df <-
    df %>% 
    mutate(name = ifelse(category == name, 1,0))

}

Unfortunately, it doesn't seem to work.

I'd appreciate any light on the subject!

Full Code:

library(dplyr)

#Creates tibble
df <- tibble(
  category = c("Art","Technology","Finance"),
  rating = c(100,95,50)
)

#Showcases the operation I would like to loop over df
df <-
  df %>% 
  mutate(art = ifelse(category == "Art", 1,0))

#Creates a variable for clarity
category_names <- df$category

#For loop I tried
for(name in category_names){

  df <-
    df %>% 
    mutate(name = ifelse(category == name, 1,0))

}

I am aware that what I am doing is essentially a form of model.matrix(); however, I tried the loop before I found out about that function, and I am still perplexed about why it doesn't work.
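For comparison, here is a minimal base R sketch of the model.matrix() route mentioned above. It assumes the same example data; the names dummies and result are only illustrative. Note that model.matrix() orders the columns by factor level (alphabetically here) and prefixes each with the variable name, which the sketch strips off.

```r
# Same example data as in the question, as a plain data frame
df <- data.frame(
  category = c("Art", "Technology", "Finance"),
  rating = c(100, 95, 50)
)

# "- 1" drops the intercept so every category gets its own 0/1 column
dummies <- model.matrix(~ category - 1, data = df)

# Remove the "category" prefix that model.matrix() adds to column names
colnames(dummies) <- sub("^category", "", colnames(dummies))

# Attach the indicator columns to the original data
result <- cbind(df, dummies)
result
```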


2 Answers


We can use pivot_wider after creating a row-number column (rn) that keeps each row distinct and a constant column (n) that supplies the 1s:

library(dplyr)
library(tidyr)
df %>% 
    mutate(rn = row_number(), n = 1) %>% 
    pivot_wider(names_from = category, values_from = n, 
             values_fill = list(n = 0)) %>%
    select(-rn)
# A tibble: 3 x 4
#  rating   Art Technology Finance
#   <dbl> <dbl>      <dbl>   <dbl>
#1    100     1          0       0
#2     95     0          1       0
#3     50     0          0       1
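The rn column matters most when rows repeat. As a rough sketch, assuming hypothetical data (df_dup) in which the same category and rating pair appears twice, the row number keeps every input row as its own output row:

```r
library(dplyr)
library(tidyr)

# Hypothetical data with a repeated (category, rating) pair
df_dup <- tibble(
  category = c("Art", "Art", "Finance"),
  rating = c(100, 100, 50)
)

wide <- df_dup %>%
  mutate(rn = row_number(), n = 1) %>%   # rn makes every row unique
  pivot_wider(names_from = category, values_from = n,
              values_fill = list(n = 0)) %>%
  select(-rn)                            # drop the helper column again

wide
```

Without rn, the two identical rows would no longer be uniquely identified by the remaining columns, so they could not each map to a separate output row.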

Another option is map_dfc from purrr, which builds one indicator column per unique category and binds them column-wise:

library(purrr)
map_dfc(unique(df$category),  ~  df %>%
                                 transmute(!! .x := +(category == .x))) %>% 
     bind_cols(df, .)
# A tibble: 3 x 5
#  category   rating   Art Technology Finance
#* <chr>       <dbl> <int>      <int>   <int>
#1 Art           100     1          0       0
#2 Technology     95     0          1       0
#3 Finance        50     0          0       1

If we need a for loop, unquote the loop variable with !! on the left-hand side and assign with := instead of =

for(name in category_names) df <- df %>% mutate(!! name := +(category == name))
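For reference, a self-contained version of that loop might look like the sketch below. It rebuilds the example tibble from the question; the use of unique() is an addition here, so that a repeated category would not be processed twice.

```r
library(dplyr)

df <- tibble(
  category = c("Art", "Technology", "Finance"),
  rating = c(100, 95, 50)
)

# unique() avoids recomputing a column when a category repeats
category_names <- unique(df$category)

for (name in category_names) {
  # !!name on the left inserts the loop variable's value as the column name;
  # unary + converts the logical comparison into 0/1 integers
  df <- df %>% mutate(!!name := +(category == name))
}

df
```

The := operator is needed because R's parser does not allow an expression such as !!name on the left of a plain = inside a function call.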

Or in base R with table (note that the new columns come out in alphabetical order)

cbind(df, as.data.frame.matrix(table(seq_len(nrow(df)), df$category)))
#    category rating Art Finance Technology
#1        Art    100   1       0          0
#2 Technology     95   0       0          1
#3    Finance     50   0       1          0

6 Comments

My first thought was to use pivot_wider() but was unable to really wrap my head around it; however, with the original question being using mutate and for-loops this seems like a workaround. I'm curious to see an answer within the realm of the question. Thank you for your speedy response!
I edited my previous comment to reflect the reasoning. I was so excited about a response and saw pivot_wider and hastily made the decision to accept it. My apologies for that.
@spookywagons It's not clear where it is not working. I didn't see any new data in your post.
@spookywagons Let me know if you have an issue with using := instead of =.
@spookywagons It's okay. Usually people want to avoid for loops when using the tidyverse, so I focused on the other options instead of the for loop.

Wanted to throw something in for anyone who stumbles across this question. The problem in the OP is that mutate() treats name on the left-hand side of = as a literal column name rather than as the loop variable, so the same "name" column gets overwritten during each iteration of the loop: you end up with only one new column, when you really wanted three (or 50). I consistently find myself wanting to create multiple new columns within loops, and I recently found out that mutate() can now take "glue"-like strings as column names to do this. The following code also solves the original question:

for(name in category_names){
  df <-
    df %>%
    mutate("{name}" := ifelse(category == name, 1, 0))
}

This is equivalent to akrun's answer using a for loop, but it doesn't involve the !! operator. Note that you still need the "walrus" := operator, and that the column name needs to be a string (I think since it's using "glue" in the background). I'm thinking some people might find this format easier to understand.
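To make the difference concrete, here is a small sketch that runs both loops side by side on the example data. The names broken and fixed are only illustrative, and the glue syntax assumes a reasonably recent dplyr/rlang installation.

```r
library(dplyr)

df <- tibble(
  category = c("Art", "Technology", "Finance"),
  rating = c(100, 95, 50)
)

# Original attempt: the left-hand side `name` is taken literally,
# so every pass overwrites a single column called "name"
broken <- df
for (name in df$category) {
  broken <- broken %>% mutate(name = ifelse(category == name, 1, 0))
}
names(broken)

# Glue syntax: "{name}" is replaced by the loop variable's value,
# so each pass creates a column named after the current category
fixed <- df
for (name in df$category) {
  fixed <- fixed %>% mutate("{name}" := ifelse(category == name, 1, 0))
}
names(fixed)
```

In the first loop only one extra column survives, while the second produces one indicator column per category.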

Reference: https://www.tidyverse.org/blog/2020/02/glue-strings-and-tidy-eval/
