Change multiple values in a dataframe based on two other values

Question

Would anyone mind lending some knowledge? I am trying to make a new dataframe based on the values in the dataframe below.

id   value
ant    10
cat    4
cat    6
dog    5
dog    3
dog    2
fly    9

Next, I want to build a dataframe that looks like the following, processing the ids in the order they appear.

Every time we see a new id, we create a column. The max value is 10 (no id's values sum to more than 10), so there should be 10 rows.
Our first id is ant, so every row of the ant column should be 0.
Our next column is cat. It has 2 values: the first value (4) means the first 4 rows are 0, and the second value (6) means the next 6 rows are 1.
The same logic applies to dog: the first 5 rows are 0, the next 3 rows are 1, and the last 2 rows are 0.
fly has only 9 rows of 0, so the last row should contain NA.

It should look like this

ant  cat  dog  fly
0    0    0    0
0    0    0    0
0    0    0    0
0    0    0    0
0    1    0    0
0    1    1    0
0    1    1    0
0    1    1    0
0    1    0    0
0    1    0    NA

I know how to do this the long way:

newdf <- data.frame(matrix(2, ncol = length(unique(df[,"id"])) , nrow = 10))
newdf$X1[1:10] <- 0
newdf$X2[1:4] <- 0
newdf$X2[5:10] <- 1
...

However, is there a more efficient way to do this? My actual data will have roughly 50 rows, which is why I am looking for something better than filling in each column by hand.

Ronak Shah · Accepted Answer · 2021-05-27 09:13:51Z

1

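Here's a tidyverse answer:

library(dplyr)
library(tidyr)

df %>%
  group_by(id) %>%
  # alternate 0, 1, 0, ... across the values of each id
  mutate(val = rep(c(0, 1), length.out = n())) %>%
  # repeat each row `value` times (this drops the value column)
  uncount(value) %>%
  # position of each repeated row within its id
  mutate(row = row_number()) %>%
  # make sure every id has rows 1 to 10; missing ones get NA
  complete(row = 1:10) %>%
  # one column per id
  pivot_wider(names_from = id, values_from = val) %>%
  select(-row)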

#     ant   cat   dog   fly
#   <dbl> <dbl> <dbl> <dbl>
# 1     0     0     0     0
# 2     0     0     0     0
# 3     0     0     0     0
# 4     0     0     0     0
# 5     0     1     0     0
# 6     0     1     1     0
# 7     0     1     1     0
# 8     0     1     1     0
# 9     0     1     0     0
#10     0     1     0    NA

For each id we assign alternating 0, 1 values to its rows, then use uncount() to repeat each row as many times as its value. A row number within each id records the position, and pivot_wider() reshapes the data to wide format so that each id gets its own column.
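
To see what the alternate-and-repeat step produces before any reshaping, here is a minimal base R sketch for the dog rows only. The names dog_values, dog_flags and dog_column are illustrative and not part of the answer above.

```r
# Values for the "dog" id, taken from the example data
dog_values <- c(5, 3, 2)

# One flag per value, alternating 0, 1, 0, ...
dog_flags <- rep(c(0, 1), length.out = length(dog_values))

# Repeat each flag as many times as its value says
dog_column <- rep(dog_flags, times = dog_values)
dog_column
```

This gives five 0s, three 1s and two 0s, which matches the dog column in the expected output.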

data

df <- structure(list(id = c("ant", "cat", "cat", "dog", "dog", "dog", 
"fly"), value = c(10, 4, 6, 5, 3, 2, 9)), row.names = c(NA, -7L
), class = "data.frame")

edited May 27, 2021 at 9:13

answered May 27, 2021 at 6:21

Ronak Shah


5 Comments

anonymous Over a year ago

Hi, this unfortunately does not produce the same answer. I somehow get a lot more rows than expected, many of which are filled with NA. But thank you for answering!

Ronak Shah Over a year ago

@anonymous I have updated the answer for your updated dataframe.

AnilGoyal Over a year ago

I would have done it exactly this way, with one minor exception in the complete step: I would use complete(row = seq_len(max(df$value))) instead of hardcoding the sequence to 1:10. Upvoted already. P.S. I just checked that deleting that step has no impact on the output either.
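
A hedged sketch of deriving the row count from the data, in base R. Note that max(df$value) gives 10 here only because ant has a single value of 10; in general the number of rows needed is the largest per-id sum of value. The names rows_per_id and n_rows are illustrative.

```r
# Example data from the question
df <- data.frame(
  id = c("ant", "cat", "cat", "dog", "dog", "dog", "fly"),
  value = c(10, 4, 6, 5, 3, 2, 9)
)

# Total number of rows each id needs
rows_per_id <- tapply(df$value, df$id, sum)

# The longest column determines the number of rows in the result
n_rows <- max(rows_per_id)
n_rows
```

seq_len(n_rows) could then stand in for 1:10 in the complete() step.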

anonymous Over a year ago

Hi, thank you! I am curious: what if I get an error about bad names? It suggests I use name_repair, but I have tried a few ways of using it and it's not producing an answer. Sorry, I'm all new to R.

Ronak Shah Over a year ago

@anonymous I am not sure what might be causing that. Probably you already have a column whose name matches one of the id values, or one of the id values is not allowed as a column name.
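
One way to check both guesses before reshaping is to look for id values that would make awkward column names. This is only a sketch: the ids vector below is hypothetical, and treating missing ids, empty ids and ids equal to the helper column name row as the likely culprits is an assumption, not a confirmed diagnosis.

```r
# Hypothetical id values; in practice this would be df$id
ids <- c("ant", "cat", NA, "", "row")

# Flag ids that are missing, empty, or that clash with the helper column "row"
is_missing <- is.na(ids)
is_empty <- !is_missing & ids == ""
is_clash <- !is_missing & ids == "row"
problem <- is_missing | is_empty | is_clash

# The ids worth inspecting or renaming before pivoting
ids[problem]
```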

ThomasIsCoding · Accepted Answer · 2021-05-27 06:57:14Z

1

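You can try the following base R code (list2DF() requires R 4.0.0 or later):

# number of rows needed: the largest per-id total
maxlen <- with(df, max(tapply(value, id, sum)))
list2DF(
  lapply(
    with(df, split(value, id)),
    function(x) {
      # alternate 0/1 per value, repeat each flag by its value, pad with NA up to maxlen
      `length<-`(
        rep(rep(c(0, 1), length.out = length(x)), x),
        maxlen
      )
    }
  )
)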

which gives

   ant cat dog fly
1    0   0   0   0
2    0   0   0   0
3    0   0   0   0
4    0   0   0   0
5    0   1   0   0
6    0   1   1   0
7    0   1   1   0
8    0   1   1   0
9    0   1   0   0
10   0   1   0  NA
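
The two less obvious pieces of this code are the NA padding done by `length<-` and the column order produced by split(). A small sketch of both, using the data from the question. The factor() call is an addition, not part of the answer: it keeps the columns in order of first appearance when the ids are not already sorted alphabetically.

```r
# Example data from the question
df <- data.frame(
  id = c("ant", "cat", "cat", "dog", "dog", "dog", "fly"),
  value = c(10, 4, 6, 5, 3, 2, 9)
)

# Assigning a longer length pads a vector with NA
fly_column <- rep(0, 9)
length(fly_column) <- 10
fly_column

# split() orders groups by factor level (alphabetical for character ids);
# fixing the levels keeps the ids in order of first appearance instead
groups <- split(df$value, factor(df$id, levels = unique(df$id)))
names(groups)
```

With the example data both orderings agree, since ant, cat, dog and fly are already alphabetical.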

answered May 27, 2021 at 6:57

ThomasIsCoding
