Could anyone lend some knowledge? I am trying to build a new data frame from the values in the data frame below.

id   value
ant    10
cat    4
cat    6
dog    5
dog    3
dog    2
fly    9

Going through the ids in order, I want to build a data frame that follows these rules:

  • Every time we see a new id, we create a column. The max value is 10, so there should be 10 rows.
  • The first id is ant, which has a single value of 10, so every row of ant should be 0.
  • The next column is cat. It has 2 values, 4 and 6: the first 4 rows must be 0, followed by 6 rows of 1.
  • Same logic for dog: the first 5 rows are 0, the next 3 rows are 1, and the last 2 rows are 0.
  • fly has only 9 rows of 0, so the last row should contain NA.

The result should look like this:

ant  cat  dog  fly
0    0    0    0
0    0    0    0
0    0    0    0
0    0    0    0
0    1    0    0
0    1    1    0
0    1    1    0
0    1    1    0
0    1    0    0
0    1    0    NA

I know how to do this the long way:

newdf <- data.frame(matrix(2, ncol = length(unique(df[,"id"])) , nrow = 10))
newdf$X1[1:10] <- 0
newdf$X2[1:4] <- 0
newdf$X2[5:10] <- 1
...

Is there a more efficient way to do this? My actual data will have roughly 50 rows, which is why I am looking for something more efficient.

2 Answers

Here's a tidyverse answer:

library(dplyr)
library(tidyr)

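df %>%
  group_by(id) %>%
  # alternate 0, 1, 0, ... across the values within each id
  mutate(val = rep(c(0, 1), length.out = n())) %>%
  # repeat each row `value` times (this drops the value column)
  uncount(value) %>%
  # number the rows within each id
  mutate(row = row_number()) %>%
  # pad every id to 10 rows; the added rows get NA
  complete(row = 1:10) %>%
  # one column per id
  pivot_wider(names_from = id, values_from = val) %>%
  select(-row)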

#     ant   cat   dog   fly
#   <dbl> <dbl> <dbl> <dbl>
# 1     0     0     0     0
# 2     0     0     0     0
# 3     0     0     0     0
# 4     0     0     0     0
# 5     0     1     0     0
# 6     0     1     1     0
# 7     0     1     1     0
# 8     0     1     1     0
# 9     0     1     0     0
#10     0     1     0    NA

For each id, we assign alternating 0 and 1 values, then use uncount to repeat each row as many times as its value. pivot_wider then reshapes the data to wide format so that each id gets its own column.
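
To see what the pipeline builds before reshaping, here is a minimal sketch that stops after the uncount step and inspects the long-format data. It assumes the sample df from the question and that dplyr and tidyr are installed; the name long is only illustrative.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
df <- data.frame(
  id    = c("ant", "cat", "cat", "dog", "dog", "dog", "fly"),
  value = c(10, 4, 6, 5, 3, 2, 9)
)

# `long` is an illustrative name: one row per output cell, still in long format
long <- df %>%
  group_by(id) %>%
  mutate(val = rep(c(0, 1), length.out = n())) %>%
  uncount(value) %>%
  mutate(row = row_number()) %>%
  ungroup()

# Rows per id: ant, cat and dog have 10 rows each, fly has only 9
count(long, id)

# The dog column before reshaping: five 0s, three 1s, two 0s
long$val[long$id == "dog"]
```

Because fly stops at row 9, its missing cell in row 10 becomes NA once the data is spread into one column per id.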

data

df <- structure(list(id = c("ant", "cat", "cat", "dog", "dog", "dog", 
"fly"), value = c(10, 4, 6, 5, 3, 2, 9)), row.names = c(NA, -7L
), class = "data.frame")

5 Comments

Hi, this unfortunately does not produce the same answer. I somehow get a lot more rows than expected, many of which are filled with NA. But thank you for answering!
@anonymous I have updated the answer for your updated dataframe.
I would have done it exactly this way, with one minor exception in the complete step: I would use complete(row = seq_len(max(df$value))) instead of hardcoding the sequence to 1:10. Upvoted already. P.S. I just checked, and deleting that step has no impact on the output either.
Hi, thank you! I am curious: what should I do if I get an error about bad names? It suggests I use name_repair, but I have tried a few ways with it and it's not producing an answer. Sorry, I'm all new to R.
@anonymous I am not sure what might be causing that. Probably you already have a column with the same name as one of the ids, or you have a name which is not allowed as a column name.
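
On the hardcoded 1:10 discussed in the comments above, here is a small base R sketch of two ways to derive the row count from the data. It assumes the sample df from the question; n_by_max and n_by_total are illustrative names. The two agree on this data but measure different things: the largest single value versus the largest per-id total.

```r
# Sample data from the question
df <- data.frame(
  id    = c("ant", "cat", "cat", "dog", "dog", "dog", "fly"),
  value = c(10, 4, 6, 5, 3, 2, 9)
)

# Largest single value, as suggested in the comment
n_by_max <- max(df$value)

# Largest per-id total, i.e. the length of the longest column after expansion
n_by_total <- max(tapply(df$value, df$id, sum))

c(n_by_max, n_by_total)
# 10 10
```

Either could be passed to seq_len() in place of 1:10. If some id's values summed to more than the largest single value, only the per-id total would equal the number of rows needed.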

You can try the following base R code

maxlen <- with(df, max(tapply(value, id, sum)))
list2DF(
  lapply(
    with(df, split(value, id)),
    function(x) {
      `length<-`(
        rep(rep(c(0, 1), length.out = length(x)), x),
        maxlen
      )
    }
  )
)

which gives

   ant cat dog fly
1    0   0   0   0
2    0   0   0   0
3    0   0   0   0
4    0   0   0   0
5    0   1   0   0
6    0   1   1   0
7    0   1   1   0
8    0   1   1   0
9    0   1   0   0
10   0   1   0  NA
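
To make the inner function easier to follow, here is a minimal sketch that applies its three steps to one id at a time, using only base R. The variable names (dog, pattern, expanded, fly) are illustrative and not part of the answer.

```r
# Values for "dog" from the question
dog <- c(5, 3, 2)

# Step 1: one alternating flag per value, starting at 0
pattern <- rep(c(0, 1), length.out = length(dog))
pattern
# 0 1 0

# Step 2: repeat each flag as many times as its value
expanded <- rep(pattern, dog)
expanded
# 0 0 0 0 0 1 1 1 0 0

# Step 3: `length<-` pads a shorter vector with NA, e.g. "fly" with 9 rows
fly <- `length<-`(rep(0, 9), 10)
fly
# 0 0 0 0 0 0 0 0 0 NA
```

Note that list2DF() was added in R 4.0.0; on older versions, wrapping the lapply() result in as.data.frame() should give an equivalent data frame. Also, split() orders the groups by sorted id, which happens to match the order of appearance in this data.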
