0

Could someone please help me insert the group_by function (or any function that would do the trick)into this:

df$buy <-ifelse(
 ( 
 (df$month %in% c(201909, 201910,  201911, 201912, 202001, 202002)) &
  (df$activity== 0)                                  
  ),
  1,  
  0   
 )

My data has three columns: ID, Month, and X. Each id has 12-14 observations. I want to group by the id then assign a 1 to all the IDs in a group where month = 201009, 201919, 201911, 201912, 202001, 202002 and X in those rows = 0. For example, buy is the new variable I want to create:

 ID  MONTH  X Buy
 1   201901 1 1
 1   201002 0 1
 1   201903 1 1
 1   201904 0 1
 1   201905 1 1
 1   201906 1 1
 1   201907 0 1
 1   201908 0 1
 1   201909 0 1
 1   201910 0 1
 1   201911 0 1
 1   201912 0 1
 1   202001 0 1
 1   202002 0 1
 2   201901 1 0
 2   201902 1 0
 2   201903 0 0
 2   201904 0 0
 2   201905 0 0 
 2   201906 1 0 
 2   201907 0 0 
 2   201908 0 0 
 2   201909 0 0
 2   201910 1 0 
 2   201911 0 0 
 2   201912 1 0 
 2   202001 0 0 
 2   202002 0 0 

I want a 1 for the entire group based on ID only if the months stated above have a zero in the X column, and a zero if it does not meet the requirements. Thank you in advance.

3
  • 1
    Your code should work without need of group by! Commented Dec 6, 2020 at 0:49
  • 1
    If Duck is not right, then it is likely because we don't understand enough about the input data and your expected output. Please make this question a little more reproducible by adding sample unambiguous data (e.g., data.frame(x=...,y=...) or the output from dput(head(x))) and intended output given that input. Refs: stackoverflow.com/q/5963269, minimal reproducible example, and stackoverflow.com/tags/r/info. Commented Dec 6, 2020 at 1:38
  • @r2evans you are correct, I was not clear in my request. I have edited the post to include some of my data and my desired outcome. Thank you both. Commented Dec 6, 2020 at 4:06

2 Answers 2

1

I think this will serve your purpose

library(tidyverse)

df %>% filter(MONTH %in% c(201909, 201910,  201911, 201912, 202001, 202002)) %>%
  group_by(ID) %>% summarise(Buy = sum(X+1)) %>%
  mutate(Buy = ifelse(Buy == 6, 1, 0)) %>% right_join(df) %>%
  select(1, 3:4, 2)

   ID  MONTH X Buy
1   1 201901 1   1
2   1 201902 0   1
3   1 201903 1   1
4   1 201904 0   1
5   1 201905 1   1
6   1 201906 1   1
7   1 201907 0   1
8   1 201908 0   1
9   1 201909 0   1
10  1 201910 0   1
11  1 201911 0   1
12  1 201912 0   1
13  1 202001 0   1
14  1 202002 0   1
15  2 201901 1   0
16  2 201902 1   0
17  2 201903 0   0
18  2 201904 0   0
19  2 201905 0   0
20  2 201906 1   0
21  2 201907 0   0
22  2 201908 0   0
23  2 201909 0   0
24  2 201910 1   0
25  2 201911 0   0
26  2 201912 1   0
27  2 202001 0   0
28  2 202002 0   0

The code may have to be changed in case X has negative values.

Sign up to request clarification or add additional context in comments.

Comments

1

Since the TO didn't specify a proper exampel I created one of my one.

Generally, I'm wondering if this all could be done more easily, but it occured to me that this is probably one of the rare cases where it could make sense to first pivot_wider to calculate new variables. So this is what I'm doing:

  • Pivoting wider so that each month is its own column and each ID is a row.
  • Do a rowwise calculation on some months columns and check if at least one of them is 1.
  • Reshaping back to long format.

Note:

  • Since I created my own data frame I took some short cuts, so the values do not correspond to the TOs example.
  • I also just used some example columns for the row calculation, again not matching the columns the TO had provided.
  • However, this all can be adjusted easily in the code below:

Creating data

set.seed(1)
df <- data.frame(ID = c(rep(1:2, each = 12)),
                 MONTH = paste0(c(rep(2019, 12)), rep(1:12, 2)),
                 X = round(runif(24, 0,1),0))

Doing the calculations

library(tidyverse)
df %>%
  pivot_wider(id_cols      = ID,
              names_from   = MONTH,
              names_prefix = "MONTH_",
              values_from  = X) %>%
  mutate(Buy = apply(across(c(MONTH_20191, MONTH_20195, MONTH_201912)), 1, function(x) (any(x == 1))),
         Buy = as.numeric(Buy)) %>%
  pivot_longer(cols = starts_with("MONTH"),
               names_prefix = "MONTH_",
               names_to = "MONTH",
               values_to = "X") %>% print(n=Inf)

Result

# A tibble: 24 x 4
      ID   Buy MONTH      X
   <int> <dbl> <chr>  <dbl>
 1     1     0 20191      0
 2     1     0 20192      0
 3     1     0 20193      1
 4     1     0 20194      1
 5     1     0 20195      0
 6     1     0 20196      1
 7     1     0 20197      1
 8     1     0 20198      1
 9     1     0 20199      1
10     1     0 201910     0
11     1     0 201911     0
12     1     0 201912     0
13     2     1 20191      1
14     2     1 20192      0
15     2     1 20193      1
16     2     1 20194      0
17     2     1 20195      1
18     2     1 20196      1
19     2     1 20197      0
20     2     1 20198      1
21     2     1 20199      1
22     2     1 201910     0
23     2     1 201911     1
24     2     1 201912     0

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.