I have a data.table called data. I want to create a new column of 0s and 1s within every CRD_NUM and BIZ_DT group: find the maximum Jrny_Ind in each group and assign 1 to the row that holds that maximum, and 0 to the other rows.

Here's the data to work with:

    # dput() output with the non-parseable .internal.selfref pointer dropped
    data <- structure(list(JRNY_ID_NUM = c(115485143065, 115581455926, 115542253339, 
    115568253504, 115579064996, 115557373723), CRD_NUM = c(1000148004095169, 
    1000148004095169, 1000148004095169, 1000148004095169, 1000148004095169, 
    1000148004095169), BIZ_DT = structure(c(3L, 3L, 4L, 4L, 5L, 5L
    ), .Label = c("01-Jan-17", "02-Jan-17", "03-Jan-17", "04-Jan-17", 
    "05-Jan-17", "06-Jan-17", "07-Jan-17", "08-Jan-17", "09-Jan-17", 
    "10-Jan-17", "11-Jan-17", "12-Jan-17", "13-Jan-17", "14-Jan-17", 
    "15-Jan-17", "16-Jan-17", "17-Jan-17", "18-Jan-17", "19-Jan-17", 
    "20-Jan-17", "21-Jan-17", "22-Jan-17", "23-Jan-17", "24-Jan-17", 
    "25-Jan-17", "26-Jan-17", "27-Jan-17", "28-Jan-17", "29-Jan-17", 
    "30-Jan-17", "31-Jan-17"), class = "factor"), Jrny_Ind = c(1L, 
    2L, 1L, 2L, 1L, 2L)), .Names = c("JRNY_ID_NUM", "CRD_NUM", "BIZ_DT", 
    "Jrny_Ind"), class = c("data.table", "data.frame"), row.names = c(NA, 
    -6L))
    # Re-register the object as a data.table after dropping the pointer
    data.table::setDT(data)

Desired Output:

    JRNY_ID_NUM          CRD_NUM    BIZ_DT Jrny_Ind Last_Trip
1: 115485143065 1000148004095169 03-Jan-17        1         0
2: 115581455926 1000148004095169 03-Jan-17        2         1
3: 115542253339 1000148004095169 04-Jan-17        1         0
4: 115568253504 1000148004095169 04-Jan-17        2         1
5: 115579064996 1000148004095169 05-Jan-17        1         0
6: 115557373723 1000148004095169 05-Jan-17        2         1

I have tried to get the "max rows" for each card and date like below:

data[, .SD[which.max(Jrny_Ind)], by = c("CRD_NUM","BIZ_DT")]

This returns only the max rows. I am not sure how to turn that into a new column on the original table using data.table.

  • data[, last_trip := +(Jrny_Ind == max(Jrny_Ind)), by = .(CRD_NUM, BIZ_DT)] (or as.integer instead of the +). Commented Aug 29, 2017 at 13:29
  • library(dplyr); dat %>% group_by(CRD_NUM, BIZ_DT) %>% mutate(Last_Trip = as.integer(Jrny_Ind == max(Jrny_Ind))). Commented Aug 29, 2017 at 13:36

2 Answers

There should be a duplicate for this. But for now:

data[, last_trip := as.integer(Jrny_Ind == max(Jrny_Ind)), by = .(CRD_NUM, BIZ_DT)]
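
One detail worth checking is how the comparison with max() behaves when a group contains ties, since the question's attempt used which.max() instead. The sketch below is only an illustration on a small made-up table (toy, all_max and first_max are hypothetical names, not from the question): comparing against max() flags every tied row, while which.max() picks only the first one.

```r
library(data.table)

# Hypothetical toy data; the second group has two rows tied for the maximum
toy <- data.table(
  CRD_NUM  = c(1, 1, 1, 1, 1),
  BIZ_DT   = c("03-Jan-17", "03-Jan-17", "04-Jan-17", "04-Jan-17", "04-Jan-17"),
  Jrny_Ind = c(1L, 2L, 1L, 2L, 2L)
)

# Flags every row equal to the group maximum, so tied rows all get 1
toy[, all_max := as.integer(Jrny_Ind == max(Jrny_Ind)), by = .(CRD_NUM, BIZ_DT)]

# Flags only the first row holding the group maximum
toy[, first_max := as.integer(seq_len(.N) == which.max(Jrny_Ind)),
    by = .(CRD_NUM, BIZ_DT)]

print(toy)
```

If Jrny_Ind is unique within each card and date, as in the sample data, both versions give the same column.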

Using dplyr:

library(dplyr)
# dat holds the question's data; assign the result back to keep the new column
dat <- dat %>% group_by(CRD_NUM, BIZ_DT) %>% 
        mutate(Last_Trip = as.integer(Jrny_Ind == max(Jrny_Ind)))

Or plyr:

# Note: loading plyr after dplyr masks several dplyr functions (e.g. mutate)
library(plyr)
ddply(dat, .(CRD_NUM, BIZ_DT), transform, Last_Trip = as.integer(Jrny_Ind == max(Jrny_Ind)))

Output:

dat
## # A tibble: 6 x 6
## # Groups:   CRD_NUM, BIZ_DT [3]
##    JRNY_ID_NUM      CRD_NUM    BIZ_DT Jrny_Ind last_trip Last_Trip
##          <dbl>        <dbl>    <fctr>    <int>     <int>     <int>
## 1 115485143065 1.000148e+15 03-Jan-17        1         0         0
## 2 115581455926 1.000148e+15 03-Jan-17        2         1         1
## 3 115542253339 1.000148e+15 04-Jan-17        1         0         0
## 4 115568253504 1.000148e+15 04-Jan-17        2         1         1
## 5 115579064996 1.000148e+15 05-Jan-17        1         0         0
## 6 115557373723 1.000148e+15 05-Jan-17        2         1         1
