
I want to create a new variable X based on three existing variables: "SubID", "Day", and "Time". Until now I have done this manually in Excel with three sorts: first by "SubID", then by "Day", and lastly by "Time". X should run from 1 to the number of rows for each SubID, following the order of Day and Time.

SubID: assigned subject number

Day: each subject's day number (1,2,3...21)

Time: 1, 2, 3

X: the row's sequence number within its SubID (1 up to the number of rows with that SubID), ordered by Day and then Time

SubID Day  Time   X    
 1    1     1     1
 1    1     2     2
 1    1     3     3
 1    2     1     4
 1    2     2     5
 2    1     1     1
 2    1     2     2
 2    1     3     3
 2    2     3     6
 2    2     2     5
 2    2     1     4

I have been doing this manually in Excel and I am sure there must be a smarter way to do it in R, but I am new to R and don't know how. Thank you in advance!

2 Answers

2

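Maybe this helps:

library(dplyr)
# paste0(Day, Time) builds a numeric sort key (e.g. 21 for Day 2, Time 1);
# this relies on Time always being a single digit
df1 %>% 
  group_by(SubID) %>% 
  mutate(X1 = row_number(as.numeric(paste0(Day, Time))))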
# A tibble: 11 x 5
# Groups:   SubID [2]
#   SubID   Day  Time     X    X1
#   <int> <int> <int> <int> <int>
# 1     1     1     1     1     1
# 2     1     1     2     2     2
# 3     1     1     3     3     3
# 4     1     2     1     4     4
# 5     1     2     2     5     5
# 6     2     1     1     1     1
# 7     2     1     2     2     2
# 8     2     1     3     3     3
# 9     2     2     3     6     6
#10     2     2     2     5     5
#11     2     2     1     4     4

Or using order (applied twice, so that each row gets its rank rather than the sorting permutation)

df1 %>% 
  group_by(SubID) %>% 
  mutate(X1 = order(order(Day, Time)))
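A caveat when using order() for this: it returns the permutation that sorts its input, not each element's rank. The two happen to coincide for the example data here, but not in general. A minimal base R sketch with made-up vectors (the values below are illustrative only, not taken from the question):

```r
# Hypothetical three-row example for a single subject
Day  <- c(2, 1, 1)
Time <- c(1, 1, 2)

# Permutation that sorts the rows: "row 2 first, then row 3, then row 1"
perm <- order(Day, Time)

# Rank of each row in the sorted sequence: "row 1 is third, row 2 is first, ..."
rnk <- order(order(Day, Time))

print(perm)
print(rnk)
```

Here perm and rnk differ, and rnk is the value wanted for X.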

Or with data.table

library(data.table)
# order(order(...)) gives each row's rank within its SubID
setDT(df1)[, X1 := order(order(Day, Time)), by = SubID]

data

df1 <- structure(list(SubID = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
2L, 2L), Day = c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L), 
Time = c(1L, 2L, 3L, 1L, 2L, 1L, 2L, 3L, 3L, 2L, 1L), X = c(1L, 
2L, 3L, 4L, 5L, 1L, 2L, 3L, 6L, 5L, 4L)), class = "data.frame", 
 row.names = c(NA, 
   -11L))
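For comparison, the same numbering can probably be had without any packages by mirroring the Excel workflow: sort first, then count rows within each SubID. This is only a sketch in base R; the names sorted and X1 are chosen here for illustration, and the data frame is rebuilt inline so the block runs on its own.

```r
# Same values as the example data, rebuilt so this block is self-contained
df1 <- data.frame(
  SubID = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  Day   = c(1, 1, 1, 2, 2, 1, 1, 1, 2, 2, 2),
  Time  = c(1, 2, 3, 1, 2, 1, 2, 3, 3, 2, 1)
)

# Sort by SubID, then Day, then Time (the three Excel sorts in one step)
sorted <- df1[order(df1$SubID, df1$Day, df1$Time), ]

# Number the rows 1..n within each SubID
sorted$X1 <- ave(seq_len(nrow(sorted)), sorted$SubID, FUN = seq_along)

print(sorted)
```

Unlike the grouped approaches, this returns the rows in sorted order rather than in their original order.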

2 Comments

The code ran, but when I checked the new variable X1, R returned NULL. Does that mean I haven't created the variable yet? Sorry for the basic question, I am new to R. The code I used is: data2 %>% group_by(SubID) %>% mutate(X1 = order(Day, Time)) followed by data2$X1
@Susan You need to assign the result back to the data, i.e. data2 <- data2 %>% group_by(SubID) %>% mutate(X1 = order(Day, Time))
1

Maybe try the data.table package. You will need to install it if you haven't already; the install command is commented out below.

# install.packages("data.table")
library(data.table)

We can generate some random example data in the following way.

df <- data.frame(SubId = sample(1:2, 10, replace = TRUE),
                 Day   = sample(1:2, 10, replace = TRUE),
                 Time  = sample(1:2, 10, replace = TRUE))

Then convert the data.frame into a data.table.

setDT(df)
##> df
##     SubId Day Time
##  1:     1   2    1
##  2:     1   1    1
##  3:     1   1    2
##  4:     2   2    1
##  5:     2   1    1
##  6:     1   2    2
##  7:     1   2    1
##  8:     1   2    2
##  9:     2   1    1
## 10:     2   1    2

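Finally, we can order by SubId, Day, and Time. Because the rows are then processed in the order we want, we just have to number them from 1 to the number of observations within each SubId.

# i orders the rows; within each SubId, number them 1..n in that order
df[order(SubId, Day, Time), X := seq_len(.N), by = SubId]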


##> df
##     SubId Day Time X
##  1:     1   2    1 3
##  2:     1   1    1 1
##  3:     1   1    2 2
##  4:     2   2    1 4
##  5:     2   1    1 1
##  6:     1   2    2 5
##  7:     1   2    1 4
##  8:     1   2    2 6
##  9:     2   1    1 2
## 10:     2   1    2 3
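One way to sanity-check the result is to sort the table and confirm that X counts 1, 2, 3, ... within each SubId. The sketch below assumes data.table is installed; the seed value and the names dt and chk are arbitrary choices made for this illustration, so the random data will not match the table printed above.

```r
library(data.table)

set.seed(42)  # arbitrary seed, only to make the random sample reproducible
dt <- data.table(SubId = sample(1:2, 10, replace = TRUE),
                 Day   = sample(1:2, 10, replace = TRUE),
                 Time  = sample(1:2, 10, replace = TRUE))

# Same numbering step as above
dt[order(SubId, Day, Time), X := seq_len(.N), by = SubId]

# After sorting, X should read 1..n within every SubId
chk <- dt[order(SubId, Day, Time), .(ok = all(X == seq_len(.N))), by = SubId]
print(chk)
```

If every value in the ok column is TRUE, the numbering follows the Day and Time order within each subject.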

1 Comment

Thank you so much for the detailed explanations! They are very helpful!
