0

I currently have a data frame of this structure

    ID-No  cigsaday   activity  
    1      NA        1           
    2      NA        1          
    1       5       NA          
    2       5       NA          

I want to concatenate the rows with the identical ID numbers and create a new data frame that is supposed to look like this

ID-No  cigsaday   activity  
    1      5        1           
    2      5        1

The data frame includes characters as well as numerical, in this way we would match based on a participant ID which occurs 4 times in the dataset within the first column.

Any help is appreciated!

3 Answers 3

1

A data.table option

> setDT(df)[, lapply(.SD, na.omit), ID_No]
   ID_No cigsaday activity
1:     1        5        1
2:     2        5        1

Data

> dput(df)
structure(list(ID_No = c(1L, 2L, 1L, 2L), cigsaday = c(NA, NA,
5L, 5L), activity = c(1L, 1L, NA, NA)), class = "data.frame", row.names = c(NA,
-4L))
Sign up to request clarification or add additional context in comments.

Comments

1

Many ways lead to Rome. For the sake of completeness, here are some other approaches which return the expected result for the given sample dataset. Your mileage may vary.

1. dplyr, na.omit()

library(dplyr)
df %>% 
  group_by(ID_No) %>% 
  summarise(across(everything(), na.omit))
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 2 x 3
  ID_No cigsaday activity
  <int>    <int>    <int>
1     1        5        1
2     2        5        1

Note, this a dplyr version of ThomasIsCoding's answer.

2. dplyr, reduce(), coalesce()

library(dplyr)
df %>% 
  group_by(ID_No) %>% 
  summarise(across(everything(), ~ purrr::reduce(.x, coalesce)))

3. data.table, fcoalesce()

library(data.table)
setDT(df)[, lapply(.SD, function(x) fcoalesce(as.list(x))), ID_No]
   ID_No cigsaday activity
1:     1        5        1
2:     2        5        1

4. data.table, Reduce(), fcoalesce()

library(data.table)
setDT(df)[, lapply(.SD, Reduce, f = fcoalesce), ID_No]

Comments

0

A possible solution using na.locf() which replaces a value with the most recent non-NA value.

library(zoo)

dat %>% 
  group_by(IDNo) %>% 
  mutate_at(vars(-group_cols()),.funs=function(x) na.locf(x)) %>% 
  distinct(IDNo,cigsaday,activity,.keep_all = TRUE) %>% 
  ungroup()

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.