Merge columns within dataframe based on column value R

Question

I currently have a data frame of this structure

    ID-No  cigsaday   activity  
    1      NA        1           
    2      NA        1          
    1       5       NA          
    2       5       NA

I want to combine the rows that share an ID number into a single row, so that the new data frame looks like this:

    ID-No  cigsaday   activity  
    1      5        1           
    2      5        1

The data frame contains character as well as numeric columns. Rows should be matched on the participant ID in the first column, which occurs 4 times in the dataset.

Any help is appreciated!

ThomasIsCoding · Accepted Answer · 2021-03-09 23:08:52Z

1

A data.table option

> library(data.table)
> setDT(df)[, lapply(.SD, na.omit), ID_No]
   ID_No cigsaday activity
1:     1        5        1
2:     2        5        1

Data

> dput(df)
structure(list(ID_No = c(1L, 2L, 1L, 2L), cigsaday = c(NA, NA,
5L, 5L), activity = c(1L, 1L, NA, NA)), class = "data.frame", row.names = c(NA,
-4L))
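The `na.omit` approach appears to rely on every ID having exactly one non-NA value in each column. As a minimal base R sketch of the same idea that also tolerates groups with several or no non-NA values, one could take the first non-NA value per group. The helper name `first_non_na` is hypothetical and not part of any package.

```r
# Sample data copied from the dput() output above
df <- data.frame(
  ID_No    = c(1L, 2L, 1L, 2L),
  cigsaday = c(NA, NA, 5L, 5L),
  activity = c(1L, 1L, NA, NA)
)

# Hypothetical helper: first non-NA value of a vector, or NA if there is none
first_non_na <- function(x) {
  x[which(!is.na(x))[1]]
}

# Collapse every non-ID column to one value per ID_No
value_cols <- setdiff(names(df), "ID_No")
res <- aggregate(df[value_cols], by = df["ID_No"], FUN = first_non_na)
print(res)
```

Because the helper is applied column by column, the sketch should also work when some columns are character and others numeric.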


Uwe · Accepted Answer · 2021-03-12 07:28:21Z

Many ways lead to Rome. For the sake of completeness, here are some other approaches which return the expected result for the given sample dataset. Your mileage may vary.

1. dplyr, na.omit()

library(dplyr)
df %>% 
  group_by(ID_No) %>% 
  summarise(across(everything(), na.omit))

`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 2 x 3
  ID_No cigsaday activity
  <int>    <int>    <int>
1     1        5        1
2     2        5        1

Note that this is a dplyr version of ThomasIsCoding's answer.

2. dplyr, reduce(), coalesce()

library(dplyr)
df %>% 
  group_by(ID_No) %>% 
  summarise(across(everything(), ~ purrr::reduce(.x, coalesce)))

3. data.table, fcoalesce()

library(data.table)
setDT(df)[, lapply(.SD, function(x) fcoalesce(as.list(x))), ID_No]

   ID_No cigsaday activity
1:     1        5        1
2:     2        5        1

4. data.table, Reduce(), fcoalesce()

library(data.table)
setDT(df)[, lapply(.SD, Reduce, f = fcoalesce), ID_No]
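To illustrate what the reduce/coalesce variants do within a single group, here is a small base R sketch. `coalesce2` is a hypothetical stand-in for `dplyr::coalesce()` or `data.table::fcoalesce()` restricted to two arguments; it is an assumption made for illustration, not the implementation used by either package.

```r
# Hypothetical two-argument coalesce: keep a where it is non-NA, otherwise b
coalesce2 <- function(a, b) ifelse(is.na(a), b, a)

# Folding over the values of one group keeps the first non-NA value
cigs_id1 <- Reduce(coalesce2, c(NA, 5L))       # values of cigsaday for ID 1
act_id1  <- Reduce(coalesce2, c(1L, NA))       # values of activity for ID 1
chr_demo <- Reduce(coalesce2, c("a", NA, "b")) # also works for character data

print(cigs_id1)
print(act_id1)
print(chr_demo)
```

The fold walks the vector from left to right, so when a group holds more than one non-NA value the earliest one wins.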

jsv · Accepted Answer · 2021-03-09 22:47:27Z

0

A possible solution using na.locf(), which replaces each NA with the most recent non-NA value.

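library(zoo)
library(dplyr)  # provides %>%, group_by(), mutate_at(), distinct()

# dat is the data frame from the question, with the ID column named IDNo
dat %>% 
  group_by(IDNo) %>% 
  # fill every non-grouping column with the last observed non-NA value
  mutate_at(vars(-group_cols()), .funs = function(x) na.locf(x)) %>% 
  # drop the duplicate rows that the fill produces
  distinct(IDNo, cigsaday, activity, .keep_all = TRUE) %>% 
  ungroup()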
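A short sketch of how `na.locf()` treats the vectors that occur within one ID group may help here. It assumes the zoo package is installed; `fill_both` is a hypothetical helper name introduced only for this illustration.

```r
library(zoo)

# Default na.rm = TRUE drops leading NAs, so the result can be shorter
lead_na  <- na.locf(c(NA, 5))                 # length 1
trail_na <- na.locf(c(1, NA))                 # length 2, NA filled with 1

# Keeping the original length leaves the leading NA in place
kept <- na.locf(c(NA, 5), na.rm = FALSE)

# Hypothetical helper: fill forward, then backward, preserving length
fill_both <- function(x) {
  forward <- na.locf(x, na.rm = FALSE)
  na.locf(forward, fromLast = TRUE, na.rm = FALSE)
}
both <- fill_both(c(NA, 5))

print(lead_na)
print(trail_na)
print(kept)
print(both)
```

In the pipeline above, the shortened length-1 result is presumably recycled across the group by `mutate_at()`, which suits the sample data. A length-preserving fill such as `fill_both` may be safer when a group has more than two rows.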

