2

I am interested in doing string detection over a set of columns. If that string (which in this case is ZSD) is found, I want to return the column number/name. If multiple matches are found, I want to return the last column name/number with that string.

Input

My input is this:

a.zsd                b.zsd   c.zsd   d.zsd
'ZSD'                'ZAD'   NA      'ZAD'
'ZAD'                NA      NA      'ZSD'
NA                   NA      'ZAD'   NA
'Not Achieved ZSD'   NA      'ZAD'   NA
'ZSD'                'ZSD'   NA      'ZSD'
NA                   NA      NA      NA

Output

My required output is a new column zsd.level:

a.zsd                b.zsd   c.zsd   d.zsd zsd.level
'ZSD'                'ZAD'   NA      'ZAD'    a
'ZAD'                NA      NA      'ZSD'    d
NA                   NA      'ZAD'   NA       NA
'Not Achieved ZSD'   NA      'ZAD'   NA       a
'ZSD'                'ZSD'   NA      'ZSD'    d
NA                   NA      NA      NA       NA

Info:

My data frame has over a hundred columns. I am interested in ONLY some of the columns having a name that ends at a string .zsd. These columns can either have NA or one of the following string values ZAD, ZSD, Not Achieved ZSD.

I am just interested in detecting the presence of the string ZSD. If not found in any of the columns, it should return NA in the output column (zsd.level). If the string is found in multiple columns, I want to return the last column name/number that contains the string.

My question is similar to this post but not exactly the same dplyr filter with condition on multiple columns

dput

dput(df)

structure(list(a.zsd = c("ZSD", "ZAD", NA, "Not Achieved ZSD", "ZSD", NA), 
               b.zsd = c("ZAD", NA, NA, NA, "ZSD", NA), 
               c.zsd = c(NA, NA, "ZAD", "ZAD", NA, NA), 
               d.zsd = c("ZAD", "ZSD", NA, NA, "ZSD", NA)), 
               class = "data.frame", row.names = c(NA, -6L))

Partial Solution

To select those columns with names ending in .zsd, I can do

library(stringr)
library(tidyverse)

df %>%
  select(ends_with(".zsd"))

To select or filter rows with the string ZSD, I can do

str_detect(., "ZSD"))

But how can I put multiple conditions together? Any help would be greatly appreciated.

3
  • What if ZSD is multiple columns like in row 5 Commented Oct 17, 2021 at 22:35
  • "If multiple matches are found, I want to return the last column with that string." Commented Oct 17, 2021 at 22:36
  • Sorry, it's a string. I should update and clarify that it's just in one column. Commented Oct 17, 2021 at 22:36

2 Answers 2

2

We could do it this way:

library(dplyr)
library(tidyr)
library(stringr)

df %>%  
  mutate(across(contains("zsd"), ~case_when(str_detect(., "ZSD") ~ cur_column()), .names = 'new_{col}')) %>%
  unite(zsd_level, starts_with('new'), na.rm = TRUE, sep = ' ') %>% 
  mutate(zsd_level = str_remove_all(zsd_level, ".zsd"),
         zsd_level = str_sub(zsd_level, -1))

output:

 a.zsd b.zsd c.zsd d.zsd zsd_level
1              ZSD   ZAD  <NA>   ZAD         a
2              ZAD  <NA>  <NA>   ZSD         d
3             <NA>  <NA>   ZAD  <NA>          
4 Not Achieved ZSD  <NA>   ZAD  <NA>         a
5              ZSD   ZSD  <NA>   ZSD         d
6             <NA>  <NA>  <NA>  <NA>          
Sign up to request clarification or add additional context in comments.

3 Comments

See my update. Now it should be adequate!
That's great, it's mostly working. I can improve on including the NA value when the string is missing. I will now check on my original data. Thanks again!
Could you please have a look at the following post <stackoverflow.com/questions/69609589/…>
2

Another option, but a little bit more complicated than dear TarJae's:

library(dplyr)
library(tidyr)
library(stringr)

df %>% 
  mutate(rn = row_number()) %>% 
  pivot_longer(-rn) %>% 
  group_by(rn) %>% 
  filter(str_detect(value, "ZSD")) %>% 
  slice_tail() %>% 
  summarise(name = str_remove(name, ".zsd")) %>% 
  right_join(df %>% mutate(rn = row_number()), by = "rn") %>%
  arrange(rn) %>% 
  ungroup() %>% 
  select(ends_with("zsd"), zsd.level = name)

This returns

# A tibble: 6 x 5
  a.zsd            b.zsd c.zsd d.zsd zsd.level
  <chr>            <chr> <chr> <chr> <chr>    
1 ZSD              ZAD   NA    ZAD   a        
2 ZAD              NA    NA    ZSD   d        
3 NA               NA    ZAD   NA    NA       
4 Not Achieved ZSD NA    ZAD   NA    a        
5 ZSD              ZSD   NA    ZSD   d        
6 NA               NA    NA    NA    NA 

3 Comments

Nice solution. I like it.
@Martin Gal, Could you please have a look at the following post <stackoverflow.com/questions/69609589/…>
Thank you for the solution, however, it does not work on the original data.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.