I can't seem to wrap my head around a seemingly simple task: how to filter a dataframe on a pattern in one column, but only when a pattern in another column does not match:

Data:

df <- data.frame(
  Speaker = c("A", NA, "B", "C", "A", "B", "A", "B", "C"),
  Utterance = c("uh-huh",
                "(0.666)",
                "WOW!",
                "#yeah#",
                "=right=",
                "oka::y¿",
                "okay",
                "some stuff",
                "!more! £TAlk£"),
  Orthographic = c("uh-huh", "NA", "wow", "yeah", "right", "okay", "okay", "some stuff", "more talk")
)

I want to remove rows in df where the pattern ^(yeah|okay|right|mhm|mm|uh(-| )?huh)$ matches in column Orthographic, unless the row also contains any character from the character class [A-Z:↑↓£#¿?!] in column Utterance.

Expected outcome:

df
  Speaker     Utterance Orthographic
3       B          WOW!          wow
4       C        #yeah#         yeah
6       B       oka::y¿         okay
8       B    some stuff   some stuff
9       C !more! £TAlk£    more talk

Attempts so far: (filters too much!)

library(dplyr)
df %>%
  filter(!is.na(Speaker)) %>%
  filter(!grepl("^(yeah|okay|right|mhm|mm|uh(-| )?huh)$", Orthographic)
         & grepl("[A-Z:↑↓£#¿?!]", Utterance))

#  Speaker     Utterance Orthographic
#1       B          WOW!          wow
#2       C !more! £TAlk£    more talk

1 Answer

I think you need | (OR) instead of & (AND):

library(dplyr)

df %>%
  filter(!is.na(Speaker)) %>%
  filter(!grepl("^(yeah|okay|right|mhm|mm|uh(-| )?huh)$", Orthographic) 
         | grepl("[A-Z:↑↓£#¿?!]", Utterance))

#  Speaker     Utterance Orthographic
#1       B          WOW!          wow
#2       C        #yeah#         yeah
#3       B       oka::y¿         okay
#4       B    some stuff   some stuff
#5       C !more! £TAlk£    more talk

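Keep rows that either do not match ^(yeah|okay|right|mhm|mm|uh(-| )?huh)$ in Orthographic or contain a character from [A-Z:↑↓£#¿?!] in Utterance. With &, a row had to satisfy both conditions at once, so rows such as "some stuff" (no special character) and "#yeah#" (matches the word pattern) were dropped as well.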
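To see why | is the right operator, it may help to write the removal rule out explicitly and negate it. The following is only a sketch in base R, not part of the original answer; the names is_token, is_marked, drop_row, keep_row and result are made up for illustration, and it assumes a UTF-8 session so that the non-ASCII characters in the class are matched as intended.

```r
df <- data.frame(
  Speaker = c("A", NA, "B", "C", "A", "B", "A", "B", "C"),
  Utterance = c("uh-huh", "(0.666)", "WOW!", "#yeah#", "=right=",
                "oka::y¿", "okay", "some stuff", "!more! £TAlk£"),
  Orthographic = c("uh-huh", "NA", "wow", "yeah", "right", "okay",
                   "okay", "some stuff", "more talk")
)

# Hypothetical helper vectors: one logical value per row
is_token  <- grepl("^(yeah|okay|right|mhm|mm|uh(-| )?huh)$", df$Orthographic)
is_marked <- grepl("[A-Z:↑↓£#¿?!]", df$Utterance)

# The rule as stated in the question: remove token rows that carry no marking
drop_row <- is_token & !is_marked

# Negating it (De Morgan) gives the condition used inside filter()
keep_row <- !is_token | is_marked

# Both formulations select exactly the same rows
stopifnot(identical(keep_row, !drop_row))

# Apply it together with the NA check on Speaker
result <- df[!is.na(df$Speaker) & keep_row, ]
print(result)
```

The negation of "A and not B" is "not A or B", which is why swapping & for | in the original attempt is enough.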

1 Comment

Yes, of course, that does it! I could have sworn I'd tried that too, but apparently I hadn't! ;)
