I can't seem to wrap my head around a seemingly simple task: how to filter a data frame on a pattern in one column, where a match should only count if a second pattern does not match in another column.
Data:
df <- data.frame(
  Speaker = c("A", NA, "B", "C", "A", "B", "A", "B", "C"),
  Utterance = c("uh-huh",
                "(0.666)",
                "WOW!",
                "#yeah#",
                "=right=",
                "oka::y¿",
                "okay",
                "some stuff",
                "!more! £TAlk£"),
  Orthographic = c("uh-huh", "NA", "wow", "yeah", "right", "okay", "okay", "some stuff", "more talk")
)
I want to remove rows in df where the pattern ^(yeah|okay|right|mhm|mm|uh(-| )?huh)$ matches in column Orthographic, unless the same row contains any character from the character class [A-Z:↑↓£#¿?!] in column Utterance. Rows where Speaker is NA should be removed as well.
Expected outcome:
df
  Speaker     Utterance Orthographic
3       B          WOW!          wow
4       C        #yeah#         yeah
6       B       oka::y¿         okay
8       B    some stuff   some stuff
9       C !more! £TAlk£    more talk
Attempt so far (filters out too much!):
library(dplyr)
df %>%
  filter(!is.na(Speaker)) %>%
  filter(!grepl("^(yeah|okay|right|mhm|mm|uh(-| )?huh)$", Orthographic)
         & grepl("[A-Z:↑↓£#¿?!]", Utterance))
  Speaker     Utterance Orthographic
1       B          WOW!          wow
2       C !more! £TAlk£    more talk
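For what it's worth, here is a sketch of how the condition might be rephrased. "Remove if Orthographic matches and Utterance has none of the special characters" is the same as "keep if Orthographic does not match or Utterance has one of the special characters", which suggests `|` instead of `&`. This rests on my reading of the requirement above; the name `result` is only for illustration.

```r
library(dplyr)

# Same data as above
df <- data.frame(
  Speaker = c("A", NA, "B", "C", "A", "B", "A", "B", "C"),
  Utterance = c("uh-huh", "(0.666)", "WOW!", "#yeah#", "=right=",
                "oka::y¿", "okay", "some stuff", "!more! £TAlk£"),
  Orthographic = c("uh-huh", "NA", "wow", "yeah", "right", "okay",
                   "okay", "some stuff", "more talk")
)

result <- df %>%
  # Drop rows without a speaker
  filter(!is.na(Speaker)) %>%
  # Keep a row if the Orthographic pattern does NOT match,
  # OR if Utterance contains one of the marked characters
  filter(!grepl("^(yeah|okay|right|mhm|mm|uh(-| )?huh)$", Orthographic) |
           grepl("[A-Z:↑↓£#¿?!]", Utterance))

result
```

With the sample data this keeps the five utterances listed under the expected outcome (WOW!, #yeah#, oka::y¿, some stuff, !more! £TAlk£).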