dat1 <- data.frame(id1 = c(1, 1, 2),
                   pattern = c("apple", "applejack", "bananas, sweet"))
dat2 <- data.frame(id2 = c(1174, 1231),
                   description = c("apple is sweet", "bananass are not"),
                   description2 = c("melon", "bananas, sweet yes"))
> dat1
  id1        pattern
1   1          apple
2   1      applejack
3   2 bananas, sweet
> dat2
   id2      description       description2
1 1174   apple is sweet              melon
2 1231 bananass are not bananas, sweet yes
I have two data.frames, dat1 and dat2. I would like to take each pattern in dat1 and search for it in dat2's description and description2 columns using the regular expression \\b[pattern]\\b, i.e. the pattern wrapped in word boundaries.
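To make the intended matching rule concrete, here is a minimal sketch of what the word-boundary regex does for a single pattern. The `texts` vector is made up for illustration and is not part of my data.

```r
# Build the word-bounded regex for one pattern.
pattern <- "apple"
search_pattern <- paste0("\\b", pattern, "\\b")

# Illustrative strings only (not taken from dat2).
texts <- c("apple is sweet", "applejack", "pineapple", "an apple")

# grepl() is vectorized over the strings being searched,
# so a single call tests every element of texts.
hits <- grepl(search_pattern, texts)
hits
# [1]  TRUE FALSE FALSE  TRUE
```

Only whole-word occurrences count, so "applejack" and "pineapple" are not matches for "apple". Note that the pattern is interpreted as a regular expression, so any regex metacharacters it contains would not be matched literally.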
Here is my attempt and the desired final output:
# Accumulate 0/1 match flags for every (pattern, dat2 row) pair.
description_match <- description2_match <- vector()
for (i in seq_len(nrow(dat1))) {
  for (j in seq_len(nrow(dat2))) {
    # Wrap the pattern in word boundaries so that only whole-word matches count.
    search_pattern <- paste0("\\b", dat1$pattern[i], "\\b")
    description_match <- c(description_match, ifelse(grepl(search_pattern, dat2[j, "description"]), 1, 0))
    description2_match <- c(description2_match, ifelse(grepl(search_pattern, dat2[j, "description2"]), 1, 0))
  }
}
final_output <- data.frame(id1 = rep(dat1$id1, each = nrow(dat2)),
                           pattern = rep(dat1$pattern, each = nrow(dat2)),
                           id2 = rep(dat2$id2, length = nrow(dat1) * nrow(dat2)),
                           description_match = description_match,
                           description2_match = description2_match)
> final_output
  id1        pattern  id2 description_match description2_match
1   1          apple 1174                 1                  0
2   1          apple 1231                 0                  0
3   1      applejack 1174                 0                  0
4   1      applejack 1231                 0                  0
5   2 bananas, sweet 1174                 0                  0
6   2 bananas, sweet 1231                 0                  1
This approach is slow when dat1 and dat2 have many rows: it calls grepl once for every pattern/row pair and grows the result vectors inside the loop. What's a quicker way to do this that avoids the nested for loops?
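One direction that might help, sketched here in base R and not benchmarked on large data: since grepl() is vectorized over the strings it searches, the inner loop over dat2 rows can be dropped and each pattern tested against a whole column in one call. The helper name `match_col` is hypothetical, introduced only for this sketch.

```r
dat1 <- data.frame(id1 = c(1, 1, 2),
                   pattern = c("apple", "applejack", "bananas, sweet"))
dat2 <- data.frame(id2 = c(1174, 1231),
                   description = c("apple is sweet", "bananass are not"),
                   description2 = c("melon", "bananas, sweet yes"))

# One word-bounded regex per pattern, built in a single vectorized call.
search_patterns <- paste0("\\b", dat1$pattern, "\\b")

# Hypothetical helper: for each pattern, test every element of `text` at once.
# vapply() yields one column per pattern; as.integer() flattens column by
# column, so dat2 rows vary fastest within each pattern.
match_col <- function(patterns, text) {
  m <- vapply(patterns, grepl, logical(length(text)),
              x = text, USE.NAMES = FALSE)
  as.integer(m)
}

final_output <- data.frame(
  id1 = rep(dat1$id1, each = nrow(dat2)),
  pattern = rep(dat1$pattern, each = nrow(dat2)),
  id2 = rep(dat2$id2, times = nrow(dat1)),
  description_match = match_col(search_patterns, dat2$description),
  description2_match = match_col(search_patterns, dat2$description2)
)
```

This still iterates over the patterns inside vapply(), so it is not loop-free, but it makes nrow(dat1) calls to grepl() per column instead of nrow(dat1) * nrow(dat2) and avoids growing vectors with c().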