Three columns of my data.frame contain subjects, and I want to subset the data.frame by subject. For example, to get a data.frame for the subject "apple", a row should be selected if the word "apple" appears in any of the three columns.
doc <- c("blabla1", "blabla2", "blabla3", "blabla4")
subj.1 <- c("apple", "prune", "coconut", "berry")
subj.2 <- c("coconut", "apple", "cherry", "banana and prune")
subj.3 <- c("berry", "banana", "apple and berry", "pear")
subjects <- c("apple", "prune", "coconut", "berry", "cherry", "pear", "banana")
mydf <- data.frame(doc, subj.1, subj.2, subj.3, stringsAsFactors=FALSE)
mydf
#       doc  subj.1           subj.2          subj.3
# 1 blabla1   apple          coconut           berry
# 2 blabla2   prune            apple          banana
# 3 blabla3 coconut           cherry apple and berry
# 4 blabla4   berry banana and prune            pear
The output for the subject "apple" should look like this:
#       doc  subj.1  subj.2          subj.3
# 1 blabla1   apple coconut           berry
# 2 blabla2   prune   apple          banana
# 3 blabla3 coconut  cherry apple and berry
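A minimal sketch of one way to get that output, assuming a plain substring match is what's wanted (so "apple" would also match "pineapple"): test each subject column with `grepl()` and combine the logical vectors with `|`. The helper name `rows_with_subject()` is hypothetical, and `fixed = TRUE` is a choice made here so the subject is matched literally rather than as a regular expression.

```r
doc <- c("blabla1", "blabla2", "blabla3", "blabla4")
subj.1 <- c("apple", "prune", "coconut", "berry")
subj.2 <- c("coconut", "apple", "cherry", "banana and prune")
subj.3 <- c("berry", "banana", "apple and berry", "pear")
mydf <- data.frame(doc, subj.1, subj.2, subj.3, stringsAsFactors = FALSE)

# Hypothetical helper: TRUE for each row where `subject` occurs as a literal
# substring in at least one of the given columns.
rows_with_subject <- function(df, subject, cols = c("subj.1", "subj.2", "subj.3")) {
  hits <- lapply(cols, function(col) grepl(subject, df[[col]], fixed = TRUE))
  Reduce(`|`, hits)  # element-wise OR across the columns
}

apple_df <- mydf[rows_with_subject(mydf, "apple"), ]
apple_df
```

Passing the column names as an argument keeps the helper usable if the real data has differently named subject columns.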
EDIT1: In addition, let's say I have about 200 different subjects, and therefore I want 200 different data.frames. How could I do that?
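For many subjects, one option (a sketch, not the only way) is to keep the results in a single named list rather than 200 separate objects: `lapply()` over the subject vector and name the list by subject, so each data.frame can be pulled out with something like `subject_dfs[["apple"]]`. `subject_dfs` is a hypothetical name; matching here is literal and case-sensitive because of `fixed = TRUE`.

```r
doc <- c("blabla1", "blabla2", "blabla3", "blabla4")
subj.1 <- c("apple", "prune", "coconut", "berry")
subj.2 <- c("coconut", "apple", "cherry", "banana and prune")
subj.3 <- c("berry", "banana", "apple and berry", "pear")
mydf <- data.frame(doc, subj.1, subj.2, subj.3, stringsAsFactors = FALSE)
subjects <- c("apple", "prune", "coconut", "berry", "cherry", "pear", "banana")

# One data.frame per subject: keep a row if the subject appears in any column.
subject_dfs <- lapply(subjects, function(s) {
  keep <- grepl(s, mydf$subj.1, fixed = TRUE) |
          grepl(s, mydf$subj.2, fixed = TRUE) |
          grepl(s, mydf$subj.3, fixed = TRUE)
  mydf[keep, ]
})
names(subject_dfs) <- subjects  # look up each result by its subject

subject_dfs[["prune"]]
```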
I tried a loop approach:
mylist <- vector('list', length(subjects))
for (i in seq_along(subjects)) {
  pattern <- subjects[i]
  # keep a row if the subject appears in any of the three subject columns
  filter <- grepl(pattern, mydf$subj.1, ignore.case = TRUE) |
            grepl(pattern, mydf$subj.2, ignore.case = TRUE) |
            grepl(pattern, mydf$subj.3, ignore.case = TRUE)
  mylist[[i]] <- mydf[filter, ]
}
names(mylist) <- subjects
but I get this error:
Error in grepl(pattern, ignore.case = T, panel$SUBJECT.1) :
invalid regular expression 'C++ PROGRAMMING', reason 'Invalid use of repetition operators'
EDIT2: Oh, I see: in the original data.frame, one of the subjects is "C++ PROGRAMMING". Could the "++" be causing the error?
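That does look like the cause: the error message names the pattern 'C++ PROGRAMMING', and by default `grepl()` reads its pattern as a regular expression, in which `+` is a repetition operator. A sketch of two workarounds, assuming case-insensitive literal matching is the goal: `fixed = TRUE` with both sides lower-cased (because `fixed = TRUE` overrides `ignore.case`), or PCRE `\Q...\E` quoting with `perl = TRUE` (which would break only if a subject itself contained `\E`). The data below is hypothetical, and `has_subject()` / `has_subject_pcre()` are illustrative names.

```r
# Hypothetical data containing a subject with regex metacharacters.
mydf <- data.frame(
  doc    = c("blabla1", "blabla2", "blabla3"),
  subj.1 = c("C++ Programming", "apple", "pear"),
  subj.2 = c("berry", "c++ programming and apple", "prune"),
  subj.3 = c("cherry", "banana", "coconut"),
  stringsAsFactors = FALSE
)

# Literal, case-insensitive substring test: fixed = TRUE treats "+" as an
# ordinary character; lower-casing both sides replaces ignore.case.
has_subject <- function(x, subject) {
  grepl(tolower(subject), tolower(x), fixed = TRUE)
}

# Alternative that stays a regex: \Q...\E quotes the subject literally in PCRE.
has_subject_pcre <- function(x, subject) {
  grepl(paste0("\\Q", subject, "\\E"), x, ignore.case = TRUE, perl = TRUE)
}

subjects <- c("C++ PROGRAMMING", "apple")
mylist <- lapply(subjects, function(s) {
  keep <- has_subject(mydf$subj.1, s) |
          has_subject(mydf$subj.2, s) |
          has_subject(mydf$subj.3, s)
  mydf[keep, ]
})
names(mylist) <- subjects

mylist[["C++ PROGRAMMING"]]
```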