Data
In R, I have a data frame with a single column of strings:
data <- structure(list(col = c("byr:1985 eyr:2021 iyr:2011 hgt:175cm pid:163069444 hcl:#18171d",
"eyr:2023 hcl:#cfa07d ecl:blu hgt:169cm pid:494407412 byr:1936",
"ecl:zzz eyr:2036 hgt:109 hcl:#623a2f iyr:1997 byr:2029 cid:169 pid:170290956",
"hcl:#18171d ecl:oth pid:266824158 hgt:168cm byr:1992 eyr:2021",
"byr:1932 ecl:hzl pid:284313291 iyr:2017 hcl:#efcc98 eyr:2024 hgt:184cm"
)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"
))
Problem
I want to filter this data frame down to the rows that contain all of the following patterns/fields:
fields <- c("ecl", "eyr", "hgt", "hcl", "iyr", "byr", "pid")
In other words, I would like to keep only the rows in which every one of these fields appears.
Attempt
The stringr package and its str_detect() function seemed to be the solution, so I tested it on a single case (with dplyr and stringr loaded):
> data$col[1]
[1] "byr:1985 eyr:2021 iyr:2011 hgt:175cm pid:163069444 hcl:#18171d"
> str_detect(data$col[1], fields)
[1] FALSE TRUE TRUE TRUE TRUE TRUE TRUE
> all(str_detect(data$col[1], fields))
[1] FALSE
This works! The first string has no ecl field, so the first element is FALSE, and all() returns FALSE whenever at least one field is missing.
However, when I try to filter the rows with the same expression:
data %>%
  filter(all(str_detect(col, fields)))
I end up with an empty data frame, and a warning:
Warning message:
In stri_detect_regex(string, pattern, negate = negate, opts_regex = opts(pattern)) :
  longer object length is not a multiple of shorter object length
Question(s)
- What is causing this warning?
- How do you filter a column of strings on the occurrence of multiple patterns in R?
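
For reference, here is a minimal sketch of one possible reading of both questions, not a definitive answer. Inside filter(), col is the whole column (5 strings) while fields has 7 patterns, so str_detect() pairs them element by element and recycles the shorter vector. Because 7 is not a multiple of 5, the stringr version that produced the warning above complains, and newer releases may reject the call outright. all() then collapses everything to a single FALSE, which drops every row. The sketch below checks one string at a time instead; the helper name has_all_fields is hypothetical and introduced only for illustration.

```r
library(dplyr)
library(stringr)

data <- tibble(col = c(
  "byr:1985 eyr:2021 iyr:2011 hgt:175cm pid:163069444 hcl:#18171d",
  "eyr:2023 hcl:#cfa07d ecl:blu hgt:169cm pid:494407412 byr:1936",
  "ecl:zzz eyr:2036 hgt:109 hcl:#623a2f iyr:1997 byr:2029 cid:169 pid:170290956",
  "hcl:#18171d ecl:oth pid:266824158 hgt:168cm byr:1992 eyr:2021",
  "byr:1932 ecl:hzl pid:284313291 iyr:2017 hcl:#efcc98 eyr:2024 hgt:184cm"
))
fields <- c("ecl", "eyr", "hgt", "hcl", "iyr", "byr", "pid")

# Hypothetical helper: TRUE only if a single string contains every field.
# Here `s` has length 1, so it is compared against each of the 7 patterns.
has_all_fields <- function(s) all(str_detect(s, fields))

# Apply the helper to each row separately, giving one logical per row.
result <- data %>%
  filter(vapply(col, has_all_fields, logical(1), USE.NAMES = FALSE))

print(result)
```

With the sample data above, only the third and fifth strings contain all seven fields, so those are the rows this sketch keeps. Using dplyr's rowwise() before filter() would be another way to get a per-row evaluation.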