dplyr filter columns with multiple regex

Question

I have two data frames in R (meta = some redundant info)

df1:

                id  value1  value2  value3  value4
id1_meta_meta-meta  4.93    13.93   16.8    35.39
id2_meta_meta-meta  28.63   45.43   30.52   61.71
id3_meta_meta-meta  3.35    1.26    7.98    4.43
id4_meta_meta-meta  16.78   50.47   32.48   55.52
id5_meta_meta-meta  474.23  807.71  664.45  442.55
id6_meta_meta-meta  26.26   32.83   24.64   41.58
id7_meta_meta-meta  230.1   202.93  166.71  295.48
id8_meta_meta-meta  651.21  1282.71 1012.28 2650.21

df2:

V1
id1
id2
id3
id4
id5

Question

Trying to filter rows in df1 based on ids in df2

Code

library(dplyr)
library(stringr)
df.common = df1 %>%
  filter(str_detect(id, '*_') %in% df2$V1)

error

Error in filter_impl(.data, quo) : 
  Evaluation error: Syntax error in regexp pattern. (U_REGEX_RULE_SYNTAX).

Desired output

df.common:

                id  value1  value2  value3  value4
id1_meta_meta-meta  4.93    13.93   16.8    35.39
id2_meta_meta-meta  28.63   45.43   30.52   61.71
id3_meta_meta-meta  3.35    1.26    7.98    4.43
id4_meta_meta-meta  16.78   50.47   32.48   55.52
id5_meta_meta-meta  474.23  807.71  664.45  442.55

Your original code will work if you change the filter condition to filter(str_detect(id, df2$V1))
– Jake Kaupp, Commented Aug 17, 2017 at 16:24
@JakeKaupp I get this error: Warning message: In stri_detect_regex(string, pattern, opts_regex = opts(pattern)) : longer object length is not a multiple of shorter object length
– sbradbio, Commented Aug 17, 2017 at 16:27
It's a warning, not an error, and results in your desired output.
– Jake Kaupp, Commented Aug 17, 2017 at 16:33
True, rookie mistake, apologies. But I do not get what I expected: > dim(df.common) [1] 2 13
– sbradbio, Commented Aug 17, 2017 at 16:36
str_detect detects strings and returns TRUE or FALSE, so your code is looking for TRUE or FALSE in df2. Instead, use str_extract to pull out the ID part and then test with that: str_extract(id, "id[0-9]+") %in% df2$V1.
– Gregor Thomas, Commented Aug 17, 2017 at 16:46
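
To make the comments above concrete, here is a minimal sketch, assuming a cut-down copy of the question's data (only value1 is kept; the construction of df1 and df2 is illustrative, not the asker's real input). The pattern '*_' fails because a regex cannot start with a quantifier: * has nothing to repeat. Passing df2$V1 straight to str_detect is also not a fix, since a vector of patterns is matched pairwise against the ids rather than each id against every pattern. Two options that should behave as intended are extracting the id prefix and testing it with %in%, or collapsing the ids into one alternation pattern.

```r
library(dplyr)
library(stringr)

# Cut-down stand-in for the question's data (assumption: one value column).
df1 <- data.frame(
  id = paste0("id", 1:8, "_meta_meta-meta"),
  value1 = c(4.93, 28.63, 3.35, 16.78, 474.23, 26.26, 230.1, 651.21),
  stringsAsFactors = FALSE
)
df2 <- data.frame(V1 = paste0("id", 1:5), stringsAsFactors = FALSE)

# Option 1: str_extract() returns the matched text ("id1", "id2", ...),
# which can be compared against df2$V1 with %in%.
df.common <- df1 %>%
  filter(str_extract(id, "id[0-9]+") %in% df2$V1)

# Option 2: build a single pattern that matches any of the ids.
# Anchoring with ^ and requiring the trailing "_" keeps "id1" from
# also matching an id such as "id10_...".
pattern <- paste0("^(", paste(df2$V1, collapse = "|"), ")_")
df.common2 <- df1 %>%
  filter(str_detect(id, pattern))
```

Both data frames contain the rows for id1 through id5. Option 2 assumes the ids contain no regex metacharacters; if they might, Option 1 is the safer choice.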

www · Accepted Answer · 2017-08-17 16:27:19Z

4

If you are using dplyr and stringr, you can also consider this approach. str_replace_all works like gsub. semi_join is a "filtering join": it keeps only the rows of df1 that have a match in df2, without adding any columns from df2.

library(dplyr)
library(stringr)

df3 <- df1 %>%
  mutate(id2 = str_replace_all(id, "_.*", "")) %>%
  semi_join(df2, by = c("id2" = "V1")) %>%
  select(-id2)

df3
                  id value1 value2 value3 value4
1 id1_meta_meta-meta   4.93  13.93  16.80  35.39
2 id2_meta_meta-meta  28.63  45.43  30.52  61.71
3 id3_meta_meta-meta   3.35   1.26   7.98   4.43
4 id4_meta_meta-meta  16.78  50.47  32.48  55.52
5 id5_meta_meta-meta 474.23 807.71 664.45 442.55
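
As a small illustration of why a filtering join fits here, the sketch below contrasts semi_join with inner_join on toy data. The data frames left and right are hypothetical and not part of the question; right deliberately contains a duplicated key.

```r
library(dplyr)

# Hypothetical toy data: "id1" appears twice in the lookup table.
left <- data.frame(
  key = c("id1", "id2", "id3"),
  value = c(1, 2, 3),
  stringsAsFactors = FALSE
)
right <- data.frame(V1 = c("id1", "id1", "id3"), stringsAsFactors = FALSE)

# semi_join() only filters: each matching row of `left` is kept once.
kept <- semi_join(left, right, by = c("key" = "V1"))

# inner_join() returns one row per match, so "id1" is duplicated.
joined <- inner_join(left, right, by = c("key" = "V1"))

nrow(kept)    # 2
nrow(joined)  # 3
```

If df2 could ever list the same id more than once, semi_join would still return each row of df1 at most once.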

answered Aug 17, 2017 at 16:27


3 Comments

sbradbio Over a year ago

I will try this, but correct me if I am wrong: @PoGibas's answer is a concise one-liner.

www Over a year ago

Well... if you only want to see the most concise answer, I will delete my answer shortly. If you want to learn more about the use of dplyr and stringr since you are using these packages, I will keep my answer here as an optional approach. What do you say?

sbradbio Over a year ago

Sure, I have accepted it. You are absolutely correct, it can be an optional approach.

pogibas · Answer · 2017-08-17 16:14:45Z

2

Use gsub to trim id in df1:
- gsub("_.*", "", df1$id) removes the first _ and everything after it
- Check which trimmed ids are in df2$V1 (%in% returns a logical vector, one value per row of df1)
- Extract those rows from df1

df1[gsub("_.*", "", df1$id) %in% df2$V1, ]
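
The steps above can be unpacked into intermediate variables to see what each one produces. This is a minimal base R sketch, assuming a cut-down copy of the question's data (only value1 is kept) and the V1 column shown in the question's df2. The names trimmed and keep are introduced here for illustration.

```r
# Cut-down stand-in for the question's data (assumption: one value column).
df1 <- data.frame(
  id = paste0("id", 1:8, "_meta_meta-meta"),
  value1 = c(4.93, 28.63, 3.35, 16.78, 474.23, 26.26, 230.1, 651.21),
  stringsAsFactors = FALSE
)
df2 <- data.frame(V1 = paste0("id", 1:5), stringsAsFactors = FALSE)

# Step 1: strip the first "_" and everything after it.
trimmed <- gsub("_.*", "", df1$id)
trimmed
# "id1" "id2" "id3" "id4" "id5" "id6" "id7" "id8"

# Step 2: test each trimmed id for membership in df2$V1.
keep <- trimmed %in% df2$V1
keep
# TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE

# Step 3: use the logical vector to subset the rows of df1.
df.common <- df1[keep, ]
```

Because ".*" is greedy, sub would give the same result as gsub here; only the first underscore matters.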

edited Aug 17, 2017 at 16:14

answered Aug 17, 2017 at 16:10


4 Comments

sbradbio Over a year ago

It worked. Could you comment on what's going on? It will help me learn, thanks.

sbradbio Over a year ago

Awesome, appreciate it!

pogibas Over a year ago

@sbradbio if this is what you wanted you can accept my answer then

sbradbio Over a year ago

I cannot now, SO will let me accept it after 5 mins, dunno why
