
I have two data frames in R ("meta" stands for redundant info appended to each id):

df1:

                id  value1  value2  value3  value4
id1_meta_meta-meta  4.93    13.93   16.8    35.39
id2_meta_meta-meta  28.63   45.43   30.52   61.71
id3_meta_meta-meta  3.35    1.26    7.98    4.43
id4_meta_meta-meta  16.78   50.47   32.48   55.52
id5_meta_meta-meta  474.23  807.71  664.45  442.55
id6_meta_meta-meta  26.26   32.83   24.64   41.58
id7_meta_meta-meta  230.1   202.93  166.71  295.48
id8_meta_meta-meta  651.21  1282.71 1012.28 2650.21

df2:

V1
id1
id2
id3
id4
id5

Question

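I am trying to filter the rows of df1 based on the ids in df2: a row should be kept when the part of its id before the first underscore appears in df2$V1.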

Code

library(dplyr)
library(stringr)
df.common = df1 %>%
  filter(str_detect(id, '*_') %in% df2$V1)

Error

Error in filter_impl(.data, quo) : 
  Evaluation error: Syntax error in regexp pattern. (U_REGEX_RULE_SYNTAX).

Desired output

df.common:

                id  value1  value2  value3  value4
id1_meta_meta-meta  4.93    13.93   16.8    35.39
id2_meta_meta-meta  28.63   45.43   30.52   61.71
id3_meta_meta-meta  3.35    1.26    7.98    4.43
id4_meta_meta-meta  16.78   50.47   32.48   55.52
id5_meta_meta-meta  474.23  807.71  664.45  442.55
  • Your original code will work if you change the filter condition to filter(str_detect(id, df2$V1)) Commented Aug 17, 2017 at 16:24
  • @JakeKaupp I get this error Warning message: In stri_detect_regex(string, pattern, opts_regex = opts(pattern)) : longer object length is not a multiple of shorter object length Commented Aug 17, 2017 at 16:27
  • It's a warning, not an error, and results in your desired output. Commented Aug 17, 2017 at 16:33
  • True, rookie mistake, apologies. But I do not get what I expected: > dim(df.common) [1] 2 13 Commented Aug 17, 2017 at 16:36
  • str_detect detects strings and returns TRUE or FALSE, so your code is looking for TRUE or FALSE in df2. Instead, use str_extract to pull out the ID part and then test with that: str_extract(id, "id[0-9]+") %in% df2$V1. Commented Aug 17, 2017 at 16:46

2 Answers


If you are using dplyr and stringr, you can also consider this approach. str_replace_all works like gsub. semi_join is a "filtering join": it keeps only the rows of df1 that have a match in df2, without adding any columns from df2.

library(dplyr)
library(stringr)

df3 <- df1 %>%
  mutate(id2 = str_replace_all(id, "_.*", "")) %>%
  semi_join(df2, by = c("id2" = "V1")) %>%
  select(-id2)

df3
                  id value1 value2 value3 value4
1 id1_meta_meta-meta   4.93  13.93  16.80  35.39
2 id2_meta_meta-meta  28.63  45.43  30.52  61.71
3 id3_meta_meta-meta   3.35   1.26   7.98   4.43
4 id4_meta_meta-meta  16.78  50.47  32.48  55.52
5 id5_meta_meta-meta 474.23 807.71 664.45 442.55
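
For comparison, the same result can be had without the helper column by following the suggestion in the comments: extract the id prefix with str_extract and test it with %in%. The block below is only a minimal sketch. It assumes every id has a prefix followed by an underscore, and it rebuilds the sample data with just two value columns so that it runs on its own.

```r
library(dplyr)
library(stringr)

# Sample data rebuilt from the question (first two value columns only)
df1 <- data.frame(
  id = paste0("id", 1:8, "_meta_meta-meta"),
  value1 = c(4.93, 28.63, 3.35, 16.78, 474.23, 26.26, 230.1, 651.21),
  value2 = c(13.93, 45.43, 1.26, 50.47, 807.71, 32.83, 202.93, 1282.71),
  stringsAsFactors = FALSE
)
df2 <- data.frame(V1 = paste0("id", 1:5), stringsAsFactors = FALSE)

# str_detect() returns TRUE/FALSE, which can never match the ids in df2$V1.
# str_extract() returns the matched text: here, everything before the first "_".
df.common <- df1 %>%
  filter(str_extract(id, "^[^_]+") %in% df2$V1)

df.common$id
```

The pattern '*_' in the question fails because * is a quantifier and there is nothing before it to repeat; "^[^_]+" instead reads "one or more non-underscore characters at the start of the string".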

3 Comments

I will try this, but correct me if I am wrong: @PoGibas's answer is a concise one-liner.
Well... if you only want to see the most concise answer, I will delete my answer shortly. If you want to learn more about the use of dplyr and stringr, since you are using these packages, I will keep my answer here as an optional approach. What do you say?
Sure, I have accepted it. You are absolutely correct, it can be an optional way.
  1. Use gsub to trim id in df1

    • gsub("_.*", "", df1$id) removes the first _ and everything after it
  2. Check which trimmed ids are in df2$V1 with %in% (this returns a logical vector, one TRUE/FALSE per row of df1)

  3. Extract those rows from df1

    df1[gsub("_.*", "", df1$id) %in% df2$V1, ]
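
To see what each of the three steps produces, the one-liner can be unrolled as below. This is an illustrative sketch only: the intermediate names trimmed and keep are made up for the example, and the sample data is rebuilt with a single value column so the block runs on its own in base R.

```r
# Sample data rebuilt from the question (one value column only)
df1 <- data.frame(
  id = paste0("id", 1:8, "_meta_meta-meta"),
  value1 = c(4.93, 28.63, 3.35, 16.78, 474.23, 26.26, 230.1, 651.21),
  stringsAsFactors = FALSE
)
df2 <- data.frame(V1 = paste0("id", 1:5), stringsAsFactors = FALSE)

# Step 1: drop the first "_" and everything after it ("id1_meta..." becomes "id1")
trimmed <- gsub("_.*", "", df1$id)

# Step 2: logical vector, TRUE where the trimmed id occurs in df2$V1
keep <- trimmed %in% df2$V1

# Step 3: logical row indexing; the empty slot after the comma keeps all columns
df.common <- df1[keep, ]
```

Because ".*" is greedy, the pattern matches from the first underscore to the end of the string, so only the leading prefix survives.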
    

4 Comments

It worked. Could you comment on what is going on? It would help me learn, thanks.
Awesome, appreciate it!
@sbradbio if this is what you wanted you can accept my answer then
I cannot now, SO will let me accept it after 5 mins, dunno why
