
I am trying to select rows from a data frame based on the content of one of its columns. I am using grep(), but when I try to match at the end of the string, it only matches the last pattern I supplied.

This is the code:

df1 <- data.frame(cName=c(
    'A Co', 'B Co',  'C Co', 'D Co', 
    'F Llc', 'G Llc', 'H Llc', 'I Llc',
    'P Inc', 'Q Inc', 'R Inc', 'S Inc'))    
tName <- grep( ("Inc$ | Llc$"),df1$cName, value = T)
tName
[1] "F Llc" "G Llc" "H Llc" "I Llc"

I expect it to return all the occurrences of 'Inc' and 'Llc'. However, only the last alternative in the regular expression is matched. I have tried various combinations with brackets, parentheses and [:space:] without success. What is wrong? Thanks for any advice.

    Ditch the white spaces; they have meaning. In other words, just do grep("Inc$|Llc$", df1$cName, value = TRUE). Or, if the space is important, do grep(" Inc$| Llc$", df1$cName, value = TRUE). Commented Mar 29, 2015 at 20:03
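A small sketch comparing the two patterns suggested in the comment on the same df1 from the question. The variable names no_space and with_space are illustrative; stopifnot() only checks that both patterns return the same eight Inc/Llc names.

```r
# Same data as in the question
df1 <- data.frame(cName = c(
    'A Co', 'B Co',  'C Co', 'D Co',
    'F Llc', 'G Llc', 'H Llc', 'I Llc',
    'P Inc', 'Q Inc', 'R Inc', 'S Inc'))

# No spaces around "|": each alternative is just a suffix anchored at the end
no_space <- grep("Inc$|Llc$", df1$cName, value = TRUE)

# Spaces kept, but placed before each word so they match the
# separator between the leading letter and the suffix
with_space <- grep(" Inc$| Llc$", df1$cName, value = TRUE)

print(no_space)
print(with_space)

# Both patterns select the same eight names
stopifnot(identical(no_space, with_space), length(no_space) == 8)
```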

1 Answer


Here is the code that works as expected:

df1 <- data.frame(cName=c(
    'A Co', 'B Co',  'C Co', 'D Co', 
    'F Llc', 'G Llc', 'H Llc', 'I Llc',
    'P Inc', 'Q Inc', 'R Inc', 'S Inc'))    
tName <- grep("(Inc|Llc)$", df1$cName, value = TRUE)
tName

Output: [1] "F Llc" "G Llc" "H Llc" "I Llc" "P Inc" "Q Inc" "R Inc" "S Inc"
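The question asks for rows of the data frame, whereas grep(..., value = TRUE) returns only the matching strings. A minimal sketch, assuming the goal is a subset of df1, uses grepl() to build a logical row index with the same pattern. endsWith() (base R 3.3.0 and later) is shown as a regex-free alternative. The names is_match, is_match2, names_chr and tRows are illustrative.

```r
df1 <- data.frame(cName = c(
    'A Co', 'B Co',  'C Co', 'D Co',
    'F Llc', 'G Llc', 'H Llc', 'I Llc',
    'P Inc', 'Q Inc', 'R Inc', 'S Inc'))

# grepl() returns TRUE/FALSE per element instead of indices or values
is_match <- grepl("(Inc|Llc)$", df1$cName)

# Keep whole rows; drop = FALSE keeps the result a data frame
# even though df1 has only one column
tRows <- df1[is_match, , drop = FALSE]
print(tRows)

# Regex-free alternative: endsWith() requires a character vector,
# so convert in case cName is a factor (the default before R 4.0.0)
names_chr <- as.character(df1$cName)
is_match2 <- endsWith(names_chr, "Inc") | endsWith(names_chr, "Llc")
stopifnot(identical(is_match, is_match2))
```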

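The original regex did not work as expected because the spaces around | are literal characters that become part of each alternative. The first alternative, "Inc$ ", requires a space after the end-of-string anchor, which can never match. The second alternative, " Llc$", requires a space before "Llc", which does match, so only the Llc rows were returned. You can see the regex explanation at regex101.com.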
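To see the failure directly, a minimal sketch can test each alternative of the original pattern on its own with grepl(). Because the spaces are literal, "Inc$ " and " Llc$" are the two alternatives the regex engine actually tried. The names x, alt1 and alt2 are illustrative.

```r
df1 <- data.frame(cName = c(
    'A Co', 'B Co',  'C Co', 'D Co',
    'F Llc', 'G Llc', 'H Llc', 'I Llc',
    'P Inc', 'Q Inc', 'R Inc', 'S Inc'))
x <- as.character(df1$cName)

# First alternative: "Inc", end of string, then a literal space.
# Nothing can follow the end of the string, so this never matches.
alt1 <- grepl("Inc$ ", x)

# Second alternative: a literal space, "Llc", then end of string.
# This matches the Llc names because they contain a space before "Llc".
alt2 <- grepl(" Llc$", x)

print(x[alt1])  # character(0)
print(x[alt2])  # "F Llc" "G Llc" "H Llc" "I Llc"

# The original combined pattern matches exactly the union of the two
stopifnot(identical(grepl("Inc$ | Llc$", x), alt1 | alt2))
```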
