Given a string such as "\"turkey AND ham\" NOT \"roast beef\"", I need an array of the quoted substrings, like so: ["turkey AND ham", "roast beef"]. Any ORs, ANDs and NOTs that may or may not appear between them should be dropped.

With the help of Rubular I came up with this regex /\\["']([^"']*)\\["']/

which returns the following 2 groups:

Match 1
1. turkey AND ham
Match 2
1. roast beef

However, when I use it with .scan I keep getting an empty array.

I looked at a couple of other SO posts, and a few others besides, but cannot figure out where I am going wrong.

Here is the result from my rails console:

> q = "\"turkey and ham\" OR \"roast beef\""
> q.scan(/\\["']([^"']*)\\["']/)
=> []

Expectation: ["turkey and ham", "roast beef"]

I shall also mention I suck at regex.

  • You seem to be over-escaping the pattern. Use q.scan(/["']([^"']*)["']/). The double backslashes define a literal backslash, and since the string contains no backslash, nothing matches. Commented Oct 13, 2016 at 17:30
  • To expand on what @WiktorStribiżew said: your actual string is '"turkey AND ham" NOT "roast beef"'. The \ characters are only there to escape the double quotes in the literal (and in inspect output), so the regex he posted will work correctly. Commented Oct 13, 2016 at 17:37
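
To make the point in the comments above concrete, here is a minimal sketch showing that the backslashes never reach the string itself. The helper name contains_backslash? is made up for illustration and is not from the thread.

```ruby
# Hypothetical helper: true if the string holds a literal backslash character.
def contains_backslash?(str)
  str.include?("\\")
end

q = "\"turkey AND ham\" NOT \"roast beef\""

# The escapes exist only in the source literal, not in the stored string.
puts q                                          # "turkey AND ham" NOT "roast beef"
puts contains_backslash?(q)                     # false
puts q == '"turkey AND ham" NOT "roast beef"'   # true

# A pattern that demands a literal backslash therefore matches nothing,
# while the same pattern without the \\ finds both phrases.
p q.scan(/\\["']([^"']*)\\["']/)                # []
p q.scan(/["']([^"']*)["']/)                    # [["turkey AND ham"], ["roast beef"]]
```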

2 Answers

When the regex passed to scan contains a capture group (@davidhu2000's approach), one can generally use lookarounds [1] instead. It's just a matter of personal preference. To handle a double-quoted string literal whose phrases are wrapped in either single quotes or (escaped) double quotes, you could use the following regex.

r = /
    (?<=") # match a double quote in a positive lookbehind
    [^"]+  # match one or more characters that are not double-quotes
    (?=")  # match a double quote in a positive lookahead
    |      # or
    (?<=') # match a single quote in a positive lookbehind
    [^']+  # match one or more characters that are not single-quotes
    (?=')  # match a single quote in a positive lookahead
    /x    # free-spacing regex definition mode

"\"turkey AND ham\" NOT 'roast beef'".scan(r)
  #=> ["turkey AND ham", "roast beef"]

Since '"turkey AND ham" NOT "roast beef"' #=> "\"turkey AND ham\" NOT \"roast beef\"" (that is how Ruby stores and displays the single-quoted literal), a single-quoted source string is not an additional case to deal with.
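
One caveat worth checking before relying on this pattern: because lookarounds consume nothing, the closing quote of one phrase can also serve as the "opening" quote of whatever follows it. The sketch below restates the regex on one line under the made-up names LOOKAROUND_RE and quoted_by_lookaround. It suggests that the operator between two double-quoted phrases is picked up as well.

```ruby
# Same pattern as above, written without free-spacing mode.
# LOOKAROUND_RE and quoted_by_lookaround are illustrative names only.
LOOKAROUND_RE = /(?<=")[^"]+(?=")|(?<=')[^']+(?=')/

def quoted_by_lookaround(str)
  str.scan(LOOKAROUND_RE)
end

# Mixed quote styles: only the two phrases are returned.
p quoted_by_lookaround(%q("turkey AND ham" NOT 'roast beef'))
# ["turkey AND ham", "roast beef"]

# Both phrases double-quoted: " NOT " also sits between two double quotes.
p quoted_by_lookaround(%q("turkey AND ham" NOT "roast beef"))
# ["turkey AND ham", " NOT ", "roast beef"]
```

If every phrase uses the same quote character, as in the question, a capture-group pattern that consumes the quotes does not have this problem.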

[1] For anyone in the audience who still considers regular expressions to be black magic: there are four kinds of lookarounds (positive and negative lookbehinds and lookaheads), as described in the docs for Regexp. They are often called "zero-width" assertions because they are not part of the matched text.

1 Comment

Elegant solution without the need to flatten any array. Though I still consider regex a sort of black magic :)

Your regex is trying to match a literal \, which matches nothing in the string: the \ in the literal is only there to escape the double quote and is not part of the string itself.

So if you remove the \\ from your regex:

res = q.scan(/["']([^"']*)["']/)

This returns a nested (2D) array:

res #=> [["turkey and ham"], ["roast beef"]]

Each inner array holds the capture groups of one match, so if your regex has two capture groups, each inner array will have two items.

If you want a simple array, call flatten on the result.
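
As a small sketch of how scan's return shape depends on the capture groups, the snippet below compares one group, two groups and no groups. The helper name phrases is an assumption made for illustration.

```ruby
# Hypothetical helper: quoted phrases as a flat array.
def phrases(str)
  str.scan(/["']([^"']*)["']/).flatten
end

q = "\"turkey and ham\" OR \"roast beef\""

# One capture group: each match becomes a one-element inner array.
p q.scan(/["']([^"']*)["']/)     # [["turkey and ham"], ["roast beef"]]

# flatten collapses the nesting into a simple array.
p phrases(q)                     # ["turkey and ham", "roast beef"]

# Two capture groups: each inner array has two items (opening quote, phrase).
p q.scan(/(["'])([^"']*)["']/)   # [["\"", "turkey and ham"], ["\"", "roast beef"]]

# No capture groups: scan returns the whole matches, quotes included.
p q.scan(/["'][^"']*["']/)       # ["\"turkey and ham\"", "\"roast beef\""]
```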

5 Comments

Now the only remaining issue for the OP is matching 'some "string"' and "some 'string'".
@Wiktor, since 'some "string"' #=> "some \"string\"" , I don't think that one needs attention, but yes on the other.
@CarySwoveland: I do not think this sample text has much to do with Ruby.
@Wiktor, I don't follow, but the regex works fine when the string is single-quoted: '"turkey AND ham" OR "roast beef"'.scan(/["']([^"']*)["']/) #=> [["turkey AND ham"], ["roast beef"]].
I don't know if it's a problem, but "\"hello'".scan(/["']([^"']*)["']/) #=> [["hello"]].
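
If the mismatched-quote case and the nested-quote case raised in these comments do matter, one possible approach (not proposed anywhere in the thread) is a backreference, so that the closing quote must be the same character as the opening one. A minimal sketch, with quoted_phrases as a made-up helper name:

```ruby
# Hypothetical helper: \1 forces the closing quote to equal the opening quote.
# Group 1 is the quote character, group 2 is the phrase, so keep the last item.
def quoted_phrases(str)
  str.scan(/(["'])(.*?)\1/).map(&:last)
end

p quoted_phrases(%q("turkey AND ham" NOT "roast beef"))  # ["turkey AND ham", "roast beef"]
p quoted_phrases(%q("turkey AND ham" NOT 'roast beef'))  # ["turkey AND ham", "roast beef"]

# Mismatched delimiters no longer produce a match.
p quoted_phrases(%q("hello'))                            # []

# The other quote character may appear inside a phrase.
p quoted_phrases(%q("some 'string'"))                    # ["some 'string'"]
```

Note that . does not match newlines by default, so this sketch assumes each phrase sits on a single line.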
