I'm trying to write a regex that removes all spaces that are not enclosed in quotes, so that something like this:

a = 4, b = 2, c = "space here"

would return this:

a=4,b=2,c="space here"

I spent some time searching this site and found a similar Q&A (Split a string by spaces -- preserving quoted substrings -- in Python). Its approach is to replace the spaces inside quotes with a token, wipe all the other spaces, and then substitute the token back. I was hoping there was a cleaner way of doing it.

4 Answers

8

It's worth noting that any regular expression solution will fail in cases like the following:

a = 4, b = 2, c = "space" here"

While it is true that you could construct a regexp to handle the three-quote case specifically, you cannot solve the problem in the general sense. This is a mathematically provable limitation of finite automata (DFAs), of which classical regexps are a direct representation. To perform any serious brace/quote matching, you will need the more powerful pushdown automaton, usually in the form of a parser generator or parser library (ANTLR, Bison, Parsec).

With that said, it sounds like regular expressions should be sufficient for your needs. Just be aware of the limitations.
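One cheap way to stay aware of that limitation is to check the input before running any regex over it. The following is only a sketch: `balanced_quotes?` is a hypothetical helper name, and it assumes that only double quotes matter and that the input contains no backslash-escaped quotes.

```ruby
# Hypothetical helper: true when the double quotes in +string+ pair up.
# Assumption: no escaped quotes (\") appear in the input.
def balanced_quotes?(string)
  string.count('"').even?
end

puts balanced_quotes?('a = 4, b = 2, c = "space here"')   # prints: true
puts balanced_quotes?('a = 4, b = 2, c = "space" here"')  # prints: false
```

If the check fails, the input is ambiguous and it is probably better to reject it than to guess which spaces were meant to be quoted.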


1 Comment

The space between double-quote and 'here' is NOT in quotes in your example.
5

This seems to work:

result = string.gsub(/( |(".*?"))/, "\\2")
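As a quick check, here is the same call run on the sample input from the question. The wrapper name `strip_unquoted_spaces` is only illustrative and is not part of the original answer.

```ruby
# Illustrative wrapper (the name is hypothetical) around the gsub call above.
def strip_unquoted_spaces(string)
  # Each match is either a single space (group 2 did not match, so the
  # replacement is empty) or a whole double-quoted run (group 2 is put back).
  string.gsub(/( |(".*?"))/, "\\2")
end

puts strip_unquoted_spaces('a = 4, b = 2, c = "space here"')
# prints: a=4,b=2,c="space here"
```

The alternation works because the regex engine consumes a quoted run as one match, so the spaces inside it are never seen as separate matches.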

1 Comment

If you get into both single- and double-quoted strings, you need to match each opening quote mark with a closing mark of the same kind.
2

I consider this very clean:

mystring.scan(/((".*?")|([^ ]))/).map { |x| x[0] }.join

I doubt gsub could do any better (assuming you want a pure regex approach).
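To make the intermediate steps visible, here is a small sketch that splits the one-liner into its parts. The variable names and the shortened sample string are assumptions made for the illustration.

```ruby
mystring = 'c = "space here"'

# scan returns one array per match; index 0 holds the outer group, which is
# either a whole quoted run or a single non-space character.
tokens = mystring.scan(/((".*?")|([^ ]))/).map { |groups| groups[0] }

p tokens
# prints: ["c", "=", "\"space here\""]

puts tokens.join
# prints: c="space here"
```

Unquoted spaces never match either alternative, so scan simply skips over them.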

0

Try this one. Single- and double-quoted strings are matched as well (so you need to filter them out if you only want the spaces):

/( |("([^"\\]|\\.)*")|('([^'\\]|\\.)*'))/
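One way to do that filtering is to pass a block to gsub and drop only the matches that are a lone space. This is a sketch under that assumption; the constant `TOKEN` and the method name are hypothetical.

```ruby
# Pattern from the answer: a space, or a double- or single-quoted string
# in which backslash escapes (such as \") are allowed.
TOKEN = /( |("([^"\\]|\\.)*")|('([^'\\]|\\.)*'))/

# Hypothetical helper name.
def strip_spaces_escape_aware(string)
  # Drop lone spaces; put quoted strings back unchanged.
  string.gsub(TOKEN) { |match| match == " " ? "" : match }
end

puts strip_spaces_escape_aware(%q(a = 'x y', b = "say \"hi there\""))
# prints: a='x y',b="say \"hi there\""
```

Because each quoted alternative names its own delimiter, an opening quote is always closed by a quote of the same kind, and an escaped quote inside the string does not end it.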
