
I'm trying to write a regex that captures command-line flags and their values. For example, given the string:

helloworld --name=stretch --message="Hi there everyone"

It should capture name and stretch, and then message and Hi there everyone.

So I've almost got what I need:

\--([a-zA-Z]+)=[\"\']*([^\"\s\'\\]*(?:\\.[^\\\'\"]*)*)\g

But I'm having trouble with the \s in the first character class. If I take it out, it only works properly with quoted values; with it in, it only works with unquoted strings, lol.

Here's the regex101: https://regex101.com/r/eE1zP6/2
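A minimal PHP sketch (assuming PHP 7+; the variable names are illustrative) that reproduces both symptoms with the pattern above. The regex101 g flag becomes preg_match_all(), and each pattern sits in a nowdoc so PHP hands the backslashes to PCRE unchanged.

```php
<?php
// Sketch: the question's pattern with and without \s in the first character class.
// The regex101 global flag is replaced by preg_match_all().
$input = 'helloworld --name=stretch --message="Hi there everyone"';

$withSpace = <<<'RE'
~\--([a-zA-Z]+)=[\"\']*([^\"\s\'\\]*(?:\\.[^\\\'\"]*)*)~
RE;

$withoutSpace = <<<'RE'
~\--([a-zA-Z]+)=[\"\']*([^\"\'\\]*(?:\\.[^\\\'\"]*)*)~
RE;

preg_match_all($withSpace, $input, $withSpaceMatches);
preg_match_all($withoutSpace, $input, $withoutSpaceMatches);

// \s excluded: the quoted value stops at its first space
print_r($withSpaceMatches[2]);

// \s allowed: the unquoted value runs on into the next flag
print_r($withoutSpaceMatches[2]);
```

With \s in the class, the values come out as "stretch" and "Hi"; without it, the single match is "stretch --message=".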

4 Comments
  • Is it possible to escape a quote inside a quoted string? If not, I guess --([a-zA-Z]+)=([^"'\s]+|"[^"]*"|'[^']*') should do it. Commented Mar 12, 2016 at 20:39
  • Yeah, I was trying to make it accept escaped quotes too. Commented Mar 12, 2016 at 20:42
  • Sorry for asking, but how do you escape quotes? Commented Mar 12, 2016 at 20:45
  • "my name is \"chris\" :D" Commented Mar 12, 2016 at 20:46
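A rough PHP sketch (assuming PHP 7+; variable names are illustrative) of the pattern suggested in the first comment, run on the original example and on the escaped-quote example. It shows that the quotes stay inside group 2 and that "[^"]*" ends at the first quote it meets, escaped or not.

```php
<?php
// Sketch: the escape-free pattern from the first comment.
$pattern = <<<'RE'
~--([a-zA-Z]+)=([^"'\s]+|"[^"]*"|'[^']*')~
RE;

$plain   = 'helloworld --name=stretch --message="Hi there everyone"';
$escaped = 'helloworld --message="my name is \"chris\" :D"';

preg_match_all($pattern, $plain, $plainMatches);
preg_match_all($pattern, $escaped, $escapedMatches);

// The surrounding quotes are part of group 2
print_r($plainMatches[2]);

// "[^"]*" stops at the escaped quote, so the value is cut short
print_r($escapedMatches[2]);
```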

4 Answers

Answer (score 4)

If it suits your use case, you could capture quoted and unquoted values in different groups:

--(\w+)=(?:[\"\']([^\"\'\\]*(?:\\.[^\\\'\"]*)*)[\"\']|(\w+))

Then in your code you can check whether the value is quoted (group 2) or unquoted (group 3).
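A minimal PHP sketch of that check, assuming PHP 7+; $flags and its 'value'/'quoted' keys are hypothetical names chosen for illustration. The pattern is copied verbatim into a nowdoc so no extra PHP escaping is needed.

```php
<?php
// Sketch: group 2 holds quoted values, group 3 unquoted ones.
$regex = <<<'RE'
~--(\w+)=(?:[\"\']([^\"\'\\]*(?:\\.[^\\\'\"]*)*)[\"\']|(\w+))~
RE;

$input = <<<'CMD'
helloworld --name=stretch --message="Hi there everyone" --msg="say \"hi\""
CMD;

preg_match_all($regex, $input, $matches, PREG_SET_ORDER);

$flags = [];
foreach ($matches as $m) {
    // A trailing unmatched group can be missing from $m, so test group 3 with isset().
    $quoted = !(isset($m[3]) && $m[3] !== '');
    $flags[$m[1]] = ['value' => $quoted ? $m[2] : $m[3], 'quoted' => $quoted];
    printf("%s => %s (%s)\n", $m[1], $flags[$m[1]]['value'], $quoted ? 'quoted' : 'unquoted');
}
```

Note that escaped quotes are kept as-is (say \"hi\"); unescaping them is left to the caller.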


4 Comments

@Stretch can you edit your question to include that requirement?
It's in my first regex... but yeah, okay.
@Stretch I edited my answer; it should now work with escaped quotes :) regex101.com/r/mN1wG0/1
Cool, that should work :D Thanks. I edited it slightly so that an unquoted value can contain any character except a space: --(\w+)=(?:[\"\']([^\"\'\\]*(?:\\.[^\\\'\"]*)*)[\"\']|([^ ]*))
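A small PHP comparison (assuming PHP 7+; the sample path flag is made up for illustration) of the answer's pattern against the commenter's variant, which swaps the unquoted branch (\w+) for ([^ ]*).

```php
<?php
// Sketch: (\w+) versus ([^ ]*) for unquoted values.
$answer = <<<'RE'
~--(\w+)=(?:[\"\']([^\"\'\\]*(?:\\.[^\\\'\"]*)*)[\"\']|(\w+))~
RE;

$variant = <<<'RE'
~--(\w+)=(?:[\"\']([^\"\'\\]*(?:\\.[^\\\'\"]*)*)[\"\']|([^ ]*))~
RE;

$input = 'helloworld --path=/usr/local/bin --name=stretch';

preg_match_all($answer, $input, $answerMatches);
preg_match_all($variant, $input, $variantMatches);

// (\w+) cannot start at "/", so the --path flag is skipped entirely
print_r($answerMatches[1]);

// ([^ ]*) accepts the path
print_r($variantMatches[1]);
print_r($variantMatches[3]);
```

One trade-off: because [^ ]* can match nothing, the variant also accepts an empty value such as --flag=, and it does not stop at tabs.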
Answer (score 2)

Instead of juggling quotes, take a cleaner approach: use a conditional subpattern (supported by PCRE, which PHP's preg_* functions use).
The basic form is as follows:

(?(1)foo|bar)
# Meaning: if group 1 has matched, use foo as the subpattern, otherwise bar
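A tiny PHP illustration of that form (the foo/bar strings and the "<" trigger are arbitrary choices for the demo): if the optional group 1 captured an opening "<", the rest must be "foo>", otherwise it must be "bar".

```php
<?php
// Sketch: (?(1)yes|no) picks a branch depending on whether group 1 matched.
$pattern = '~^(<)?(?(1)foo>|bar)$~';

foreach (['<foo>', 'bar', '<bar', 'foo>'] as $candidate) {
    printf("%-6s %s\n", $candidate, preg_match($pattern, $candidate) ? 'match' : 'no match');
}
```

Only "<foo>" and "bar" match; a "<" without "foo>", or "foo>" without "<", is rejected.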

For your requirements, this comes down to:

--(?P<key>\w+)=(")?(?P<value>(?(2)[^"]+|[^\s]+))

In PHP, using the x (verbose) modifier and inline comments, it looks like this:

<?php
$string = 'helloworld --name=stretch --message="Hi there everyone"';
$regex = '~
            --(?P<key>\w+)=         # look for two dashes, capture every word character into the group "key"
            (")?                    # look for double quotes and make the group (2) optional
            (?P<value>              # save the following to the group "value"
                (?(2)[^"]+|[^\s]+)  # if (2) is set, capture everything BUT a double quote
                                    # else capture everything but whitespace (not allowed without quotes)
            )
            ~x';                    # verbose modifier
preg_match_all($regex, $string, $matches, PREG_SET_ORDER);
foreach ($matches as $match)
    echo "Key: {$match['key']}, Value: {$match['value']}\n";
/* output:
Key: name, Value: stretch
Key: message, Value: Hi there everyone
*/    
?>

See a demo for this one on ideone.com.

You can go further and allow single quotes as delimiters, as well as escaped quotes inside values:

--(?P<key>\w+)= 
(['"])?                   # allow single or double quotes
(?P<value>       
    (?(2).+?(?<!\\)(?=\2) # if (2) is set, match lazily
                          # until the next character is the captured quote (\2)
                          # and the preceding character is not a backslash (allowing escaped quotes)
    |[^\s]+)
)

See this demo on regex101.com (hijacked from @SebastianProske, sorry mate :).
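A PHP sketch of the extended pattern (assuming PHP 7+; the sample flags are invented for illustration). Writing the pattern in a nowdoc means the backslashes reach PCRE unchanged, so the doubled \\\\ needed inside a single-quoted PHP string is avoided.

```php
<?php
// Sketch: single or double quotes, with backslash-escaped quotes inside values.
$regex = <<<'RE'
~--(?P<key>\w+)=(['"])?(?P<value>(?(2).+?(?<!\\)(?=\2)|[^\s]+))~
RE;

$input = <<<'CMD'
helloworld --a='it\'s fine' --b="say \"hi\"" --c=plain
CMD;

preg_match_all($regex, $input, $matches, PREG_SET_ORDER);
foreach ($matches as $m) {
    echo "Key: {$m['key']}, Value: {$m['value']}\n";
}
```

The escaped quotes stay in the captured value (it\'s fine, say \"hi\"). As far as I can tell, a quoted value that itself ends in a backslash is not handled by the (?<!\\) check.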

6 Comments

This doesn't handle single quotes or escaped quotes (though I like the approach).
@SebastianProske: Look at the update at the bottom of my answer; I adapted your example to it (hijacked it) :)
It takes a few more steps, but it looks much prettier, so I'll leave my answer up and upvote yours.
@SebastianProske Thanks.
Very cool, thanks. I ended up using this one, although I had to edit it slightly to get it working in PHP (adding a few extra backslashes): $pattern='/--(?P<key>\w+)=([\'"])?(?P<value>(?(2).+?(?<!\\\\)(?=\2)|[^\s]+))/'

Answer (score 1)

My approach would be the following:

--([a-zA-Z]+)=([^"'\s]+|"(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*')

The start is simple: --([a-zA-Z]+)= matches two dashes followed by letters and =, capturing the letters in a group. Then there are three alternatives. Without quotes, [^"'\s]+ matches anything that is not a quote or whitespace (you could drop the quotes from this class if they are allowed inside unquoted values). "(?:[^"\\]|\\.)*" looks for a double quote followed by any number of characters that are either neither a double quote nor a backslash, or a backslash followed by any character, up to a closing double quote that is not consumed by a preceding backslash. '(?:[^'\\]|\\.)*' does the same for single quotes. This allows (correctly, in my opinion) mixing the two kinds of quotes, as shown in the last line of my example.

https://regex101.com/r/gE1hG6/2
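A PHP sketch of this pattern, assuming PHP 7+. Group 2 keeps the surrounding quotes, so unquoteValue() (a hypothetical helper, not part of the answer) strips a matching pair; it deliberately does not unescape \".

```php
<?php
// Sketch: the three-alternative pattern; quotes are stripped afterwards.
function unquoteValue(string $raw): string
{
    // Group 2 is never empty, so $raw[0] is safe here.
    $first = $raw[0];
    if (strlen($raw) >= 2 && ($first === '"' || $first === "'") && substr($raw, -1) === $first) {
        return substr($raw, 1, -1);
    }
    return $raw;
}

$regex = <<<'RE'
~--([a-zA-Z]+)=([^"'\s]+|"(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*')~
RE;

$input = <<<'CMD'
helloworld --name=stretch --message="Hi there everyone" --mixed='say "hi"'
CMD;

preg_match_all($regex, $input, $matches, PREG_SET_ORDER);

$flags = [];
foreach ($matches as $m) {
    $flags[$m[1]] = unquoteValue($m[2]);
}
print_r($flags);
```

The --mixed flag shows the quote mixing mentioned above: double quotes are fine inside a single-quoted value.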

1 Comment

Plus one for hijacking your regex example :)
Answer (score 0)

If you'd rather not use a conditional, you could try:

--(\w+)=(?:('|")(.*?)(?<!\\)\2|(\S+))

DEMO HERE
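A PHP sketch of this pattern (assuming PHP 7+; $flags is an illustrative name): quoted values land in group 3 and unquoted ones in group 4.

```php
<?php
// Sketch: lazy .*? up to the same quote, as long as it is not backslash-escaped.
$regex = <<<'RE'
~--(\w+)=(?:('|")(.*?)(?<!\\)\2|(\S+))~
RE;

$input = <<<'CMD'
helloworld --name=stretch --message="my name is \"chris\" :D"
CMD;

preg_match_all($regex, $input, $matches, PREG_SET_ORDER);

$flags = [];
foreach ($matches as $m) {
    // Group 4 may be absent from $m when the quoted branch matched.
    $flags[$m[1]] = (isset($m[4]) && $m[4] !== '') ? $m[4] : $m[3];
    echo "{$m[1]} => {$flags[$m[1]]}\n";
}
```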

1 Comment

('|") is equivalent to ['"], and \S+ is exactly equivalent to [^\s]+. This approach also needs more steps (because of the lazy .*?).
