How to find string with spaces between two characters using Regex?

Question

Currently I have a string that I want to parse and pick up certain values.

The current regex findall pattern that I have is:

re.findall(r'(?P<key>\w+)\s+(?P<value>\w+)')

With this regex findall pattern I can pick up the key and values of the following:

--key1=value1 --key2=value2

But if the value is a string with spaces, it doesn't pick it up. Examples that don't work:

--key1=this is value 1 --key2=value2
--key1=only kvp
--key1=this/doesnt/work/

How can I adjust the regex pattern to pick up the string after the = sign?

Could You provide a minimal working example for what You have tried? re.findall(r'(?P<key>\w+)\s+(?P<key>\w+)') doesn't work for at least two reasons (one missing argument, redefinition of capture group key).
– Lord Bo, Commented Aug 14, 2022 at 20:09
can you edit your code with the output you want for your sample input?
– Sharim09, Commented Aug 14, 2022 at 20:11
Are those command line arguments? They usually escape whitespaces or quote the string.
– Piotr Siupa, Commented Aug 14, 2022 at 20:26
The output I am looking for is a list of the found key, value pairs. So using the example: ['key1', 'value1'], ['key2', 'value2']
– chrisans, Commented Aug 14, 2022 at 20:45

Piotr Siupa · Accepted Answer · 2022-08-14 21:17:52Z

I started by changing your regex to --(?P<key>\w+)=(?P<value>\w+). This way, it uses "=" instead of a whitespace as a separator between key and value. It also requires "--" to precede the key, which seems to be a rule in your data.
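As a quick check of this first change, here is a minimal sketch (the variable names are only illustrative) that runs the adjusted pattern on the working input and on one of the failing inputs from the question. It shows that the separator fix alone still cuts a value off at the first space:

```python
import re

# First revision: "--" before the key, "=" as the separator, \w+ for the value.
first_pattern = r'--(?P<key>\w+)=(?P<value>\w+)'

# Works for values made only of word characters.
simple = re.findall(first_pattern, "--key1=value1 --key2=value2")
print(simple)  # [('key1', 'value1'), ('key2', 'value2')]

# A value with spaces is truncated at the first non-word character.
spaced = re.findall(first_pattern, "--key1=this is value 1 --key2=value2")
print(spaced)  # [('key1', 'this'), ('key2', 'value2')]
```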

Now let's tackle the main problem, which is to capture as the value everything after the "=" sign, up to (but not including) the next key.

This can be done in three steps:

1. Change the regex for the value from \w+ to .+. You want to capture all characters, so you cannot limit yourself to just \w. The . matches any character (except a newline). Of course, this change causes a new problem: the value will now contain everything that follows the key, even if it is "value1 --key2=value2". This will be fixed in the remaining two steps.
2. The next step is to make the regex non-greedy. Change the regex for the value from .+ to .+? and it will capture the fewest characters it can instead of the most. This still doesn't solve the problem, because the regex will now capture only one character of the value. We are a step closer, though.
3. The last step is to force the regex to keep capturing the value until it encounters the next key or the end of the string. Add (?=$|\s--) at the end. (?=...) is a positive lookahead. It means that the enclosed part must follow the current position, but it is not part of the match itself. $|\s-- is an alternation of either the end of the string or a whitespace character followed by two dashes.
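The effect of each of the three steps can be seen by running the intermediate patterns on one of the inputs from the question. This is only an illustrative sketch; the variable names are made up for the demonstration:

```python
import re

text = "--key1=this is value 1 --key2=value2"

# Step 1: greedy .+ swallows everything up to the end of the string.
greedy = re.findall(r'--(?P<key>\w+)=(?P<value>.+)', text)
print(greedy)  # [('key1', 'this is value 1 --key2=value2')]

# Step 2: lazy .+? stops as early as possible, i.e. after one character.
lazy = re.findall(r'--(?P<key>\w+)=(?P<value>.+?)', text)
print(lazy)  # [('key1', 't'), ('key2', 'v')]

# Step 3: the lookahead forces the lazy match to extend until the
# end of the string or until a whitespace followed by "--".
final = re.findall(r'--(?P<key>\w+)=(?P<value>.+?)(?=$|\s--)', text)
print(final)  # [('key1', 'this is value 1'), ('key2', 'value2')]
```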

The finished regex is:

re.findall(r'--(?P<key>\w+)=(?P<value>.+?)(?=$|\s--)', string)

It should handle everything other than a value that contains a whitespace followed by --. For example:

import re
string = "--key1=value 1 has--really .:weird:. characters --key2=value2"
result = re.findall(r'--(?P<key>\w+)=(?P<value>.+?)(?=$|\s--)', string)
print(result)

gives:

[('key1', 'value 1 has--really .:weird:. characters'), ('key2', 'value2')]
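For completeness, here is a short sketch that applies the same pattern to the remaining inputs from the question and to one input that triggers the limitation mentioned above. The sample strings other than those from the question are made up for illustration:

```python
import re

pattern = re.compile(r'--(?P<key>\w+)=(?P<value>.+?)(?=$|\s--)')

# The two remaining failing examples from the question.
only_kvp = pattern.findall("--key1=only kvp")
print(only_kvp)  # [('key1', 'only kvp')]

slashes = pattern.findall("--key1=this/doesnt/work/")
print(slashes)  # [('key1', 'this/doesnt/work/')]

# Limitation: a value containing whitespace followed by "--" is cut short,
# and the leftover " --b c" is silently skipped because it is not a key=value pair.
cut_short = pattern.findall("--key1=a --b c --key2=d")
print(cut_short)  # [('key1', 'a'), ('key2', 'd')]

# findall returns a list of (key, value) tuples, so dict() gives a mapping.
as_dict = dict(pattern.findall("--key1=this is value 1 --key2=value2"))
print(as_dict)  # {'key1': 'this is value 1', 'key2': 'value2'}
```

If values may legitimately contain " --", quoting the values and parsing them with a command-line-aware tool is probably more robust than extending the regex.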

Thank you for breaking it down like this, made me understand this a lot faster! This solves it
