
I have the following code:

import re

s = """
/* comments */
vector<CWaypoint> Vparam;     // comment
int cValue=2049;                // comment
double param=0.01;              // comment
"""

exp = re.compile(r"(?:\s|\b)(.+?)=(.+?)\b;")
print(exp.findall(s))

My expected output is

[(cValue,2049), (param,0.01)]

but why am I getting the data type before the variable name, as shown below?

[('int cValue', '2049'), ('double param', '0.01')]

Why aren't the word boundaries working, even though the quantifiers are non-greedy?

  • Regex scans from left to right, so the type name is included in the match. You can change your regex from (.+?)= to (\w+)= to get what you want. (Though I don't think parsing code with regex is a good idea). Commented Jun 18, 2015 at 12:21
  • What if I have a special character in the variable name? Commented Jun 18, 2015 at 12:23
  • There is the option of using (\S+), which matches non-space characters (again, this is a very poor approximation, and you should consult the documentation for the exact character set). Commented Jun 18, 2015 at 12:24
  • What do you mean by "left to right"? I mean, how does . make so much of a difference? Commented Jun 18, 2015 at 12:26
  • The regex engine scans from left to right, so it finds the leftmost match first, and since \b.+?=.+?\b; matches int cValue=2049, that is returned as a match. The lazy .+? only controls where the match ends, not where it starts. \S+ or \w+ basically forces the content before = to contain no space, so you only get the variable name. Commented Jun 18, 2015 at 12:29

1 Answer


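Note that \s also matches newline characters, and .+ matches any character except a newline, including spaces, so the first group can still absorb the type name and the space before the variable name.

I suggest using [^\s=]+ before the =: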

exp = re.compile(r"([^\s=]+)=(.+?);")
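
To see the difference side by side, here is a minimal sketch that runs the question's pattern and the suggested one against the same input. The names original, fixed and tolerant are just illustrative, and the tolerant variant (which allows optional whitespace around =) is an assumed extension that the question did not ask for.

```python
import re

s = """
/* comments */
vector<CWaypoint> Vparam;     // comment
int cValue=2049;                // comment
double param=0.01;              // comment
"""

# Pattern from the question: the engine takes the leftmost starting point
# that can succeed (the newline before "int"), and the lazy .+? then grows
# until it reaches "=", so the type name ends up inside group 1.
original = re.compile(r"(?:\s|\b)(.+?)=(.+?)\b;")

# Suggested pattern: group 1 may contain neither whitespace nor "=",
# so it cannot start at the type name and run across the space.
fixed = re.compile(r"([^\s=]+)=(.+?);")

# Hypothetical variant (an assumption, not part of the answer):
# tolerate optional whitespace around "=".
tolerant = re.compile(r"([^\s=]+)\s*=\s*(.+?);")

print(original.findall(s))  # [('int cValue', '2049'), ('double param', '0.01')]
print(fixed.findall(s))     # [('cValue', '2049'), ('param', '0.01')]
print(tolerant.findall("int x = 5;"))  # [('x', '5')]
```

As the comments on the question point out, this is still only an approximation of C++ syntax; it is likely to misfire on things such as == comparisons or = inside string literals.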