Python unclear regex behavior

Question

I have a problem with a Python regex. The result looks fine in RegexBuddy, but it fails in Python.

The data I have to match is a list of lines:

['  101  0.  0.\n',
 '  0.  100.\n',
 '  1.  98.5107805\n',
 '  2.  97.0464459\n',
 '  3.  95.6065328\n', ... ]

I have to get all the numbers starting from the second line. For this I used:

pattern = compile(r'\s*(?P<raw_time>\d*\.?\d*)\s+(?P<raw_value>\d*\.\d*)')

And it all worked fine. I iterate through the list and get the first value in "raw_time" and the second one in "raw_value" for every line. Then I had to extend the pattern to also work with weighted data.

So the data became:

['  101  0.  0.\n',
 '  0.  100.  1\n',
 '  1.  98.5107805  1\n',
 '  2.  97.0464459  1\n',
 '  3.  95.6065328  1\n', ... ]

I still only need to parse out the first two values. So I changed the pattern to:

pattern = compile(r'\s*(?P<raw_time>\d*\.?\d*)\s+(?P<raw_value>\d*\.\d*).+')

It works fine on all lines except one.

Working:

In [35]: pattern.search('1.  98.5107805  1\n').groupdict()
Out[35]: {'raw_time': '1.', 'raw_value': '98.5107805'}

Working:

In [37]: pattern.search('  0.  100.  1\n').groupdict()
Out[37]: {'raw_time': '0.', 'raw_value': '100.'}

Working:

In [44]: pattern.search('1. 98.5107805\n').groupdict()
Out[44]: {'raw_time': '1.', 'raw_value': '98.510780'}

Not working:

In [46]: pattern.search('  0.  100.\n').groupdict()
Out[46]: {'raw_time': '', 'raw_value': '0.'}

I rely heavily on RegexBuddy (the demo version, admittedly, but it has been consistent with Python until now).

Any advice?

Thanks.

Could you please clearly show the input on which it's not working (as you do in the working example)? Thanks.
– NPE, Commented May 25, 2012 at 7:53
RegexBuddy won't help you with Python regexes. You'll need a Python regex tester like ksamuel.pythonanywhere.com
– Bite code, Commented May 25, 2012 at 9:58

NPE · Accepted Answer · 2012-05-25 07:55:53Z

2

The .+ that you've added needs to be changed to .*.

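The + quantifier requires at least one character, whereas * accepts zero or more. On a line with only two columns, nothing but the newline follows the second number, and . does not match a newline by default, so .+ cannot match there. The engine then backtracks until it finds a different split of the line that leaves something for .+ to consume, which is why the groups come out wrong.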
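As a rough illustration, the sketch below runs both variants of the pattern from the question against a two-column and a three-column line. The names with_plus, with_star, two_columns and three_columns are made up for this example and do not appear in the original code.

```python
import re

# Pattern from the question: the trailing .+ demands at least one more
# character (other than a newline) after the second number.
with_plus = re.compile(r'\s*(?P<raw_time>\d*\.?\d*)\s+(?P<raw_value>\d*\.\d*).+')

# Suggested change: .* also accepts an empty remainder.
with_star = re.compile(r'\s*(?P<raw_time>\d*\.?\d*)\s+(?P<raw_value>\d*\.\d*).*')

two_columns = '  0.  100.\n'       # unweighted line
three_columns = '  0.  100.  1\n'  # weighted line

# With .+ the engine backtracks and splits the two-column line incorrectly.
broken = with_plus.search(two_columns).groupdict()

# With .* both line formats yield the same two values.
fixed_two = with_star.search(two_columns).groupdict()
fixed_three = with_star.search(three_columns).groupdict()

print(broken)       # {'raw_time': '', 'raw_value': '0.'}
print(fixed_two)    # {'raw_time': '0.', 'raw_value': '100.'}
print(fixed_three)  # {'raw_time': '0.', 'raw_value': '100.'}
```

Since only the first two numbers are needed, dropping the trailing .* altogether should give the same groups with search(); it is kept here only to stay close to the pattern in the question.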
