Python regex search

Question

My previous example was not clear, so here is another one:

a = '123 - 48 <!-- 456 - 251 - --> 452 - 348'

If I do something like:

[el for el in re.split(r' - ',a)]

I get:

['123', '48 <!-- 456', '251', '--> 452', '348']

But I want this:

['123', '48 <!-- 456 - 251 - --> 452', '348']

Thanks...
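For reference, here is a minimal sketch (not taken from any of the answers below) that produces exactly the desired list. It scans the string with re.finditer and treats each whole <!-- ... --> as a single token, so separators inside a comment are never considered. It assumes comments are not nested; split_outside_comments is a hypothetical helper name.

```python
import re

def split_outside_comments(text, sep=" - "):
    """Split `text` on `sep`, leaving <!-- ... --> comments intact."""
    # Each match is either a whole (non-greedy) HTML comment or the separator.
    token = re.compile(r"<!--.*?-->|" + re.escape(sep), re.DOTALL)
    parts, start = [], 0
    for m in token.finditer(text):
        if m.group() == sep:  # separator found outside any comment
            parts.append(text[start:m.start()])
            start = m.end()
        # comment matches are skipped, so separators inside them are ignored
    parts.append(text[start:])
    return parts

a = '123 - 48 <!-- 456 - 251 - --> 452 - 348'
print(split_outside_comments(a))  # ['123', '48 <!-- 456 - 251 - --> 452', '348']
```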

Do you actually get that result, and with which Python version? In my experience, el is a string in a list comprehension, as opposed to when using dict(...).
– Lorenz Lo Sauer, Commented Oct 3, 2011 at 16:53
OK, regarding the update: I still consider a non-capturing group combined with filter one of the fastest solutions, especially for longer text. (Don't forget to accept an answer.)
– Lorenz Lo Sauer, Commented Oct 4, 2011 at 12:25

fardjad · Accepted Answer · 2011-10-03 16:53:05Z

5

First remove the comments using something like this:

re.sub("<!--.*?-->", "", your_string)

Then use your regex to extract the numbers.
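A minimal sketch of these two steps, assuming "your regex" is simply r'\d+' (a stand-in; the question does not name one). The re.DOTALL flag is an addition so that comments spanning several lines are removed too.

```python
import re

a = '123 - 48 <!-- 456 - 251 - --> 452 - 348'

# Step 1: drop every <!-- ... --> comment. The non-greedy .*? removes each
# comment separately; re.DOTALL (an addition) lets . also match newlines.
no_comments = re.sub(r"<!--.*?-->", "", a, flags=re.DOTALL)
print(repr(no_comments))  # '123 - 48  452 - 348'

# Step 2: extract the numbers from what is left (\d+ stands in for "your regex").
numbers = re.findall(r"\d+", no_comments)
print(numbers)  # ['123', '48', '452', '348']
```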

You can also use (?!...) (a negative lookahead assertion), but that won't be as simple.
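A sketch of the lookahead route, applied to the split from the question rather than to number extraction: split on " - " only when it is not followed by a --> before the next <. This assumes comments contain no < character and that every --> closes a real <!--.

```python
import re

a = '123 - 48 <!-- 456 - 251 - --> 452 - 348'

# Split on " - " unless a "-->" follows before the next "<", i.e. unless the
# separator sits inside a comment. Assumes comments contain no "<" and that
# every "-->" closes a real "<!--".
parts = re.split(r" - (?![^<]*-->)", a)
print(parts)  # ['123', '48 <!-- 456 - 251 - --> 452', '348']
```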

edited Oct 3, 2011 at 16:53

answered Oct 3, 2011 at 16:47

fardjad

20.5k · 6 gold badges · 55 silver badges · 68 bronze badges


Qtax · Accepted Answer · 2011-10-03 16:54:03Z

0

If you want a single regex, you could use something like:

(\d+)(?!(?:[^<]+|<(?!!--))*-->)

This works as long as there are no "invalid" --> sequences (a --> that does not close a <!--).

It matches numbers that are not followed by a --> with no <!-- in between.
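A quick check of this pattern on the string from the question (Python 3 syntax). Because the pattern has one capturing group, re.findall returns just the digits. Note the nested quantifier ([^<]+ inside (...)*): when the lookahead body fails, the engine can try many ways of splitting the text into [^<]+ chunks, which gets slow on longer inputs.

```python
import re

a = '123 - 48 <!-- 456 - 251 - --> 452 - 348'

# A number counts only if no "-->" follows it before the next "<!--".
pattern = re.compile(r"(\d+)(?!(?:[^<]+|<(?!!--))*-->)")
print(pattern.findall(a))  # ['123', '48', '452', '348']
```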

answered Oct 3, 2011 at 16:54

Qtax

34k · 9 gold badges · 92 silver badges · 127 bronze badges

2 Comments

Lorenz Lo Sauer Over a year ago

It's incredibly slow (Python 2.7), even for a string of only ~100 characters. But it works.

Qtax Over a year ago

If your regex engine supports atomic groups or possessive quantifiers, you could try (\d+)(?!(?:[^<-]++|<(?!!--)|-(?!->))*+-->)
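Python's re module accepts possessive quantifiers (and atomic groups) starting with Python 3.11; Python 2.7 does not. The sketch below uses the suggested pattern when it is available and otherwise falls back to a variant with single-character alternatives. That fallback is a substitution, not part of the comment; it avoids the nested quantifier. Both give the same result on the example string.

```python
import re
import sys

a = '123 - 48 <!-- 456 - 251 - --> 452 - 348'

if sys.version_info >= (3, 11):
    # Possessive quantifiers (++ and *+) are accepted by re from Python 3.11.
    pattern = r'(\d+)(?!(?:[^<-]++|<(?!!--)|-(?!->))*+-->)'
else:
    # Fallback (a substitution): single-character alternatives, so there is
    # no nested quantifier to backtrack through.
    pattern = r'(\d+)(?!(?:[^<-]|<(?!!--)|-(?!->))*-->)'

print(re.findall(pattern, a))  # ['123', '48', '452', '348']
```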

Lorenz Lo Sauer · Accepted Answer · 2011-10-03 17:05:46Z

-1

The result you posted is what re.findall(r'(\d+)', a) returns.

re.findall(r'(?:<!--.+\d+.+-->)|(\d+)', a)

['123', '48', '', '452', '348']

list(filter(None, re.findall(r'(?:<!--.+\d+.+-->)|(\d+)', a)))

['123', '48', '452', '348']
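A small check of how the greedy .+ in this pattern behaves when there is more than one comment; the input string s is an illustrative example, not from the thread. The lazy form <!--.*?--> is a swapped-in alternative, not part of this answer. list(...) is needed on Python 3, where filter returns an iterator.

```python
import re

s = '1 <!-- 2 --> 3 <!-- 4 --> 5'

# Pattern from the answer (greedy .+): the first alternative runs from the
# first "<!--" to the last "-->", swallowing the 3 between the comments.
greedy = list(filter(None, re.findall(r'(?:<!--.+\d+.+-->)|(\d+)', s)))
print(greedy)  # ['1', '5']

# Lazy variant (swapped in): each comment is consumed on its own.
lazy = list(filter(None, re.findall(r'<!--.*?-->|(\d+)', s)))
print(lazy)  # ['1', '3', '5']
```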

edited Oct 3, 2011 at 17:05

answered Oct 3, 2011 at 16:59

Lorenz Lo Sauer

24.9k · 16 gold badges · 89 silver badges · 87 bronze badges

1 Comment

Qtax Over a year ago

A couple of examples that would not work: 1 -- 2 -- 3, 1  3.
