Using regex assertion in python

Question

I am experimenting with regex. I have read up on assertions and seen examples, but for some reason I cannot get this to work. I am trying to use a look-behind to get the word that follows a digits-letters-digits pattern.

import re
s = '123abc456someword 0001abde19999anotherword'
re.findall(r'(?<=\d+[a-z]+\d+)[a-z]+', s, re.I)

The results should be someword and anotherword

But I get the error: look-behind requires fixed-width pattern

Any help appreciated.

It's as it says: it expects a pattern that matches a fixed number of characters, not a variable one. Try using {n} instead of ?, +, *, etc.
– user1467267, Commented Jul 13, 2014 at 20:31
It's clear from the error that look-behind requires a fixed-width pattern.
– Braj, Commented Jul 13, 2014 at 20:34
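To make the comments above concrete, here is a minimal sketch of what "fixed-width" means for re. The {3} counts are an assumption picked to fit only the first token of the sample string. They are not a general fix, which is why the second word is missed.

```python
import re

s = '123abc456someword 0001abde19999anotherword'

# Variable-width look-behind: re rejects the pattern when compiling it.
try:
    re.compile(r'(?<=\d+[a-z]+\d+)[a-z]+', re.I)
    compiled = True
except re.error:
    compiled = False

# Fixed-width look-behind: exactly 3 digits, 3 letters, 3 digits.
# The {3} counts are an assumption that fits '123abc456' only.
fixed = re.findall(r'(?<=\d{3}[a-z]{3}\d{3})[a-z]+', s, re.I)

print(compiled, fixed)  # False ['someword']
```

Since the look-behind must have a single fixed length, it cannot cover both 123abc456 and 0001abde19999 at once.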

hwnd · Accepted Answer · 2014-07-13 23:05:35Z

4

Python's re module only allows fixed-width patterns inside look-behinds. If you want to experiment with variable-length look-behinds, use the alternative regex module (a third-party package that is installed separately):

>>> import regex
>>> s = '123abc456someword 0001abde19999anotherword'
>>> regex.findall(r'(?i)(?<=\d+[a-z]+\d+)[a-z]+', s)
['someword', 'anotherword']

Or simply avoid using look-behind in general and use a capturing group ( ):

>>> import re
>>> s = '123abc456someword 0001abde19999anotherword'
>>> re.findall(r'\d+[a-z]+\d+([a-z]+)', s, re.I)
['someword', 'anotherword']
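If you also need to know where each word sits in the string, the same capturing-group idea should work with re.finditer. This is only a sketch, and the group name word is an illustrative choice, not something from the original answer.

```python
import re

s = '123abc456someword 0001abde19999anotherword'

# Same pattern as above, with the capturing group given a name.
pattern = re.compile(r'\d+[a-z]+\d+(?P<word>[a-z]+)', re.I)

# Collect each captured word together with its start offset in s.
words = [(m.group('word'), m.start('word')) for m in pattern.finditer(s)]

print(words)  # [('someword', 9), ('anotherword', 31)]
```

Unlike a look-behind, the overall match here includes the digits-letters-digits prefix. Only the group holds the word itself.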

edited Jul 13, 2014 at 23:05

answered Jul 13, 2014 at 20:34


2 Comments

Jackson Over a year ago

I didn't know there was a regex module, so you can do this using it? I will have to try and install this module and play around with it.

zx81 Over a year ago

+1 for the fabulous regex module (and the capture group alternative, which is the general solution for engines without lookbehind) :)

Braj · Accepted Answer · 2014-07-13 20:43:22Z

3

Convert the prefix to a non-capturing group and read the word from capturing group 1.

(?:\d+\w+\d+)(\w+\b)


If you are interested in [a-z] only, change \w to [a-z] in the pattern above. The \b is added to assert a position at a word boundary.

sample code:

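import re
# Python 3: r'' strings are already Unicode (the ur'' prefix is Python 2 only)
p = re.compile(r'(?:\d+\w+\d+)(\w+\b)', re.IGNORECASE)
test_str = "123abc456someword 0001abde19999anotherword"

# findall returns the contents of group 1 for each match
print(p.findall(test_str))  # ['someword', 'anotherword']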
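The choice between \w and [a-z] matters once the text contains underscores or digits, because \w matches those too. A rough comparison follows, using a modified sample string (the comma and the underscore are added here purely for illustration and are not in the original question):

```python
import re

# Modified sample: a comma after the first word, an underscore in the second.
s2 = '123abc456someword, 0001abde19999another_word'

with_w = re.findall(r'(?:\d+\w+\d+)(\w+\b)', s2, re.IGNORECASE)
with_az = re.findall(r'(?:\d+[a-z]+\d+)([a-z]+\b)', s2, re.IGNORECASE)

print(with_w)   # ['someword', 'another_word']
print(with_az)  # ['someword']
```

In the [a-z] version the second token is dropped entirely: the letters stop at the underscore, and \b cannot match between a letter and an underscore, since both count as word characters.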

edited Jul 13, 2014 at 20:43

answered Jul 13, 2014 at 20:37


Avinash Raj · Accepted Answer · 2014-07-14 00:49:18Z

0

Another easy method uses a lookahead:

>>> import re
>>> s = '123abc456someword 0001abde19999anotherword'
>>> m = re.findall(r'[a-z]+(?= |$)', s, re.I)
>>> m
['someword', 'anotherword']

It matches one or more letters that are immediately followed by a space or by the end of the string.
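One thing to keep in mind: the lookahead only checks what comes after the letters, not the digits-letters-digits prefix from the question. The sketch below shows the difference on a made-up input (plainword is a hypothetical extra token added for illustration):

```python
import re

# 'plainword' has no digits-letters-digits prefix in front of it.
s3 = 'plainword 123abc456someword'

# Lookahead version: any run of letters followed by a space or the end.
lookahead = re.findall(r'[a-z]+(?= |$)', s3, re.I)

# Capturing-group version: requires the prefix before the word.
grouped = re.findall(r'\d+[a-z]+\d+([a-z]+)', s3, re.I)

print(lookahead)  # ['plainword', 'someword']
print(grouped)    # ['someword']
```

So the lookahead is fine for the sample string in the question, but if the prefix matters for your data, the capturing-group pattern is probably the safer choice.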

answered Jul 14, 2014 at 0:49
