
I am experimenting with regex. I have read up on assertions a bit and seen examples, but for some reason I cannot get this to work. I am trying to get the word that follows the pattern below, using a look-behind.

import re
s = '123abc456someword 0001abde19999anotherword'
re.findall(r'(?<=\d+[a-z]+\d+)[a-z]+', s, re.I)

The result should be someword and anotherword.

But I get this error: look-behind requires fixed-width pattern

Any help appreciated.

  • It's as it says: it expects a fixed-width pattern, not a variable-width one. Try using {n} instead of ?, +, * and so on. Commented Jul 13, 2014 at 20:31
  • It's clear from the error that look-behind requires a fixed-width pattern. Commented Jul 13, 2014 at 20:34
  • I see that in the documentation as I read it now. Commented Jul 13, 2014 at 20:40
  • Now look at the solutions. Commented Jul 13, 2014 at 20:40
  • So you can't use * or + in a look-behind? Commented Jul 13, 2014 at 20:47

3 Answers


Python's re module only allows fixed-width patterns inside look-behinds, so quantifiers such as + and * are rejected there. If you want to experiment with variable-length look-behinds, use the alternative regex module:

>>> import regex
>>> s = '123abc456someword 0001abde19999anotherword'
>>> regex.findall(r'(?i)(?<=\d+[a-z]+\d+)[a-z]+', s)
['someword', 'anotherword']

Or simply avoid the look-behind altogether and use a capturing group ( ):

>>> import re
>>> s = '123abc456someword 0001abde19999anotherword'
>>> re.findall(r'\d+[a-z]+\d+([a-z]+)', s, re.I)
['someword', 'anotherword']
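
To see the restriction itself, here is a minimal sketch. The helper name `lookbehind_compiles` and the fixed-width pattern `(?<=\d{3})` are made up for illustration and are not from the question. It checks whether `re` accepts a pattern by catching `re.error`, then runs the capturing-group version on the original string.

```python
import re

s = '123abc456someword 0001abde19999anotherword'

def lookbehind_compiles(pattern):
    """Return True if re accepts the pattern, False if it raises re.error."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Variable-width look-behind (uses +): rejected by the re module.
variable_ok = lookbehind_compiles(r'(?<=\d+[a-z]+\d+)[a-z]+')

# Fixed-width look-behind (exactly three digits): accepted.
fixed_ok = lookbehind_compiles(r'(?<=\d{3})[a-z]+')

# Capturing group instead of a look-behind: no width restriction applies.
words = re.findall(r'\d+[a-z]+\d+([a-z]+)', s, re.I)

print(variable_ok, fixed_ok, words)
```

When a pattern contains exactly one capturing group, `findall` returns that group's text rather than the whole match, which is why the digits and leading letters do not show up in the result.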

2 Comments

I didn't know there was a regex module, so you can do this using it? I will have to try and install this module and play around with it.
+1 for the fabulous regex module (and the capture group alternative, which is the general solution for engines without lookbehind) :)

Convert the look-behind to a non-capturing group and read the matched word from group 1.

(?:\d+\w+\d+)(\w+\b)


If you are only interested in [a-z], change \w to [a-z] in the pattern above. The \b is added to assert that the match ends at a word boundary.

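Sample code:

import re
# Plain raw string: the Python 2-only ur'' prefix is a syntax error in Python 3.
p = re.compile(r'(?:\d+\w+\d+)(\w+\b)', re.IGNORECASE)
test_str = "123abc456someword 0001abde19999anotherword"

# findall returns the contents of group 1 for each match.
p.findall(test_str)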
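
As a sketch of the [a-z] variant mentioned above, the following uses `finditer` so that the read from group 1 is explicit. The variable names `pattern` and `words` are chosen here for illustration only.

```python
import re

# Same idea as above, with \w narrowed to [a-z]; re.IGNORECASE covers capitals.
pattern = re.compile(r'(?:\d+[a-z]+\d+)([a-z]+\b)', re.IGNORECASE)
test_str = "123abc456someword 0001abde19999anotherword"

# Each match object exposes the captured word as group 1.
words = [m.group(1) for m in pattern.finditer(test_str)]
print(words)  # ['someword', 'anotherword']
```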


Another easy method uses a lookahead:

>>> import re
>>> s = '123abc456someword 0001abde19999anotherword'
>>> m = re.findall(r'[a-z]+(?= |$)', s, re.I)
>>> m
['someword', 'anotherword']

It matches one or more letters that are immediately followed by a space or the end of the string.
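
One caveat, sketched below with a made-up input string that is not from the question: the lookahead version never checks for the digits-letters-digits prefix, so it also returns ordinary words that happen to sit before a space or at the end of the string. The capturing-group pattern is shown next to it for comparison.

```python
import re

# Hypothetical input: 'plain' has no digits-letters-digits prefix.
s = 'plain 123abc456someword'

# Lookahead only: any run of letters before a space or end of string matches.
lookahead = re.findall(r'[a-z]+(?= |$)', s, re.I)

# Capturing group: the prefix must be present before the word.
captured = re.findall(r'\d+[a-z]+\d+([a-z]+)', s, re.I)

print(lookahead)  # ['plain', 'someword']
print(captured)   # ['someword']
```

If the input can contain such words, the capturing-group form is probably the safer choice.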
