Regex with customized word boundaries in Python

Question

I'm using a function called findlist to return a list of the positions of every occurrence of a string within a text, using a regex with word boundaries. But I want ( not to count as a word boundary, while the other word boundaries still apply, so that it finds split in var split but not in split(a). Is there any way to do this?

import re

def findlist(input_data, place):
    return [m.span() for m in re.finditer(input_data, place)]

s = '''
var a = 'a b c'
var split = a.split(' ')
'''
instances = findlist(r"\b%s\b" % ('split'), s)

print(instances)

Wiktor Stribiżew · Accepted Answer · 2019-02-05 12:51:01Z


You can check that there is no ( after the trailing word boundary with a negative lookahead, (?!\():

instances = findlist(r"\b{}\b(?!\()".format('split'), s)
                             ^^^^^^

The (?!\() lookahead is checked after the whole word has been matched; if there is a ( immediately to the right of the word, the match fails.

See the Python demo:

import re

def findlist(input_data, place):
    return [m.span() for m in re.finditer(input_data, place)]

s = '''
var a = 'a b c'
var split = a.split(' ')
'''
instances = findlist(r"\b{}\b(?!\()".format('split'), s)

print(instances) # => [(21, 26)]
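
If the search word comes from a variable rather than a literal, it may contain regex metacharacters. A minimal sketch, assuming a hypothetical helper named find_word (not part of the original answer), that passes the word through re.escape before building the same pattern:

```python
import re

def find_word(word, text):
    # Hypothetical helper: escape the word so characters such as "." or "+"
    # are matched literally, then apply the same negative lookahead.
    pattern = r"\b{}\b(?!\()".format(re.escape(word))
    return [m.span() for m in re.finditer(pattern, text)]

s = '''
var a = 'a b c'
var split = a.split(' ')
'''

print(find_word('split', s))  # => [(21, 26)]
```

Note that \b only behaves as expected here when the word starts and ends with a word character (a letter, digit, or underscore).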

edited Feb 5, 2019 at 12:51

answered Feb 5, 2019 at 10:30

Wiktor Stribiżew


2 Comments

Rob Kwasowski Over a year ago

Is there a way to only match if there is a following (?

Wiktor Stribiżew Over a year ago

@RobKwasowski Turn the negative lookahead into a positive one, (?=\().
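
A short sketch of that positive-lookahead variant, reusing the sample text from the question; the variable name calls is only illustrative:

```python
import re

s = '''
var a = 'a b c'
var split = a.split(' ')
'''

# Positive lookahead: match only when "(" immediately follows the word.
calls = [m.span() for m in re.finditer(r"\bsplit\b(?=\()", s)]

print(calls)  # => [(31, 36)]
```

Here only the split in a.split(' ') is reported, because the split in var split is followed by a space rather than (.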
