Regex to exclude a specific pattern python

Question

I'm trying to find any occurrence of "fiction" preceded or followed by anything, except when it is preceded by "non-".

I tried:

.*[^(n-)]fiction.*

but it's not working as I want it to. Can anyone help me out?

You want a negative lookbehind. Something like [^\s]*(?<!n-)fiction, which will match any number of non-whitespace characters before the word fiction unless the characters "n-" appear immediately before it.
– born_naked, Commented Mar 6, 2021 at 17:53
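
For anyone wondering why the character-class attempt fails, here is a minimal sketch. As far as I can tell, inside [...] the sequence n-) is read as a character range, which Python rejects, and even a repaired class only excludes single characters rather than the string "non-". The repaired pattern `[^(n)-]fiction` and the sample strings are my own illustrations, not from the question.

```python
import re

# The original attempt: inside [...], "n-)" is parsed as a range from "n"
# to ")", which is invalid, so the pattern does not compile.
try:
    re.compile(r".*[^(n-)]fiction.*")
    compiles = True
except re.error:
    compiles = False
print("original pattern compiles:", compiles)

# Hypothetical repaired class (dash moved to the end so it is literal).
# It means "one character that is not (, n, ) or -", not "not the text non-".
char_class = re.compile(r"[^(n)-]fiction")
lookbehind = re.compile(r"(?<!non-)fiction")

results = {}
for text in ["fiction", "science-fiction", "non-fiction"]:
    results[text] = (bool(char_class.search(text)), bool(lookbehind.search(text)))
    print(text, results[text])
```

The class version requires some character in front of "fiction" and rejects any dash, so it misses both a bare "fiction" and "science-fiction"; the lookbehind only rejects the exact prefix "non-".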

Cute Panda · Accepted Answer · 2021-03-06 18:00:24Z

2

Check if this works for you:

.*(?<!non\-)fiction.*
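
A quick way to try this pattern is to run it against a few sample lines. This is only a sketch; the sample strings are made up for illustration.

```python
import re

# Pattern from the answer above. The backslash before "-" is optional
# outside a character class, so r".*(?<!non-)fiction.*" behaves the same.
pattern = re.compile(r".*(?<!non\-)fiction.*")

samples = [
    "science fiction",
    "non-fiction",
    "fiction and non-fiction",
    "no match here",
]

# Keep every sample that the pattern matches in full.
matched = [s for s in samples if pattern.fullmatch(s)]
print(matched)
```

Note that a line containing both forms is still matched, because one of its "fiction" occurrences has no "non-" prefix.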


Wiktor Stribiżew · Accepted Answer · 2021-03-06 19:00:11Z

You should avoid patterns starting with .*: they cause too many backtracking steps and slow down the code execution.

In Python, you can always get lines either by reading a file line by line, or by splitting a multiline string with splitlines(), and then select the lines you need by testing each one against a pattern that contains no .*.

Reading a file line by line:

# filepath is a placeholder for the path to your input file
final_output = []
with open(filepath, 'r', newline="\n", encoding="utf8") as f:
  for line in f:
    if "fiction" in line and "non-fiction" not in line:
      final_output.append(line.strip())

Or, to also get lines that contain non-fiction as long as they contain fiction without a non- prefix, use a slightly modified version of @jlesuffleur's regex:

import re
final_output = []
rx = re.compile(r'\b(?<!non-)fiction\b')
with open(filepath, 'r', newline="\n", encoding="utf8") as f:
  for line in f:
    if rx.search(line):
      final_output.append(line.strip())

Getting lines from a multiline string (with both approaches mentioned above):

import re
text = "Your input string line 1\nLine 2 with fiction\nLine 3 with non-fiction\nLine 4 with fiction and non-fiction"
rx = re.compile(r'\b(?<!non-)fiction\b')
# Regex approach: returns any line containing fiction with no non- prefix,
# even if the same line also contains non-fiction:
final_output = [line.strip() for line in text.splitlines() if rx.search(line)]
# => ['Line 2 with fiction', 'Line 4 with fiction and non-fiction']
# Non-regex approach: skips every line containing non-fiction,
# even if that line also contains fiction with no non- prefix:
final_output = [line.strip() for line in text.splitlines() if "fiction" in line and "non-fiction" not in line]
# => ['Line 2 with fiction']

See a Python demo.
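
As a rough illustration of the point about leading .*: when you only need to know whether a line qualifies, re.search() with the bare lookbehind pattern selects the same lines as the .*-wrapped version. The sample lines are made up, and this sketch only checks that the results agree; it does not measure speed.

```python
import re

lines = [
    "Line 2 with fiction",
    "Line 3 with non-fiction",
    "Line 4 with fiction and non-fiction",
    "Line 5 with nothing relevant",
]

wrapped = re.compile(r".*(?<!non-)fiction.*")  # pattern with leading/trailing .*
bare = re.compile(r"(?<!non-)fiction")         # same test without .*

# search() scans the line itself, so the .* wrappers add nothing to the test.
wrapped_hits = [ln for ln in lines if wrapped.search(ln)]
bare_hits = [ln for ln in lines if bare.search(ln)]

print(wrapped_hits == bare_hits)
print(bare_hits)
```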

jlesuffleur · Accepted Answer · 2021-03-06 17:54:58Z

1

What about a negative lookbehind?

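import re
s = 'fiction non-fiction'
# Matches "fiction" only where it is not immediately preceded by "non-"
res = re.findall(r"(?<!non-)fiction", s)
print(res)  # ['fiction']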
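
To see which occurrence the lookbehind actually accepts, a small sketch with re.finditer() can print the position of each match. The variable names here are just for illustration.

```python
import re

s = "fiction non-fiction"

# Collect the matched text together with its (start, end) position in s.
matches = [(m.group(), m.span()) for m in re.finditer(r"(?<!non-)fiction", s)]
print(matches)
```

Only the first "fiction" (at the start of the string) is reported; the one inside "non-fiction" is skipped. The match itself is always just the word, so to keep whole lines you would test each line with search() and keep the line rather than the match.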


Atheer Over a year ago

Thank you, it works, but it only extracts the word "fiction", even if it's preceded by non-
