I'm trying to find any occurrence of "fiction" preceded or followed by anything, except for "non-".

I tried:

.*[^(n-)]fiction.*

but it's not working as I want it to. Can anyone help me out?

example

  • You want a negative lookbehind. Something like [^\s]*(?<!n-)fiction, which matches any number of non-whitespace characters before the word fiction, unless the characters "n-" appear immediately before fiction. Commented Mar 6, 2021 at 17:53
  • Thank you! that's exactly what I needed. Commented Mar 6, 2021 at 17:56
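To illustrate the difference the comment above is pointing at: a bracket expression describes a single character, not the sequence non-, whereas a negative lookbehind checks the text just before the match without consuming it. The sketch below uses `[^n-]` as a simplified stand-in for the bracket expression in the question, and the sample strings are made up for illustration.

```python
import re

# Character class: requires ONE character before "fiction" that is neither 'n' nor '-'.
char_class = re.compile(r'[^n-]fiction')
# Negative lookbehind: "fiction" must not be immediately preceded by "non-".
lookbehind = re.compile(r'(?<!non-)fiction')

samples = ["fiction", "science fiction", "non-fiction", "sci-fiction"]

# Map each sample to (char class found a match, lookbehind found a match).
results = {
    s: (bool(char_class.search(s)), bool(lookbehind.search(s)))
    for s in samples
}

for s, (by_class, by_lookbehind) in results.items():
    print(f"{s!r}: char class={by_class}, lookbehind={by_lookbehind}")
```

The character class misses "fiction" at the very start of a string (there is no preceding character to consume) and rejects "sci-fiction" because of the hyphen, while the lookbehind accepts both and rejects only "non-fiction".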

3 Answers

2

Check if this works for you:

.*(?<!non\-)fiction.*

2

You should avoid patterns that start with .*: the leading .* first consumes the whole line and then has to backtrack, which adds many unnecessary steps and slows down execution.
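As a small illustration of that point, re.search already scans the whole string, so a line can be tested without any .* around the pattern. This is only a sketch on made-up sample lines; it compares the two patterns' results and does not measure speed.

```python
import re

with_dots = re.compile(r'.*(?<!non-)fiction.*')   # pattern wrapped in .*
without_dots = re.compile(r'(?<!non-)fiction')    # same test, no .*

lines = [
    "Line 2 with fiction",
    "Line 3 with non-fiction",
    "Line 4 with fiction and non-fiction",
]

# Both patterns accept or reject exactly the same lines.
same = all(
    bool(with_dots.match(line)) == bool(without_dots.search(line))
    for line in lines
)

# Keep the lines that contain "fiction" not preceded by "non-".
kept = [line for line in lines if without_dots.search(line)]
print(same, kept)
```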

In Python, you can always get the lines either by reading a file line by line or by splitting a string with splitlines(), and then keep the lines you need by testing them against a pattern that contains no .*.

  1. Reading a file line by line:
final_output = []
with open(filepath, 'r', newline="\n", encoding="utf8") as f:
  for line in f:
    if "fiction" in line and "non-fiction" not in line:
      final_output.append(line.strip())

Or, to also get lines that contain non-fiction as long as they contain a fiction with no non- in front, use a slightly modified version of @jlesuffleur's regex:

import re
final_output = []
rx = re.compile(r'\b(?<!non-)fiction\b')
with open(filepath, 'r', newline="\n", encoding="utf8") as f:
  for line in f:
    if rx.search(line):
      final_output.append(line.strip())
  2. Getting lines from a multiline string (with both approaches mentioned above):
import re
text = "Your input string line 1\nLine 2 with fiction\nLine 3 with non-fiction\nLine 4 with fiction and non-fiction"
rx = re.compile(r'\b(?<!non-)fiction\b')
# Regex approach: returns any line containing fiction with no non- prefix:
final_output = [line.strip() for line in text.splitlines() if rx.search(line)]
# => ['Line 2 with fiction', 'Line 4 with fiction and non-fiction']
# Non-regex approach: skips every line that contains non-fiction, even if it also contains fiction with no non- prefix:
final_output = [line.strip() for line in text.splitlines() if "fiction" in line and "non-fiction" not in line]
# => ['Line 2 with fiction']

See a Python demo.

1 Comment

Thank you for this detailed answer, much appreciated!
1

What about a negative lookbehind?

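import re

s = 'fiction non-fiction'
# Find every "fiction" that is not immediately preceded by "non-"
res = re.findall(r"(?<!non-)fiction", s)
print(res)  # => ['fiction']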

1 Comment

Thank you, it works, but it only extracts the word "fiction", even if it's preceded by non-
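Regarding the comment above: findall returns only the matched text, so the result alone does not show which occurrence matched or what surrounded it. One possible sketch, using made-up sample strings, is to inspect the match positions with finditer, or to keep the whole string whenever the pattern is found in it.

```python
import re

rx = re.compile(r'(?<!non-)fiction')

s = 'fiction non-fiction'
# (start, end) of each match: only the first "fiction" qualifies,
# the second one is preceded by "non-".
spans = [m.span() for m in rx.finditer(s)]
print(spans)

# Hypothetical inputs: keep the whole string instead of just the matched word.
titles = ['science fiction', 'non-fiction', 'fiction non-fiction']
whole = [t for t in titles if rx.search(t)]
print(whole)
```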
