I have many sentences, so I thought I'd create a function that operates on each sentence individually; the input is just a string. My main objective is to extract the words that follow prepositions. For example, from "near blue meadows" I'd want "blue meadows" to be extracted.
I have all my prepositions in a text file. That part works fine, but I guess there's a problem with the regex I used. Here's my code:
import re
with open("Input.txt") as f:
    words = "|".join(line.rstrip() for line in f)
pattern = re.compile('({})\s(\d+\w+|\w+)\s\w+'.format(words))
text3 = "003 canopy grace appt, classic royale garden, hennur main road, bangalore 43. near hennur police station"
print(pattern.search(text3).group())
This raises:
AttributeError Traceback (most recent call last)
<ipython-input-83-be0cdffb436b> in <module>()
5 pattern = re.compile('({})\s(\d+\w+|\w+)\s\w+'.format(words))
6 text3 = ""
----> 7 print(pattern.search(text3).group())
AttributeError: 'NoneType' object has no attribute 'group'
The main problem is with the regex. My expected output is "hennur police", i.e. the 2 words after "near". In my code, ({}) is meant to match one of the prepositions from the list, \s matches the space after it, (\d+\w+|\w+) matches a word like "19th" or "hennur", and \s\w+ matches another space and a word. My regex fails to match, so search() returns None, hence the error.
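One way to check whether the regex itself is at fault is to take Input.txt out of the picture. This is a minimal sketch, assuming the prepositions are hard-coded as a `|`-separated string (a stand-in, not what the file actually produces) and adding a guard so that a failed search does not raise:

```python
import re

# Assumption: hard-coded alternation standing in for the contents of Input.txt.
words = "near|nr|opp|opposite|behind|towards|above|off"
pattern = re.compile(r'({})\s(\d+\w+|\w+)\s\w+'.format(words))

text3 = ("003 canopy grace appt, classic royale garden, hennur main road, "
         "bangalore 43. near hennur police station")

# search() returns None when nothing matches, so check before calling group().
match = pattern.search(text3)
result = match.group() if match else None
print(result)  # near hennur police
```

With this input the pattern does match, which suggests looking at how `words` is built rather than at the regex structure. Note that `group()` with no argument returns the whole match, preposition included.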
Why is it not working?
The content of the Input.txt file:
['near','nr','opp','opposite','behind','towards','above','off']
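If the file really holds that single line (an assumption; it may just be how the list was pasted here), the sketch below shows what `words` becomes. `io.StringIO` is used as a stand-in for the open file so the example is self-contained:

```python
import io
import re

# Assumption: Input.txt contains exactly this one line, brackets and quotes included.
fake_file = io.StringIO("['near','nr','opp','opposite','behind','towards','above','off']\n")

words = "|".join(line.rstrip() for line in fake_file)
print(words)  # the whole list literal, unchanged, since there is only one line to join

# Inside a regex, [...] is a character class, so the first group now matches
# ONE character out of the set {' , n e a r o p s i t b h d w v f}, not a whole word.
pattern = re.compile(r'({})\s(\d+\w+|\w+)\s\w+'.format(words))

text3 = ("003 canopy grace appt, classic royale garden, hennur main road, "
         "bangalore 43. near hennur police station")
match = pattern.search(text3)
print(match.group(1) if match else None)
```

In that case the pattern no longer looks for prepositions at all: it matches any single character from the class followed by two words.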
Expected output:
hennur police
words. near hennur police), so you'll indeed need to double check Input.txt is correct (one word per line).
"['near','nr','opp','opposite','behind','towards','above','off']" or ['near','nr','opp','opposite','behind','towards','above','off']? (surrounded by quotes or not)
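Following the one-word-per-line suggestion, here is a rough sketch of how the pattern could be built. The names `build_pattern` and `words_after_preposition` are made up for illustration, and the `lines` list is a hypothetical stand-in for iterating over a file with one preposition per line:

```python
import re

# Assumption: what iterating over a one-word-per-line Input.txt would yield.
lines = ["near\n", "nr\n", "opp\n", "opposite\n",
         "behind\n", "towards\n", "above\n", "off\n"]

def build_pattern(lines):
    # Drop blank lines and surrounding whitespace.
    words = [line.strip() for line in lines if line.strip()]
    # Try longer prepositions first so "opposite" is preferred over "opp".
    words.sort(key=len, reverse=True)
    # re.escape keeps any special characters in the file from being read as regex syntax.
    alternation = "|".join(re.escape(w) for w in words)
    # \b stops "nr" or "off" from matching inside a longer word;
    # the capture group holds only the two words after the preposition.
    return re.compile(r"\b(?:{})\s+(\w+\s+\w+)".format(alternation))

def words_after_preposition(pattern, text):
    match = pattern.search(text)
    return match.group(1) if match else None

pattern = build_pattern(lines)
text3 = ("003 canopy grace appt, classic royale garden, hennur main road, "
         "bangalore 43. near hennur police station")
print(words_after_preposition(pattern, text3))  # hennur police
```

Since `\w` already matches digits, `\d+\w+|\w+` is equivalent to plain `\w+`, which is why the sketch uses the shorter form.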