
I have many sentences, so I thought I'd create a function that operates on each sentence individually; the input is just a string. My main objective is to extract the words that follow prepositions. For example, from "near blue meadows" I'd want "blue meadows" to be extracted.
I have all my prepositions in a text file. The file loads fine, but I guess there's a problem in the regex used. Here's my code:

import re

with open("Input.txt") as f:
    words = "|".join(line.rstrip() for line in f)
    pattern = re.compile('({})\s(\d+\w+|\w+)\s\w+'.format(words))
    text3 = "003 canopy grace appt, classic royale garden, hennur main road, bangalore 43. near hennur police station"
    print(pattern.search(text3).group())

This returns :

AttributeError                            Traceback (most recent call last)
<ipython-input-83-be0cdffb436b> in <module>()
      5     pattern = re.compile('({})\s(\d+\w+|\w+)\s\w+'.format(words))
      6     text3 = ""
----> 7     print(pattern.search(text3).group())

AttributeError: 'NoneType' object has no attribute 'group'

The main problem is with the regex. My expected output is "hennur police", i.e. the 2 words after "near". In my code I have used ({}) to match one of the prepositions from the list, \s for the space that follows, (\d+\w+|\w+) for a word like 19th or hennur, and \s\w+ for a space and one more word. My regex fails to match, hence the None error. Why is it not working?

The content of the Input.txt file:

['near','nr','opp','opposite','behind','towards','above','off']

Expected output:

hennur police
  • You need to check what exactly is in words. Commented Feb 27, 2014 at 7:06
  • Works for me (though you actually should get near hennur police), so you'll indeed need to double check Input.txt is correct (one word per line). Commented Feb 27, 2014 at 7:10
  • Input.txt is of the form ['near','off','opposite'...] and so on. I've edited my question; please check it. Commented Feb 27, 2014 at 7:28
  • Is the content of the file "['near','nr','opp','opposite','behind','towards','above','off']" or ['near','nr','opp','opposite','behind','towards','above','off']? (surrounded by quotes or not) Commented Feb 27, 2014 at 7:34
  • The input file is without quotes, and the variable called words has double quotes. Commented Feb 27, 2014 at 7:37

1 Answer


The file contains a Python list literal, not one word per line. Use ast.literal_eval to parse the literal.

>>> import ast
>>> ast.literal_eval("['near','nr','opp','opposite','behind','towards','above','off']")
['near', 'nr', 'opp', 'opposite', 'behind', 'towards', 'above', 'off']
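To see why parsing matters, here is a minimal sketch that compares the two ways of building `words`. It assumes the file holds exactly the single line shown in the question; `file_line`, `template`, and the other variable names are made up for illustration. Joined as-is, the line keeps its square brackets, and in a regex those brackets turn the whole thing into one character class (a single character out of the set) rather than an alternation of prepositions.

```python
import ast
import re

# Assumption: Input.txt contains exactly this one line (taken from the question).
file_line = "['near','nr','opp','opposite','behind','towards','above','off']"
text = ("003 canopy grace appt, classic royale garden, hennur main road, "
        "bangalore 43. near hennur police station")

# What the question's code builds: the single line is joined unchanged,
# so the brackets and quotes end up inside the regex.
raw_words = "|".join([file_line.rstrip()])

# What the answer builds: the literal is parsed into a real list first.
parsed_words = "|".join(ast.literal_eval(file_line))

# The question's regex, with a placeholder for the preposition part.
template = r'({})\s(\d+\w+|\w+)\s\w+'
raw_match = re.compile(template.format(raw_words)).search(text)
parsed_match = re.compile(template.format(parsed_words)).search(text)

print(raw_words)             # still the bracketed list text
print(parsed_words)          # a proper a|b|c alternation
print(parsed_match.group())  # the match produced by the alternation
```

Printing `words` before compiling the pattern, as the first comment suggests, is usually the quickest way to spot this kind of problem.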

import ast
import re

with open("Input.txt") as f:
    # The file holds a Python list literal, so parse it instead of joining raw lines.
    words = '|'.join(ast.literal_eval(f.read()))
    pattern = re.compile(r'(?:{})\s(\d*\w+\s\w+)'.format(words))
    text3 = "003 canopy grace appt, classic royale garden, hennur main road, bangalore 43. near hennur police station"

    # If there could be multiple matches, use `findall` or `finditer`.
    #   With a single capturing group, `findall` returns a list of the
    #   group's text instead of the entire matched string.
    for place in pattern.findall(text3):
        print(place)

    # If you want to get only the first match, use `search`.
    #   You need to use `group(1)` to get only group 1.
    print(pattern.search(text3).group(1))

Output (the first line is printed in the for loop; the second one comes from search(..).group(1)):

hennur police
hennur police

NOTE: you need to re.escape each word if it contains any character that has special meaning in a regular expression.
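As a rough sketch of that note, the helper below escapes every preposition before joining. The function names `build_pattern` and `words_after_preposition`, and the extra entry 'opp.' with a trailing dot, are hypothetical and not part of the question. Two further choices here are assumptions rather than part of the original answer: a leading `\b` so that a preposition is not matched at the end of a longer word, and sorting longest first so that 'opposite' is tried before 'opp'. It also guards against `search` returning None, which is what raised the AttributeError in the question.

```python
import re

def build_pattern(prepositions):
    """Compile a regex that captures the two words after any preposition."""
    # Longest first, so 'opposite' is tried before its prefix 'opp'.
    ordered = sorted(prepositions, key=len, reverse=True)
    # re.escape makes characters such as '.' match literally.
    alternatives = "|".join(re.escape(word) for word in ordered)
    return re.compile(r"\b(?:{})\s+(\w+\s+\w+)".format(alternatives))

def words_after_preposition(pattern, text):
    """Return the first captured phrase, or None when nothing matches."""
    match = pattern.search(text)
    return match.group(1) if match else None

# 'opp.' is a made-up entry to show why escaping matters.
preps = ['near', 'nr', 'opp', 'opp.', 'opposite', 'behind', 'towards', 'above', 'off']
pattern = build_pattern(preps)

print(words_after_preposition(pattern, "bangalore 43. near hennur police station"))
print(words_after_preposition(pattern, "shop opp. city mall entrance"))
print(words_after_preposition(pattern, "no preposition here"))
```

Returning None instead of calling `.group()` directly lets the caller decide what to do with sentences that contain no preposition.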
