Regex: AttributeError: 'NoneType' object has no attribute 'groups'

Question

I have a string from which I want to extract a substring. This is part of a larger Python script.

This is the string:

import re

htmlString = '</dd><dt> Fine, thank you.&#160;</dt><dd> Molt bé, gràcies. (<i>mohl behh, GRAH-syuhs</i>)'

From it I want to pull out "Molt bé, gràcies. mohl behh, GRAH-syuhs". To do that, I use a regular expression with re.search:

SearchStr = '(\<\/dd\>\<dt\>)+ ([\w+\,\.\s]+)([\&\#\d\;]+)(\<\/dt\>\<dd\>)+ ([\w\,\s\w\s\w\?\!\.]+) (\(\<i\>)([\w\s\,\-]+)(\<\/i\>\))'

Result = re.search(SearchStr, htmlString)

print Result.groups()
AttributeError: 'NoneType' object has no attribute 'groups'

Since Result.groups() doesn't work, neither do the extractions I want to make (i.e. Result.group(5) and Result.group(7)). But I don't understand why I get this error. The regular expression works in TextWrangler, so why not in Python? I'm a beginner in Python.

Try decoding your htmlString into Unicode.

– thkang, Commented Mar 5, 2013 at 20:18

thkang · Accepted Answer · 2013-03-05 20:20:44Z

63

You are getting AttributeError because you're calling groups on None, which has no groups method.

re.search returning None means the regex couldn't find anything matching the pattern in the supplied string.
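To see the mechanism in isolation, here is a minimal sketch. It assumes Python 3 (the question uses Python 2, but this behaviour is the same), and the pattern "xyz" is just an arbitrary example that cannot match: re.search returns None, and calling .groups() on that None raises the same AttributeError.

```python
import re

# An arbitrary pattern that does not occur in the text, so re.search finds nothing.
result = re.search(r"xyz", "Molt bé, gràcies.")
print(result)  # None

try:
    result.groups()
except AttributeError as exc:
    # Same error as in the question: None has no 'groups' attribute.
    message = str(exc)
    print(message)
```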

When using regular expressions, it's good practice to check whether a match was found:

Result = re.search(SearchStr, htmlString)

if Result:
    print Result.groups()
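A Python 3 version of the same check, with an else branch so the no-match case is visible. This is a sketch assuming Python 3 rather than the question's Python 2; there, str patterns are Unicode-aware by default, so this particular search does find a match. The names html_string, search_str and extracted are introduced here for illustration.

```python
import re

html_string = '</dd><dt> Fine, thank you.&#160;</dt><dd> Molt bé, gràcies. (<i>mohl behh, GRAH-syuhs</i>)'

# The question's pattern as raw strings, so backslashes reach re unchanged.
search_str = (r'(\<\/dd\>\<dt\>)+ ([\w+\,\.\s]+)([\&\#\d\;]+)(\<\/dt\>\<dd\>)+ '
              r'([\w\,\s\w\s\w\?\!\.]+) (\(\<i\>)([\w\s\,\-]+)(\<\/i\>\))')

result = re.search(search_str, html_string)

if result:
    # Group 5 is the Catalan phrase, group 7 the pronunciation.
    extracted = result.group(5) + ' ' + result.group(7)
    print(extracted)
else:
    extracted = None
    print('No match')
```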

answered Mar 5, 2013 at 20:20

thkang



4 Comments

jO. Over a year ago

Seems to be a problem with escaping the () in (<i>mohl behh, GRAH-syuhs</i>). I have tried both '(' and '\(', but neither seems to work.
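The suspicion about the parentheses can be checked by running only the tail of the pattern. This sketch assumes Python 3 and encodes the string to UTF-8 bytes to approximate a Python 2 byte string; tail_pattern, tail and tail_bytes are illustrative names. If the tail matches in both cases, the escaped parentheses are not what breaks the full pattern.

```python
import re

html_string = '</dd><dt> Fine, thank you.&#160;</dt><dd> Molt bé, gràcies. (<i>mohl behh, GRAH-syuhs</i>)'

# Only the tail of the question's pattern: escaped "(" before <i> and ")" after </i>.
tail_pattern = r'(\(\<i\>)([\w\s\,\-]+)(\<\/i\>\))'

# Once against text (str), once against UTF-8 bytes.
tail = re.search(tail_pattern, html_string)
tail_bytes = re.search(tail_pattern.encode('ascii'), html_string.encode('utf-8'))

print(tail.group(2))        # mohl behh, GRAH-syuhs
print(tail_bytes.group(2))  # b'mohl behh, GRAH-syuhs'
```

Both searches succeed because this part of the input is plain ASCII, which points at the accented characters earlier in the string rather than at the parentheses.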

Fumbles Over a year ago

Wow, I didn't realize all I needed was if Result:... Why does that statement work? Reading it, it feels like it's missing the rest of the if statement.

Lion Hunter Over a year ago

Because it checks that the object is not None, I think, and if that is true you enter the if block.

Lou Over a year ago

@Fumbles - It's worth reading about "truthy" and "falsy" values in Python. You can do lots of cool tricks with if statements: for example, to test whether a list is empty, just write if my_list:, and the block runs only if the list is non-empty. Empty lists, empty strings, the integer 0, and None (which is what re.search returns when there is no match) are all considered "falsy", so they fail truth tests in conditionals.
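A small Python 3 sketch of the truthiness rules behind if Result:. One detail worth knowing: a match object is always truthy, even when the regex matched an empty string, so if Result: really distinguishes "a match" from None.

```python
import re

# Common falsy values: None, an empty list, an empty string, zero.
falsy_values = [None, [], '', 0]
print([bool(v) for v in falsy_values])  # [False, False, False, False]

# x* can match zero characters, so this finds an empty match at position 0.
empty_match = re.search(r'x*', 'abc')
print(repr(empty_match.group()), bool(empty_match))  # '' True

# x+ needs at least one "x", so there is no match and re.search returns None.
no_match = re.search(r'x+', 'abc')
print(no_match, bool(no_match))  # None False
```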

antonavy · Accepted Answer · 2013-07-12 12:08:50Z

15

import re

htmlString = '</dd><dt> Fine, thank you.&#160;</dt><dd> Molt bé, gràcies. (<i>mohl behh, GRAH-syuhs</i>)'

SearchStr = '(\<\/dd\>\<dt\>)+ ([\w+\,\.\s]+)([\&\#\d\;]+)(\<\/dt\>\<dd\>)+ ([\w\,\s\w\s\w\?\!\.]+) (\(\<i\>)([\w\s\,\-]+)(\<\/i\>\))'

Result = re.search(SearchStr.decode('utf-8'), htmlString.decode('utf-8'), re.I | re.U)

print Result.groups()

That works. The input string contains non-ASCII characters (é and à), and without Unicode matching \w does not match them, so the search fails. You have to decode into Unicode and use the re.U (Unicode) flag.
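The same diagnosis can be reproduced in Python 3, assuming UTF-8 bytes are a fair stand-in for the question's Python 2 byte string (in both, \w matches only ASCII letters, digits and underscore). In Python 3, str patterns are Unicode-aware by default, so decoding is enough and re.U is implied; re.ASCII switches that off again. The names below are illustrative.

```python
import re

# The question's string as UTF-8 bytes, roughly what a Python 2 str literal holds.
html_bytes = '</dd><dt> Fine, thank you.&#160;</dt><dd> Molt bé, gràcies. (<i>mohl behh, GRAH-syuhs</i>)'.encode('utf-8')

pattern = (r'(\<\/dd\>\<dt\>)+ ([\w+\,\.\s]+)([\&\#\d\;]+)(\<\/dt\>\<dd\>)+ '
           r'([\w\,\s\w\s\w\?\!\.]+) (\(\<i\>)([\w\s\,\-]+)(\<\/i\>\))')

# Bytes pattern on bytes: \w is ASCII-only, so the encoded "é" stops group 5.
byte_result = re.search(pattern.encode('ascii'), html_bytes)
print(byte_result)  # None

# Decode to str first: in Python 3, \w matches é and à by default.
text_result = re.search(pattern, html_bytes.decode('utf-8'))
print(text_result.group(5), text_result.group(7))

# Forcing ASCII-only matching on str brings the failure back.
ascii_result = re.search(pattern, html_bytes.decode('utf-8'), re.ASCII)
print(ascii_result)  # None
```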

I'm a beginner too and I faced that issue a couple of times myself.

answered Jul 12, 2013 at 12:08

antonavy

