Python regex - multiple search

Question

Here is what I'm trying to accomplish:

1. Using Python mechanize I open a site
2. If content does not match my regex I open another site
3. I perform searching using another regex

And the extracted code:

m = re.search('<td>(?P<alt>\d+)', response.read())
...
m = re.search('<td>(?P<alt>\w+)', response.read())
print m.group('alt')

I'm getting:

AttributeError: 'NoneType' object has no attribute 'group'

If I comment out the second search, everything is fine. I don't understand this behaviour.

Searching for this error led me to this Stack Overflow question and to this one, but to no avail: neither of them solved my problem.

I don't care about efficiency here so I don't use compile.

What is the unfiltered result of each response.read()? I'm betting the second read isn't returning what you expect.
– cmaynard, Commented Feb 7, 2011 at 17:38
Could you add some more details about what you are trying to do by calling re.search twice? The current example code makes no sense.
– shang, Commented Feb 7, 2011 at 17:45
@kramthegram - thanks! You're right. It wasn't a regex issue. @shang - because response.read() changes between these 2 lines; see the second point of my question.
– laszchamachla, Commented Feb 7, 2011 at 17:48

Reiner Gerecke · Accepted Answer · 2011-02-07 17:50:24Z

Assuming response is a file-like object, calling read a second time might return an empty string, because the first call already consumed the whole stream.

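import re

# Read the response once and reuse the result; a second read() may return an empty string.
data = response.read()
m = re.search(r'<td>(?P<alt>\d\d*)', data)
m = re.search(r'<td>(?P<alt>\d\d*)', data)  # searching the same data again is safe
print(m.group('alt'))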

Why would you call search multiple times?
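
A minimal sketch of the behaviour described above. It assumes io.BytesIO as a stand-in for the mechanize response (any file-like object should behave similarly once its contents are consumed), and the sample HTML is made up for illustration:

```python
import io
import re

# Hypothetical page content; io.BytesIO stands in for the real response object.
response = io.BytesIO(b"<table><tr><td>123</td></tr></table>")

first = response.read()   # consumes the whole stream
second = response.read()  # nothing is left, so this is an empty bytes object

pattern = re.compile(br"<td>(?P<alt>\d+)")

m_first = pattern.search(first)    # a match object
m_second = pattern.search(second)  # None, because there is nothing to search

print(m_first.group("alt"))
print(m_second)
```

Calling m_second.group('alt') here would raise the same AttributeError as in the question, since re.search returns None when nothing matches.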

edited Feb 7, 2011 at 17:50

answered Feb 7, 2011 at 17:38

7 Comments

laszchamachla Over a year ago

You're right - thanks! So it wasn't a regex issue. My mistake. I would like to call search multiple times, because data might change between these two lines (second point of my question).

Reiner Gerecke Over a year ago

@laszchamachla In that case, I don't see how this is any help. If I understand you correctly, you get page A and search its data; if there are no matches, you make a new request and search that data. There shouldn't be a problem if, between the two searches, you issue a new request and get a new response.
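
The flow described in this comment could be sketched roughly as follows. The helper first_match, the fake_pages dictionary, and the example.com URLs are all hypothetical; fetch stands for whatever callable opens a URL and reads the response body exactly once:

```python
import re

def first_match(fetch, attempts):
    """Try each (url, pattern) pair in order; return the first captured 'alt' group.

    `fetch` is any callable that takes a URL and returns the page body as a string.
    Returns None if no pattern matches its page.
    """
    for url, pattern in attempts:
        data = fetch(url)  # one fresh request (and one read) per attempt
        m = re.search(pattern, data)
        if m is not None:
            return m.group("alt")
    return None

# Hypothetical pages standing in for real HTTP responses.
fake_pages = {
    "http://example.com/a": "<td>abc</td>",
    "http://example.com/b": "<td>xyz</td>",
}

result = first_match(
    fake_pages.get,
    [
        ("http://example.com/a", r"<td>(?P<alt>\d+)"),  # digits only: no match on page A
        ("http://example.com/b", r"<td>(?P<alt>\w+)"),  # word characters: matches page B
    ],
)
print(result)
```

Because each attempt stores the body in data before searching, no response is ever read twice.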

laszchamachla Over a year ago

@Reiner - exactly, it is pretty strange to me too. But, as you advised, assigning response.read() to a variable before every search solves the problem.

9000 Over a year ago

Also, I'd suggest compiling the regex once: rx = re.compile('<td>(?P<alt>\d\d*)') and then reusing it wherever needed: m = rx.search(data).
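
A small sketch of this suggestion, combined with a guard against the None result that caused the original error. The helper name extract_alt and the sample strings are assumptions made for illustration:

```python
import re

# Compile once, then reuse the pattern for every page body.
rx = re.compile(r"<td>(?P<alt>\d\d*)")

def extract_alt(data):
    """Return the 'alt' group found in data, or None if the pattern does not match."""
    m = rx.search(data)
    return m.group("alt") if m else None

print(extract_alt("<td>42</td>"))
print(extract_alt(""))  # an empty body, e.g. from a second read(), gives None
```

Checking the match before calling group avoids the AttributeError when the searched string is empty or simply does not contain the pattern.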

laszchamachla Over a year ago

@9000 - I wrote: "I don't care about efficiency here so I don't use compile." - it is not the point in this case, but thanks for your suggestion.
