Here is what I'm trying to accomplish:

  1. Using Python mechanize, I open a site.
  2. If the content does not match my regex, I open another site.
  3. I search the new content using another regex.

And the extracted code:

m = re.search('<td>(?P<alt>\d+)', response.read())
...
m = re.search('<td>(?P<alt>\w+)', response.read())
print m.group('alt')

I'm getting:

AttributeError: 'NoneType' object has no attribute 'group'

If I comment out the second search, everything is fine. I don't understand this behaviour.

This error led me to this Stack Overflow question and to this one, but neither of them solved my problem.

I don't care about efficiency here so I don't use compile.

  • What is the unfiltered result of each response.read()? I'm betting the second read isn't returning what you expect. Commented Feb 7, 2011 at 17:38
  • Could you add some more details about what you are trying to do by calling re.search twice? The current example code makes no sense. Commented Feb 7, 2011 at 17:45
  • @kramthegram - thanks! You're right. It wasn't a regex issue. @shang - because response.read() changes between these 2 lines - see the second point of my question. Commented Feb 7, 2011 at 17:48

1 Answer

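Assuming response is a file-like object, calling read() a second time will likely return an empty string, because the first call already consumed the whole stream. Read the data once, store it in a variable, and run every search against that variable: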

data = response.read()
m = re.search('<td>(?P<alt>\d\d*)', data)
m = re.search('<td>(?P<alt>\d\d*)', data)
print m.group('alt')

Why would you call search multiple times?
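
To make the consumed-stream behaviour concrete, here is a minimal sketch. It uses io.StringIO as a stand-in for the mechanize response (an assumption for illustration only; the sample HTML is made up), and shows why the second search ends up with None.

```python
import io
import re

# Stand-in for the mechanize response: an in-memory file-like object.
# The HTML content here is invented purely for illustration.
response = io.StringIO("<tr><td>42</td></tr>")

first = response.read()   # consumes the whole stream
second = response.read()  # nothing is left, so this is an empty string

# Searching an empty string finds nothing, so re.search returns None,
# and calling .group() on None raises the AttributeError from the question.
match_on_second = re.search(r"<td>(?P<alt>\d+)", second)

# Reading once and reusing the stored string avoids the problem.
match_on_first = re.search(r"<td>(?P<alt>\d+)", first)
```

Checking for None before calling group() also turns the AttributeError into an explicit "no match" case.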

7 Comments

You're right - thanks! So it wasn't a regex issue. My mistake. I would like to call search multiple times, because data might change between these two lines (second point of my question).
@laszchamachla In that case, I don't see how this is any help. If I understand you correctly, you're getting page A, search on its data, in case of no matches, you do a new request and search on that data. There shouldn't be a problem if between two searches, you issue a new request and get a new response.
@Reiner - exactly, it is pretty strange to me too. But, as you advised, assigning response.read() to a variable before every search solves the problem.
I'd also suggest compiling the regex once: rx = re.compile('<td>(?P<alt>\d\d*)') and then reusing it wherever needed: m = rx.search(data).
@9000 - I wrote: "I don't care about efficiency here so I don't use compile." - it is not the point in this case, but thanks for your suggestion.
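
Putting the thread together, the flow from the question (search page A, and on no match request page B and search that) could be sketched roughly as below. The names find_alt, fake_open and PAGES are hypothetical and not part of mechanize; fake_open merely stands in for whatever call returns a response object.

```python
import io
import re

# Patterns taken from the question; compiling them is optional.
TD_DIGITS = re.compile(r"<td>(?P<alt>\d+)")
TD_WORD = re.compile(r"<td>(?P<alt>\w+)")


def find_alt(open_page, first_url, fallback_url):
    """Search the first page; if nothing matches, fetch and search the fallback."""
    data = open_page(first_url).read()  # read each response exactly once
    match = TD_DIGITS.search(data)
    if match is None:
        # A new request yields a new response, which gets its own single read.
        data = open_page(fallback_url).read()
        match = TD_WORD.search(data)
    return match.group("alt") if match else None


# Hypothetical in-memory "sites" so the sketch runs without network access.
PAGES = {"a": "<td>abc</td>", "b": "<td>xyz9</td>", "c": "<td>7</td>"}


def fake_open(url):
    return io.StringIO(PAGES[url])


from_fallback = find_alt(fake_open, "a", "b")
from_first = find_alt(fake_open, "c", "b")
```

Each response is read once into data, so no search ever runs against an already-consumed stream.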