I am building a small website crawler and have run into some problems with it. The first is Unicode characters in the URL.

Let's say I have the following URL: http://putlocker.is/actor/Juan_Fern%C3%A1ndez

My code is:

        try:
            connection = urllib.urlopen(self.__link)
            get = connection.read().decode('utf8')
        except:
            if UnicodeDecodeError:
                print("UnicodeDecodeError !!!")

I'm talking about the original link, not the encoded one.

  • Can't reproduce it with the given URL. I've tried urllib, urllib2 and requests: no UnicodeError. Commented Jan 21, 2015 at 17:14
  • The error comes from the ORIGINAL link with the original characters. Please open the website and copy the link from there. Commented Jan 21, 2015 at 17:18
  • As I understand it, you need to decode the content of the page the link refers to. What I did was follow the link and copy it from the address bar. Did I do something wrong? Commented Jan 21, 2015 at 17:31
  • If you want people to use the original link, use the original link in your question; otherwise you're just inviting confusion. And if you're getting an error, pasting the full error into the question is better than making us guess. Commented Jan 21, 2015 at 17:57
  • The problem is that when I copy the original link, it automatically becomes putlocker.is/actor/Juan_Fern%C3%A1ndez. Commented Jan 21, 2015 at 18:32

1 Answer

Your error handling is the problem. The expression in your if statement, UnicodeDecodeError, is a class object, so it is always truthy and the message is printed for any exception. You should probably change it to:

    try:
        ...
    except UnicodeDecodeError:
        # handle error

With a bare except, every error is swallowed and reported with the same message, so you never see what the actual error is.
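
A minimal sketch of that idea, assuming Python 3 (where the question's urllib.urlopen lives at urllib.request.urlopen). The names fetch_text, opener and broken_opener are hypothetical, introduced only for illustration; the stand-in opener lets the snippet run without network access:

```python
import io
import urllib.error
import urllib.request

# A class object is always truthy, so `if UnicodeDecodeError:` filters nothing.
assert bool(UnicodeDecodeError) is True


def fetch_text(link, opener=urllib.request.urlopen):
    """Return (text, error); exactly one of the two is None.

    `fetch_text` and `opener` are hypothetical names, not part of any library.
    """
    try:
        connection = opener(link)
        return connection.read().decode("utf8"), None
    except UnicodeDecodeError as exc:
        # The body was downloaded but is not valid UTF-8.
        return None, "UnicodeDecodeError: %s" % exc
    except (urllib.error.URLError, ValueError) as exc:
        # Network failure, HTTP error status, or a malformed/unencodable URL.
        return None, "%s: %s" % (type(exc).__name__, exc)


# Offline demonstration: a stand-in opener whose body is not valid UTF-8.
def broken_opener(link):
    return io.BytesIO(b"\xff\xfe")


text, error = fetch_text("http://example.invalid/", opener=broken_opener)
print(error)  # names the real exception instead of a fixed message
```

UnicodeDecodeError is listed first because it is a subclass of ValueError; with the clauses in the other order, the more general handler would catch it.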

3 Comments

It wouldn't explain why his except-clause gets executed though. Then again, OP didn't show his full code.
Yes, it's probably some other error, because there is no problem decoding the content of that link as UTF-8.
Yep, sorry. Let's say self.__link is the link I provided, in its original form.
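
If self.__link really holds the raw form of the link, with a literal á in the path, one thing worth trying is percent-encoding the path before opening it. This is a sketch under assumptions, not a confirmed diagnosis: it assumes Python 3, assumes the path is not already percent-encoded, and percent_encode_path is a hypothetical helper name:

```python
from urllib.parse import quote, urlsplit, urlunsplit


def percent_encode_path(url):
    """Percent-encode non-ASCII characters in the path of `url` as UTF-8.

    Hypothetical helper; a path that is already encoded would be
    double-encoded, because "%" itself gets quoted.
    """
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=quote(parts.path)))


encoded = percent_encode_path("http://putlocker.is/actor/Juan_Fernández")
print(encoded)  # http://putlocker.is/actor/Juan_Fern%C3%A1ndez
```

The result matches the encoded link shown in the question, which is the form the browser produces when the address is copied.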
