I'm sure I'm not the first one to run into this problem. But after hours of debugging, Googling and StackOverflow-ing without finding an answer, I decided to post this question. So sorry in advance if I missed anything, but by now, I'm pretty confused.
I'm using BeautifulSoup to parse a UTF-8 website, and I use text from the site to build the URL to crawl next. I'm running into problems with non-English characters.
For example: the site contains the string Originální formule, and I want to use it to build the URL http://blahblah.com/Originální-formule or http://blahblah.com/origin%C3%A1ln%C3%AD-formule. The problem is that I'm getting http://blahblah.com/Origin\xe1ln\xed-formule, which produces an error. I tried encoding, decoding and whatnot, yet I still can't get the proper URL.
BTW, when I print u'Origin\xe1ln\xed-formule', the string prints just fine. It's just the encoding into a URL that doesn't succeed.
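To make the target concrete, here is a minimal sketch of the conversion I'm after. It is not my actual crawler code: `build_url` and `BASE_URL` are hypothetical names made up for this example, and it assumes the text has to be encoded to UTF-8 bytes before being percent-encoded with the standard library's `quote`. The import fallback is only there so the snippet runs on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2

# Placeholder host taken from the example URLs in this question.
BASE_URL = "http://blahblah.com/"


def build_url(text):
    """Hypothetical helper: turn scraped Unicode text into a URL path segment."""
    # Replace spaces with hyphens, as in the URLs I want to produce.
    slug = text.replace(u" ", u"-")
    # Encode to UTF-8 bytes first, then percent-encode the non-ASCII bytes.
    return BASE_URL + quote(slug.encode("utf-8"))


url = build_url(u"Origin\xe1ln\xed formule")
print(url)  # http://blahblah.com/Origin%C3%A1ln%C3%AD-formule
```

Note that this keeps the capital "O"; the lowercase variant would need an extra `.lower()` on the text, which I left out here.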
What am I doing wrong?