I'm using Scrapy for web crawling in Python. While scraping, I get some characters that are not encoded correctly, such as '\xa0' and '\x0259'. How can I handle them in Python?
- Have you looked at stackoverflow.com/questions/10735836/… ? – paul trmbrth, Jul 18, 2013 at 8:43
- You need to provide more context. Do you have some code? Scrapy provides an API that allows you to deal with unicode, but here you show us some characters and I have no idea where they came from, what the correct character encoding is, or what you want to do with them. – Shane Evans, Jul 18, 2013 at 11:41
1 Answer
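You can use the unicode string type (http://docs.python.org/2/tutorial/introduction.html#unicode-strings) by prefixing string literals that contain such characters with u, for example u'\xa0'. Note that the \x escape takes exactly two hex digits, so u'\x0259' is really the control character \x02 followed by the text '59'; if you mean the code point U+0259, write u'\u0259' instead. The unicode-strings section of the Python docs also describes methods for encoding and decoding these strings and characters.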
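As a rough illustration of the decoding and escape handling described above, here is a minimal sketch. The helper name `clean_text`, the sample strings, and the assumption that the scraped bytes are UTF-8 are all hypothetical; the question does not say what the page's actual encoding is or what should happen to these characters.

```python
# -*- coding: utf-8 -*-

def clean_text(raw, encoding="utf-8"):
    """Hypothetical helper: decode bytes if needed, then tidy whitespace.

    Assumes the source encoding is UTF-8 unless told otherwise.
    """
    if isinstance(raw, bytes):
        text = raw.decode(encoding)
    else:
        text = raw
    # U+00A0 is a non-breaking space; replace it with a regular space.
    text = text.replace(u"\xa0", u" ")
    # Collapse any remaining runs of whitespace into single spaces.
    return u" ".join(text.split())


# \x takes exactly two hex digits, so this is '\x02' + '5' + '9'.
mistaken = u"\x0259"
# \u takes four hex digits, so this is the single character U+0259.
schwa = u"\u0259"

sample = u"price:\xa0100 \u0259"
cleaned = clean_text(sample)

# Encode back to bytes only when writing to a file or socket.
encoded = cleaned.encode("utf-8")
```

Here `len(mistaken)` is 3 while `len(schwa)` is 1, which is why the two escapes are not interchangeable. Whether the non-breaking space should be replaced or kept depends on what you want to do with the text, so treat that step as optional.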