2

I am trying to parse xml document (URL) using requests,

facing following error:

ValueError: Unicode strings with encoding declaration are not supported

here is my code:

import requests
from lxml import etree
from lxml.etree import fromstring

req = requests.request('GET', "http://www.nbp.pl/kursy/xml/LastC.xml")

a = req.text
b = etree.fromstring(a)

how can i get this xml parsed. Thanks in advance for help

3
  • Have you looked at this post? stackoverflow.com/questions/15830421/… Commented Apr 4, 2015 at 11:41
  • @AlexeyGorozhanov i tried it.. not working for me! throws same error Commented Apr 4, 2015 at 11:42
  • @quikrr: it cannot possibly throw the same error if you actually used req.content; are you 100% certain you used the right method there? Commented Apr 4, 2015 at 12:00

1 Answer 1

4

You are passing in the Unicode decoded version. Don't do that, XML parsers require that you pass in the raw bytes instead.

Instead of req.text, use req.content here:

a = req.content
b = etree.fromstring(a)

You can also stream the XML document to the parser:

req = requests.get("http://www.nbp.pl/kursy/xml/LastC.xml", stream=True)
req.raw.decode_content = True  # ensure transfer encoding is honoured
b = etree.parse(req.raw)
Sign up to request clarification or add additional context in comments.

Comments