
Trying to do the following...

from lxml import etree
from lxml.etree import fromstring

if request.POST:
    parser = etree.XMLParser(ns_clean=True, recover=True)
    h = fromstring(request.POST['xml'], parser=parser)
    return HttpResponse(h.cssselect('itagg_delivery_receipt status').text_content())

but it gives this error:

[Fri Apr 05 10:27:54 2013] [error] Internal Server Error: /sms/status_postback/
[Fri Apr 05 10:27:54 2013] [error] Traceback (most recent call last):
[Fri Apr 05 10:27:54 2013] [error]   File "/usr/local/lib/python2.7/dist-packages/django/core/handlers/base.py", line 115, in get_response
[Fri Apr 05 10:27:54 2013] [error]     response = callback(request, *callback_args, **callback_kwargs)
[Fri Apr 05 10:27:54 2013] [error]   File "/usr/local/lib/python2.7/dist-packages/django/views/decorators/csrf.py", line 77, in wrapped_view
[Fri Apr 05 10:27:54 2013] [error]     return view_func(*args, **kwargs)
[Fri Apr 05 10:27:54 2013] [error]   File "/srv/project/livewireSMS/sms/views.py", line 42, in update_delivery_status
[Fri Apr 05 10:27:54 2013] [error]     h = fromstring(request.POST['xml'], parser=parser)
[Fri Apr 05 10:27:54 2013] [error]   File "lxml.etree.pyx", line 2754, in lxml.etree.fromstring (src/lxml/lxml.etree.c:54631)
[Fri Apr 05 10:27:54 2013] [error]   File "parser.pxi", line 1569, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:82659)
[Fri Apr 05 10:27:54 2013] [error] ValueError: Unicode strings with encoding declaration are not supported.

This is the XML:

<?xml version="1.1" encoding="ISO-8859-1"?>
<itagg_delivery_receipt>
<version>1.0</version>
<msisdn>447889000000</msisdn>
<submission_ref>845tgrgsehg394g3hdfhhh56445y7ts6</submission_ref>
<status>Delivered</status>
<reason>4</reason>
<timestamp>20050709120945</timestamp>
<retry>0</retry>
</itagg_delivery_receipt>

I don't have control over the XML document; it comes from the SMS company.


3 Answers


You'll have to encode it and then force the same encoding in the parser:

from lxml import etree
from lxml.etree import fromstring

if request.POST:
    xml = request.POST['xml'].encode('utf-8')
    parser = etree.XMLParser(ns_clean=True, recover=True, encoding='utf-8')
    h = fromstring(xml, parser=parser)

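    # cssselect() returns a list of matching elements, and plain etree
    # elements expose .text rather than text_content()
    return HttpResponse(h.cssselect('itagg_delivery_receipt status')[0].text)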
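
For anyone wanting to try this outside a Django view, here is a minimal standalone sketch of the same idea. The payload is a trimmed-down copy of the receipt from the question (with the XML version set to 1.0 for simplicity), and `findtext` is used instead of `cssselect` so that no extra package is needed; both choices are assumptions made for illustration, not part of the original answer.

```python
from lxml import etree

# Sample payload defined inline. As a unicode string it still carries an
# encoding declaration, which is the situation described in the question.
xml_text = (
    u'<?xml version="1.0" encoding="ISO-8859-1"?>'
    u'<itagg_delivery_receipt>'
    u'<msisdn>447889000000</msisdn>'
    u'<status>Delivered</status>'
    u'</itagg_delivery_receipt>'
)

# Encode to bytes, then tell the parser which encoding those bytes really
# use, overriding the ISO-8859-1 declaration inside the document.
xml_bytes = xml_text.encode('utf-8')
parser = etree.XMLParser(ns_clean=True, recover=True, encoding='utf-8')
root = etree.fromstring(xml_bytes, parser=parser)

# The root element is <itagg_delivery_receipt>; read its <status> child.
status = root.findtext('status')
print(status)
```

The key point is that the bytes handed to `fromstring` and the `encoding` given to the parser agree with each other, regardless of what the declaration says.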

2 Comments

Can a standalone example be provided where the XML data is defined in the code?
This solution doesn't help with encoding problems.

The following solution from kernc worked for me:

from lxml import etree

xml = u'<?xml version="1.0" encoding="utf-8" ?><foo><bar/></foo>'
xml = bytes(bytearray(xml, encoding='utf-8'))  # the added line: turn the unicode string into UTF-8 bytes, matching the declaration
etree.XML(xml)

# <Element foo at 0x5b44c90>

1 Comment

Interesting. Even after reading the page suggesting the workaround, I still don't see why decoding fails... :S

Simpler than the answers above:

from lxml import etree

# Make the request for the data first; r is the response object
data = etree.fromstring(bytes(r.text, encoding='utf-8'))

Apologies all as at this time I was young, dumb and not quite mature enough to take the effort to explain my answer (nor did I have the knowledge really) 🙏

  • fromstring() is the lxml.etree function that parses an XML document held in a string and returns its root element
  • When it is given an already-decoded unicode string that still contains an encoding declaration, lxml refuses to parse it, because the declaration describes bytes that no longer exist -> encoding the string into UTF-8 bytes first gives fromstring() the bytes input it expects
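
To make the two points above concrete, here is a small sketch that tries both inputs. The sample document is made up for illustration, and whether the first call raises may depend on the lxml version installed, so the code only records the error rather than relying on it.

```python
from lxml import etree

xml_text = u'<?xml version="1.0" encoding="utf-8"?><foo><bar/></foo>'

# Passing the decoded string: lxml versions that reject this input raise
# ValueError ("Unicode strings with encoding declaration are not supported").
try:
    etree.fromstring(xml_text)
    str_error = None
except ValueError as exc:
    str_error = str(exc)

# Passing bytes: the parser reads the declaration and decodes by itself.
root = etree.fromstring(xml_text.encode('utf-8'))

print(str_error)
print(root.tag)
```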

3 Comments

You might add some explanations to your solution, so that later users can follow your code more easily.
What type is r ?
The variable r would come from the requests library, so you would have r = requests.get(URL).
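
If r is a requests response, another option is to skip r.text altogether and pass r.content, which is the raw bytes of the body, so the parser can apply the declared encoding itself. The sketch below uses a hypothetical FakeResponse class in place of a real requests.Response so that it runs without a network call; the class, the parse_response helper and the sample body are all assumptions made for illustration.

```python
from lxml import etree


class FakeResponse(object):
    """Hypothetical stand-in for requests.Response, so this runs offline."""

    def __init__(self, content, encoding):
        self.content = content      # raw bytes of the body
        self.encoding = encoding    # encoding used to build .text

    @property
    def text(self):
        # Decoded body, as requests would expose it
        return self.content.decode(self.encoding)


def parse_response(r):
    # Hand the parser the raw bytes; it honours the encoding declaration.
    return etree.fromstring(r.content)


body = u'<?xml version="1.0" encoding="ISO-8859-1"?><status>Livr\xe9</status>'
r = FakeResponse(body.encode('iso-8859-1'), 'iso-8859-1')
root = parse_response(r)
print(root.tag)
```

This avoids decoding the body to text only to encode it again, and it keeps the bytes consistent with what the document declares.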
