4

I have this XML file, called xmltest.xml:

<?xml version="1.0" encoding="GBK"?>
<productMeta>
    <bands>1,2,3,4</bands>
    <imageName>TestName.tif</imageName>  
    <browseName>TestName.jpg</browseName>
</productMeta>

And I have this Python dummy code:

import xml.etree.ElementTree as ET
xmldoc = ET.parse('xmltest.xml')

But it raises a ValueError:

ValueError: multi-byte encodings are not supported

I understand this error, it raises because the encoding declaration in the first line of the XML file. The XML file is UTF-8 encoded but always have that declaration (I'm not the creator of the XML files to be analyzed). How can I avoid such encoding declaration when parsing an XML file such the former one?

2 Answers 2

7

One thing that I tried, that worked for me is to open the xml file as a file object , then use ElementTree.fromstring() passing in the complete contents of the file.

Example -

>>> import xml.etree.ElementTree as ET
>>> ef = ET.parse('a.xml')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python34\lib\xml\etree\ElementTree.py", line 1187, in parse
    tree.parse(source, parser)
  File "C:\Python34\lib\xml\etree\ElementTree.py", line 598, in parse
    self._root = parser._parse_whole(source)
ValueError: multi-byte encodings are not supported
>>> with open('a.xml','r') as f:
...     ef = ET.fromstring(f.read())
...
>>> ef
<Element 'productMeta' at 0x028DF180>

You can also, create an XMLParser with the required encoding, and this should enable you to be able to parse strings from that encoding, Example -

import xml.etree.ElementTree as ET
xmlp = ET.XMLParser(encoding="utf-8")
f = ET.parse('a.xml',parser=xmlp)
Sign up to request clarification or add additional context in comments.

4 Comments

Make sure your encoding is correct so that the strings inside your xml file can be read. Also, is it giving the same error?
Sorry, I don't know why but it works right now. :) Can you explain a little bit why your solution works?
The solution works because instead of making the parser handle the encoding of the xml file, when we are using fromstring we are just passing in the string (and the parser is most assuming it to be in the correct encoding, etc) so it does not result in the error.
Also, there can be one more solution to this, to create an xml parser, with the encoding specified. I have updated the answer for that as well.
1
 ET.parse('a.xml', parser=ET.XMLParser(encoding='iso-8859-5'))

solved my problem when dealed with xml excel in python

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.