Exception when parsing an XML file using lxml

Question

I wrote this code to validate my XML file against an XSD schema:

def parseAndObjectifyXml(xmlPath, xsdPath):
    from lxml import  etree

    xsdFile = open(xsdPath)
    schema = etree.XMLSchema(file=xsdFile)
    xmlinput = open(xmlPath)
    xmlContent = xmlinput.read()
    myxml = etree.parse(xmlinput) # In this line xml input is empty
    schema.assertValid(myxml)

But when I try to validate it, xmlinput is empty while xmlContent is not. What is the problem?

For future reference: if you do get an exception in Python, there will be a traceback. It will be much easier for us to help you if you include that traceback (in full) in your question.
– Martijn Pieters, Commented Jul 8, 2012 at 12:03
Then there wasn't an exception either; your question title suggested there was.
– Martijn Pieters, Commented Jul 8, 2012 at 20:04

Martijn Pieters · Accepted Answer · 2012-07-08 13:34:11Z

File objects in Python have a "current position". It starts at the beginning of the file (position 0) and moves forward as you read, until it reaches the end. Once it is at the end, any further read returns nothing, which is why the parser sees an empty file after your .read() call.
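A minimal sketch of that behaviour, using only the standard library. The temporary file and its contents are made up for illustration; they stand in for the XML file from the question.

```python
import os
import tempfile

# Write a small throwaway file so the example is self-contained.
# (Hypothetical content, standing in for the real XML file.)
with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as tmp:
    tmp.write("<root><child/></root>")
    path = tmp.name

with open(path) as f:
    start = f.tell()        # position before reading: 0
    first_read = f.read()   # consumes the whole file
    second_read = f.read()  # position is at the end, so nothing is left

os.remove(path)

print(start)                # 0
print(repr(first_read))     # '<root><child/></root>'
print(repr(second_read))    # ''
```

The second read plays the role of etree.parse(xmlinput) in the question: it starts where the first read stopped.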

You'll need to put that pointer back to the beginning before the lxml parser can read the contents in full. Use the .seek() method for that:

from lxml import  etree

def parseAndObjectifyXml(xmlPath, xsdPath):
    xsdFile = open(xsdPath)
    schema = etree.XMLSchema(file=xsdFile)
    xmlinput = open(xmlPath)
    xmlContent = xmlinput.read()
    xmlinput.seek(0)
    myxml = etree.parse(xmlinput)
    schema.assertValid(myxml)

You only need to do this if you need xmlContent somewhere else too. Alternatively, wrap the content you already read in an in-memory file object, which provides the file methods that .parse() expects. On Python 2 this is cStringIO.StringIO; on Python 3, where cStringIO no longer exists, use io.BytesIO and open the file in binary mode so the content is bytes:

from io import BytesIO  # Python 2: from cStringIO import StringIO
from lxml import etree

def parseAndObjectifyXml(xmlPath, xsdPath):
    xsdFile = open(xsdPath)
    schema = etree.XMLSchema(file=xsdFile)
    xmlinput = open(xmlPath, 'rb')  # binary mode, because BytesIO needs bytes
    xmlContent = xmlinput.read()
    myxml = etree.parse(BytesIO(xmlContent))
    schema.assertValid(myxml)

If you are not using xmlContent for anything else, drop the extra .read() call altogether. The current position then never leaves the start of the file, so there is nothing to rewind and lxml can parse the file directly:

from lxml import  etree

def parseAndObjectifyXml(xmlPath, xsdPath):
    xsdFile = open(xsdPath)
    schema = etree.XMLSchema(file=xsdFile)
    xmlinput = open(xmlPath)
    myxml = etree.parse(xmlinput)
    schema.assertValid(myxml)

To learn more about .seek() (and its counterpart, .tell()), read up on file objects in the Python tutorial.
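As a rough illustration of how the two methods work together, the sketch below uses an in-memory io.BytesIO object as a stand-in for an open file (an assumption made to keep the example self-contained; a file opened in binary mode behaves the same way for these calls).

```python
import io

# In-memory stand-in for a file opened in binary mode (hypothetical content).
buf = io.BytesIO(b"<root><child/></root>")

buf.read(6)              # read the first 6 bytes: b"<root>"
saved = buf.tell()       # remember the current position (byte offset 6)
rest = buf.read()        # read from there to the end

buf.seek(saved)          # jump back to the remembered position
rest_again = buf.read()  # yields the same bytes as before

buf.seek(0)              # rewind to the very beginning
whole = buf.read()       # the full content again
```

.tell() reports where the next read will start, and .seek() moves that point; seek(0) is simply the special case of going back to the start.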

Simeon Visser · Answer · 2012-07-08 11:54 (edited by Martijn Pieters, 2012-07-08 12:05:49Z)

-1

You should use the XML content that you have read:

xmlContent = xmlinput.read()
myxml = etree.parse(xmlContent)

instead of:

myxml = etree.parse(xmlinput)

answered Jul 8, 2012 at 11:54 by Simeon Visser
edited Jul 8, 2012 at 12:05 by Martijn Pieters

1 Comment

Martijn Pieters Over a year ago

Having mis-read my mis-read, I turn out to be correct and this answer is still wrong. You cannot parse XML content in a string; lxml etree.parse will interpret it as a filename instead and this will fail.
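To illustrate the comment above, here is a minimal sketch of the difference between the two entry points. The sample XML is made up, and the fallback import is an assumption added so the snippet also runs where lxml is not installed; the standard library's ElementTree follows the same parse()/fromstring() split.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the standard library if lxml is unavailable.
    import xml.etree.ElementTree as etree

xml_content = b"<root><child/></root>"  # hypothetical XML held in memory

# fromstring() parses XML content directly and returns the root element.
root = etree.fromstring(xml_content)
print(root.tag)

# parse() treats a plain string as a file name, so XML text is not accepted.
try:
    etree.parse("<root><child/></root>")
    parse_failed = False
except (OSError, etree.ParseError):
    parse_failed = True

print(parse_failed)
```

So if the content is already in memory, either hand it to fromstring() or wrap it in a file-like object before calling parse().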
