0

I wrote this code to validate my XML file against an XSD:

def parseAndObjectifyXml(xmlPath, xsdPath):
    from lxml import  etree

    xsdFile = open(xsdPath)
    schema = etree.XMLSchema(file=xsdFile)
    xmlinput = open(xmlPath)
    xmlContent = xmlinput.read()
    myxml = etree.parse(xmlinput) # In this line xml input is empty
    schema.assertValid(myxml)

But when I validate it, xmlinput appears to be empty, even though xmlContent is not. What is the problem?

3
  • 2
    For future reference: If you do get an exception in Python, there'll be a traceback. It'll make it much easier for us to help you if you include that traceback (in full) in your question. Commented Jul 8, 2012 at 12:03
  • @MartijnPieters but there wasn't any traceback Commented Jul 8, 2012 at 17:54
  • Then there wasn't an exception either; your question title suggested there was. Commented Jul 8, 2012 at 20:04

2 Answers

2

Files in Python have a "current position". It starts at the beginning of the file (position 0) and moves along as you read, until it reaches the end. After xmlinput.read() the position is at the end of the file, so the parser finds nothing left to read.
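The position behaviour can be seen with plain file I/O, no XML library involved. This is a minimal sketch; the helper name read_twice and the temporary sample file are made up for illustration, and the file is opened in binary mode so that .tell() returns a plain byte offset.

```python
import os
import tempfile

def read_twice(path):
    """Read the same open file twice, recording the position around the reads."""
    with open(path, "rb") as handle:
        start = handle.tell()    # 0: nothing has been consumed yet
        first = handle.read()    # consumes everything up to the end of the file
        end = handle.tell()      # the position now sits at the end of the file
        second = handle.read()   # nothing is left to read, so this is empty
    return start, first, end, second

# Hypothetical sample file, created only for this demonstration.
with tempfile.NamedTemporaryFile("wb", suffix=".xml", delete=False) as tmp:
    tmp.write(b"<root/>")
    sample_path = tmp.name

start, first, end, second = read_twice(sample_path)
os.remove(sample_path)
```

The second read returns an empty result for the same reason the parser in the question sees no input: the first .read() already moved the position to the end.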

You'll need to put that pointer back to the beginning before the lxml parser can read the contents in full. Use the .seek() method for that:

from lxml import etree

def parseAndObjectifyXml(xmlPath, xsdPath):
    xsdFile = open(xsdPath)
    schema = etree.XMLSchema(file=xsdFile)
    xmlinput = open(xmlPath)
    xmlContent = xmlinput.read()   # moves the position to the end of the file
    xmlinput.seek(0)               # rewind so the parser sees the whole file
    myxml = etree.parse(xmlinput)
    schema.assertValid(myxml)
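To try the effect of the rewind without lxml installed, here is a rough sketch that uses the standard library's xml.etree.ElementTree as a stand-in for lxml.etree. That substitution is an assumption made for illustration, as are the helper name parse_after_read and the sample file. The stand-in raises ParseError on an exhausted file; lxml may report the situation differently.

```python
import os
import tempfile
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml.etree (assumption)

def parse_after_read(path, rewind):
    """Read the whole file, optionally rewind, then give the same handle to the parser."""
    with open(path, "rb") as handle:
        content = handle.read()      # leaves the position at the end of the file
        if rewind:
            handle.seek(0)           # put the position back at the start
        try:
            root = ET.parse(handle).getroot()
        except ET.ParseError:
            # The parser found no document because nothing was left to read.
            return content, None
    return content, root.tag

# Hypothetical sample file, created only for this demonstration.
with tempfile.NamedTemporaryFile("wb", suffix=".xml", delete=False) as tmp:
    tmp.write(b"<root><item/></root>")
    sample_path = tmp.name

without_rewind = parse_after_read(sample_path, rewind=False)
with_rewind = parse_after_read(sample_path, rewind=True)
os.remove(sample_path)
```

In both calls the content that was read is intact; only the call that rewinds gives the parser anything to work with.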

You only need the .seek(0) call if you use xmlContent somewhere else too. Alternatively, pass the content you have already read to .parse(), wrapped in a file-like object that provides the file methods the parser expects. On Python 2 that wrapper is cStringIO.StringIO; on Python 3, open the file in binary mode and use io.BytesIO:

from io import BytesIO   # Python 2: from cStringIO import StringIO
from lxml import etree

def parseAndObjectifyXml(xmlPath, xsdPath):
    xsdFile = open(xsdPath)
    schema = etree.XMLSchema(file=xsdFile)
    xmlinput = open(xmlPath, 'rb')
    xmlContent = xmlinput.read()
    myxml = etree.parse(BytesIO(xmlContent))   # parse from the in-memory copy
    schema.assertValid(myxml)

If you are not using xmlContent for anything else, drop the extra .read() call altogether. The parser then reads the file from the start, and there is no need to move the current position back:

from lxml import etree

def parseAndObjectifyXml(xmlPath, xsdPath):
    xsdFile = open(xsdPath)
    schema = etree.XMLSchema(file=xsdFile)
    xmlinput = open(xmlPath)
    myxml = etree.parse(xmlinput)
    schema.assertValid(myxml)

To learn more about .seek() (and its counterpart, .tell()), read up on file objects in the Python tutorial.

-1

You should use the XML content that you have read:

xmlContent = xmlinput.read()
myxml = etree.parse(xmlContent)

instead of:

myxml = etree.parse(xmlinput)

1 Comment

Having mis-read my mis-read, I turn out to be correct and this answer is still wrong. You cannot parse XML content in a string; lxml etree.parse will interpret it as a filename instead and this will fail.
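The distinction in this comment can be sketched with the standard library's xml.etree.ElementTree, used here as a stand-in (an assumption for illustration) because it has the same split as lxml: parse() expects a filename or file object, while fromstring() parses XML text that is already in memory. lxml.etree also provides fromstring().

```python
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml.etree (assumption)

xml_text = "<root><item/></root>"

# fromstring() treats its argument as XML content.
root = ET.fromstring(xml_text)

# parse() treats a string as a path, so it tries to open a file with that name.
try:
    ET.parse(xml_text)
    treated_as_path = False
except OSError:
    # No file with that name exists, so opening it fails.
    treated_as_path = True
```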