
I have a Python script that parses XML files into a friendlier format for another platform.

Every so often one of the data files contains no data - only the XML declaration with its encoding info and no other tags - which causes ElementTree to throw a ParseError when it reads the file.

<?xml version="1.0" encoding="utf-8"?>

Is there a way of testing for the empty file before calling ElementTree?

Ta.

  • Maybe count the < characters in the data? If you find just one, the file is probably empty. Commented Jul 31, 2017 at 15:58
  • Err, how about endswith("?>")? Commented Jul 31, 2017 at 16:04

3 Answers


You should ask for forgiveness not permission here.

Handle the exception by wrapping the code in a try/except block.

import xml.etree.ElementTree as ET
...
try:
    tree = ET.parse(fooxml)
except ET.ParseError:
    # log error
    pass
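
To make the try/except pattern reusable across many files, it could be wrapped in a small helper. This is only a sketch: the name parse_or_none is made up for illustration, and printing the error stands in for whatever logging you actually use.

```python
import xml.etree.ElementTree as ET


def parse_or_none(source):
    """Return the root element, or None if the document cannot be parsed.

    `source` is a filename or an open file object, as accepted by ET.parse.
    (Hypothetical helper, not part of ElementTree.)
    """
    try:
        return ET.parse(source).getroot()
    except ET.ParseError as err:
        # A file holding only the XML declaration ends up here.
        print(f"skipping {source!r}: {err}")
        return None
```

The caller then only has to test the result, e.g. `if root is None: continue` inside the loop over the data files.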

1 Comment

Thank you - this looks perfect. I have other coding experience but my Python knowledge is somewhat less than 8 hours old and I have to do this in a hurry ;)

There are several ways to do this. You can use try/except:

try:
    pass  # replace this with your parsing code
except Exception:
    pass  # handle the empty-file case here

Or use an if statement:

if (some code to evaluate whether the XML is not empty):
    # your code
elif (some code to check whether the XML is empty):
    # your code
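
One possible way to fill in the "is it empty" condition is to strip the XML declaration and see whether anything is left. This is a rough sketch, not a real XML check: has_content is a made-up name, and the pattern assumes an ASCII-compatible encoding such as UTF-8.

```python
import re

# Optional UTF-8 BOM, optional whitespace, then the <?xml ... ?> declaration.
_XML_DECLARATION = re.compile(rb'^(?:\xef\xbb\xbf)?\s*<\?xml[^>]*\?>')


def has_content(data: bytes) -> bool:
    """Return True if anything other than whitespace follows the declaration.

    Hypothetical helper; it does not verify that the rest is valid XML.
    """
    remainder = _XML_DECLARATION.sub(b'', data, count=1)
    return bool(remainder.strip())
```

It could be used as `if has_content(contents): ...` followed by an `else:` branch for the header-only files.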

Let me know how it goes!

3 Comments

Thank you - try/catch was exactly what I was looking for.
Please vote for my answer if it was helpful in some way! :)
I did - apparently I don't have enough points for it to count yet, sorry ;(

Of course, you could catch the exception that lxml throws. If you want to avoid parsing, you could check whether the file contains at most one < symbol:

from lxml import etree  # xml.etree.ElementTree also provides XML()

with open("input.xml", "rb") as f:
    contents = f.read()

if contents.count(b"<") <= 1:
    # empty or only header: skip
    pass
else:
    x = etree.XML(contents)

Of course, this heuristic doesn't protect against other parsing errors, so it's best to also wrap the parsing in a try/except block.

But this method has the advantage of being very cheap if you have lots of corrupt one-line "header only" files, since those are skipped without ever invoking the parser.
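
Combining the two ideas might look like the sketch below. It uses the standard library's ElementTree rather than lxml so that it runs without extra packages, and load_root is a made-up name for illustration.

```python
import xml.etree.ElementTree as ET


def load_root(data: bytes):
    """Return the root element, or None for empty or unparseable input.

    Hypothetical helper combining the '<' count heuristic with try/except.
    """
    # Cheap pre-check: zero or one '<' means empty or declaration-only.
    if data.count(b"<") <= 1:
        return None
    try:
        return ET.fromstring(data)
    except ET.ParseError:
        # The heuristic passed, but the document is still malformed.
        return None
```

The pre-check only saves work on the header-only files; the try/except is what actually guarantees that a bad file cannot crash the script.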
