I have a project where I connect to an API that returns its data as XML. I can read the values of individual tags and print them, but I want to keep specific elements in full (tag, attributes and text) and write them out to an XML file. What is the best way to do this in Python? I don't have any code to share because I'm unsure how to start.

Here is an example of the XML returned by the API:

<?xml version="1.0" ?>
<Product ParentID="XXX" ID="XXX" title="XXX">   
    <Values>
        <Value AttributeID="ABC" title="ABC1">ABC</Value>
        <Value AttributeID="DEF" title="DEF1">DEF</Value>
        <Value AttributeID="GHI" title="GHI1">GHI</Value>
    </Values>
</Product>

I want the XML file I write out to look like this:

<?xml version="1.0" ?>
<Product ParentID="XXX" ID="XXX" title="XXX">   
    <Values>
        <Value AttributeID="ABC" title="ABC1">ABC</Value>
        <Value AttributeID="GHI" title="GHI1">GHI</Value>
    </Values>
</Product>

  • Can you edit your question and add a short XML example and the expected output from it? Commented Jan 29, 2021 at 15:12
  • Hi @JackFleeting! I updated with an example. Can you provide a solution on how I would grab different lines in the xml and only print those out to an xml file? Commented Jan 29, 2021 at 16:15
  • You still need to explain how you decided to delete <Value AttributeID="DEF" title="DEF1">DEF</Value>: is it because it's the second? Because its text value is DEF? etc. Commented Jan 29, 2021 at 16:20
  • I'm only interested in keeping these lines: <Value AttributeID="ABC" title="ABC1">ABC</Value> <Value AttributeID="GHI" title="GHI1">GHI</Value> because their AttributeID is ABC or GHI. Commented Jan 29, 2021 at 16:22
  • You have to be more specific: do you want to keep them because they have ABC and GHI? What exactly is in them that makes them different from the one in the middle? Python can't guess why you are doing what you are doing.... Commented Jan 29, 2021 at 16:25

2 Answers

You can get there using lxml with xpath:

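from lxml import etree

products = """[your xml above]"""

# Parse the XML string into an element tree
doc = etree.XML(products)
# Select every <Value> element anywhere in the document
values = doc.xpath('//Value')
for value in values:
    # Remove any element whose text is neither "ABC" nor "GHI"
    # (alternatively: if value.text == "DEF":)
    # To test the attribute instead, compare value.get("AttributeID")
    if value.text not in ("ABC", "GHI"):
        value.getparent().remove(value)
# Serialize the modified tree back to a string
print(etree.tostring(doc).decode())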

Output:

<Product ParentID="XXX" ID="XXX" title="XXX">   
    <Values>
        <Value AttributeID="ABC" title="ABC1">ABC</Value>
        <Value AttributeID="GHI" title="GHI1">GHI</Value>
    </Values>
</Product>

5 Comments

when I run this I get XMLSyntaxError - Start tag expected, '<' not found, line 1, column 1 (<string>, line1) error. How do I solve that?
I'm trying to open an xml file instead of having a string. When I try and open it I get the XMLSyntaxError. How do I resolve this?
@LibertyMan Use etree.parse(r'C:\path_to_file\file.xml')
thanks! Two final questions: 1. How do I overwrite the file with the new information? 2. Would you be able to provide comments on each line of your code to explain what it is doing so I can learn from it for future reference?
@LibertyMan I'll try to respond over the weekend, if time permits.
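
The follow-up about reading from a file and overwriting it was never answered in the thread, so here is a minimal sketch of one way to do it. It is not the answerer's code: it uses the standard-library xml.etree.ElementTree instead of lxml, filters on the AttributeID attribute as described in the question's comments, and the function name filter_values_in_place and the KEEP_IDS set are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: these are the AttributeID values worth keeping (from the question).
KEEP_IDS = {"ABC", "GHI"}


def filter_values_in_place(path, keep_ids=KEEP_IDS):
    """Drop <Value> elements whose AttributeID is not in keep_ids, then overwrite path."""
    tree = ET.parse(path)  # parse from a file rather than from a string
    root = tree.getroot()
    # ElementTree elements do not know their parent, so walk each <Values> container.
    for container in root.iter("Values"):
        # Iterate over a copy because the container is modified inside the loop.
        for value in list(container):
            if value.tag == "Value" and value.get("AttributeID") not in keep_ids:
                container.remove(value)
    # Writing to the same path overwrites the original file.
    tree.write(path, encoding="utf-8", xml_declaration=True)
    return tree
```

Calling filter_values_in_place("file.xml") rewrites that file with only the matching elements. Since the original contents are lost, it may be safer to write to a different path first and compare the result.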

In my experience, the usual approach is to run the text through an XML parser, which turns it into a tree of element objects. Child elements can then be accessed by index, like items in a list, and attributes by string key, like entries in a dictionary.

Here is a link to Python 3's xml.etree.ElementTree parser, which builds such a tree from your XML input:

https://docs.python.org/3/library/xml.etree.elementtree.html

I hope this and the examples help!
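
As a rough illustration of the index and key access mentioned above, here is a small sketch using the sample XML from the question. The variable names are arbitrary, and this only shows reading and selecting elements, not writing a file.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<Product ParentID="XXX" ID="XXX" title="XXX">
    <Values>
        <Value AttributeID="ABC" title="ABC1">ABC</Value>
        <Value AttributeID="DEF" title="DEF1">DEF</Value>
        <Value AttributeID="GHI" title="GHI1">GHI</Value>
    </Values>
</Product>"""

root = ET.fromstring(SAMPLE)

values = root[0]    # first child of <Product>, i.e. <Values>
second = values[1]  # children are indexed like a list

print(root.tag, root.attrib["ID"])                # Product XXX
print(second.attrib["AttributeID"], second.text)  # DEF DEF

# Keep only the elements whose AttributeID is one of the wanted values.
wanted = [v for v in root.iter("Value") if v.get("AttributeID") in {"ABC", "GHI"}]
for v in wanted:
    # tostring() includes trailing whitespace (the element's tail), so strip it.
    print(ET.tostring(v, encoding="unicode").strip())
```

Each element exposes tag, attrib and text, and get() returns None instead of raising when an attribute is missing, which makes it convenient for filtering.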
