How to Extract Specific XML lines from an API response using Python

Question

I have a project where I connect to an API that returns its data as XML. I can grab the values of the tags and print them, but I want to grab specific lines of the XML in full and write them out to an XML file. What is the best way to do this in Python? I don't have any code to share, as I'm unsure how to write this.

Here is an example of the XML file or output given by the API:

<?xml version="1.0" ?>
<Product ParentID="XXX" ID="XXX" title="XXX">   
    <Values>
        <Value AttributeID="ABC" title="ABC1">ABC</Value>
        <Value AttributeID="DEF" title="DEF1">DEF</Value>
        <Value AttributeID="GHI" title="GHI1">GHI</Value>
    </Values>
</Product>

I want the XML file I write out to read like this:

<?xml version="1.0" ?>
<Product ParentID="XXX" ID="XXX" title="XXX">   
    <Values>
        <Value AttributeID="ABC" title="ABC1">ABC</Value>
        <Value AttributeID="GHI" title="GHI1">GHI</Value>
    </Values>
</Product>

Can you edit your question and add a short XML example and the expected output from it?
– Jack Fleeting, Commented Jan 29, 2021 at 15:12
Hi @JackFleeting! I updated the question with an example. Can you provide a solution for how I would grab different lines in the XML and write only those out to an XML file?
– LibertyMan, Commented Jan 29, 2021 at 16:15
You still need to explain how you decided to delete <Value AttributeID="DEF" title="DEF1">DEF</Value>: is it because it's the second? Because its text value is DEF? etc.
– Jack Fleeting, Commented Jan 29, 2021 at 16:20
I'm only interested in keeping these lines: <Value AttributeID="ABC" title="ABC1">ABC</Value> and <Value AttributeID="GHI" title="GHI1">GHI</Value>. The other line is dropped because its AttributeID is not ABC or GHI.
– LibertyMan, Commented Jan 29, 2021 at 16:22
You have to be more specific: do you want to keep them because they have ABC and GHI? What exactly is in them that makes them different from the one in the middle? Python can't guess why you are doing what you are doing.
– Jack Fleeting, Commented Jan 29, 2021 at 16:25

Jack Fleeting · Accepted Answer · 2021-01-29 16:41:36Z

1

You can get there using lxml with xpath:

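from lxml import etree

products = """[your xml above]"""

# Parse the XML string into an element tree
doc = etree.XML(products)
# Collect every <Value> element anywhere in the document
values = doc.xpath('//Value')
for value in values:
    # Remove every <Value> whose text is neither "ABC" nor "GHI"
    # (alternatively, to drop one specific element: if value.text == "DEF":)
    if value.text != "ABC" and value.text != "GHI":
        value.getparent().remove(value)
# Serialize the modified tree back to a string and print it
print(etree.tostring(doc).decode())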

Output:

<Product ParentID="XXX" ID="XXX" title="XXX">   
    <Values>
        <Value AttributeID="ABC" title="ABC1">ABC</Value>
        <Value AttributeID="GHI" title="GHI1">GHI</Value>
    </Values>
</Product>
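
The comments on the question describe the criterion as the AttributeID rather than the element text, so a variant of the same approach could test the attribute instead. This is a minimal sketch under that assumption; the name KEEP_IDS is hypothetical and the sample XML is copied from the question.

```python
from lxml import etree

# Sample input copied from the question
products = """<Product ParentID="XXX" ID="XXX" title="XXX">
    <Values>
        <Value AttributeID="ABC" title="ABC1">ABC</Value>
        <Value AttributeID="DEF" title="DEF1">DEF</Value>
        <Value AttributeID="GHI" title="GHI1">GHI</Value>
    </Values>
</Product>"""

# Hypothetical name: the AttributeID values whose lines should be kept
KEEP_IDS = {"ABC", "GHI"}

doc = etree.XML(products)
for value in doc.xpath("//Value"):
    # .get() returns the attribute value, or None if the attribute is absent
    if value.get("AttributeID") not in KEEP_IDS:
        value.getparent().remove(value)

result = etree.tostring(doc).decode()
print(result)
```

Keeping the wanted IDs in a set makes it easy to extend the list later without touching the loop.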



5 Comments

LibertyMan Over a year ago

When I run this, I get the error "XMLSyntaxError: Start tag expected, '<' not found, line 1, column 1 (<string>, line 1)". How do I solve that?

LibertyMan Over a year ago

I'm trying to open an xml file instead of having a string. When I try and open it I get the XMLSyntaxError. How do I resolve this?

Jack Fleeting Over a year ago

@LibertyMan Use etree.parse(r'C:\path_to_file\file.xml')

LibertyMan Over a year ago

Thanks! Two final questions: 1. How do I overwrite the file with the new information? 2. Would you be able to add comments to each line of your code explaining what it does, so I can learn from it for future reference?

Jack Fleeting Over a year ago

@LibertyMan I'll try to respond over the weekend, if time permits.
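
The XMLSyntaxError mentioned in this thread typically appears when a file path is passed to etree.XML(), which expects the XML text itself; etree.parse() is the call that accepts a file name. Below is a hedged sketch of reading a file, filtering it, and overwriting it in place. The function name filter_xml_file, the keep_ids parameter, and the temporary demo file are assumptions made for illustration, and the filter on AttributeID follows the criterion described in the question's comments.

```python
import os
import tempfile

from lxml import etree


def filter_xml_file(path, keep_ids):
    """Drop <Value> elements whose AttributeID is not in keep_ids, then overwrite path."""
    # etree.parse() takes a file name or file object and returns an ElementTree
    tree = etree.parse(path)
    for value in tree.xpath("//Value"):
        if value.get("AttributeID") not in keep_ids:
            value.getparent().remove(value)
    # Writing to the same path replaces the original file contents
    tree.write(path, xml_declaration=True, encoding="UTF-8")


# Demo on a temporary file so the sketch does not touch any real data
sample = b"""<?xml version="1.0" ?>
<Product ParentID="XXX" ID="XXX" title="XXX">
    <Values>
        <Value AttributeID="ABC" title="ABC1">ABC</Value>
        <Value AttributeID="DEF" title="DEF1">DEF</Value>
        <Value AttributeID="GHI" title="GHI1">GHI</Value>
    </Values>
</Product>
"""
with tempfile.NamedTemporaryFile(suffix=".xml", delete=False) as f:
    f.write(sample)
    demo_path = f.name

filter_xml_file(demo_path, {"ABC", "GHI"})

with open(demo_path, encoding="utf-8") as f:
    filtered_text = f.read()
print(filtered_text)
os.remove(demo_path)
```

Because the file is overwritten in place, it may be worth keeping a copy of the original until the output has been checked.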

Darren Woodson · Accepted Answer · 2021-01-29 14:11:22Z

0

In my experience, I've always used a parser to turn the text into an object whose elements and attributes can be accessed much like the items of a list or dictionary.

Here is a link to Python 3's xml.etree.ElementTree parser, which builds an object from your XML input. Child elements can be accessed by index, and attributes by string key.

https://docs.python.org/3/library/xml.etree.elementtree.html

I hope this and the examples help!
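
As a rough illustration of that approach, here is a minimal sketch using only the standard library's xml.etree.ElementTree. The sample XML is copied from the question, the variable names are hypothetical, and the AttributeID criterion is taken from the question's comments.

```python
import xml.etree.ElementTree as ET

# Sample input copied from the question
xml_text = """<Product ParentID="XXX" ID="XXX" title="XXX">
    <Values>
        <Value AttributeID="ABC" title="ABC1">ABC</Value>
        <Value AttributeID="DEF" title="DEF1">DEF</Value>
        <Value AttributeID="GHI" title="GHI1">GHI</Value>
    </Values>
</Product>"""

root = ET.fromstring(xml_text)

# Children are reached by index, attributes by string key
first_value = root[0][0]
first_id = first_value.get("AttributeID")
first_text = first_value.text

keep_ids = {"ABC", "GHI"}
# ElementTree elements have no parent pointer, so visit each parent
# and remove unwanted children from it directly
for parent in root.iter("Values"):
    for child in list(parent):  # iterate over a copy while removing
        if child.tag == "Value" and child.get("AttributeID") not in keep_ids:
            parent.remove(child)

output = ET.tostring(root, encoding="unicode")
print(output)
```

Unlike lxml, the standard library has no getparent() method, which is why the loop walks the parent elements rather than the Value elements themselves.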

