
I have an XML file from which I need to strip out certain XML elements. If possible, I'd like to use a wildcard, because the data inside the tags differs from one element to the next. See the XML below:

<relationship relation="1">
    <sourcedid>
        <source>xxxxx</source>
        <id>AbDT-1398</id>  <!-- this data will be different for each grouping -->
    </sourcedid>
    <label/>
</relationship>

Basically, I need to search the XML file for this grouping, use a wildcard for the contents of the tags, and remove the entire grouping. The same tags appear throughout my XML; only the data inside them changes.
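A minimal sketch of one way to do this, using Python's standard-library `xml.etree.ElementTree` as a stand-in for lxml (which the answer below suggests). It treats "wildcard" as "any text at all": every `<relationship>` that contains a `<sourcedid>/<id>` is removed, whatever the `<id>` holds. The `<enterprise>` root, the second `<id>` value, the `<person>` element and the function name `remove_relationships` are all hypothetical, since the question shows only a fragment.

```python
import xml.etree.ElementTree as ET

# Sample input modelled on the snippet in the question. The <enterprise>
# root, the second <id> value and <person> are hypothetical placeholders.
SAMPLE = """<enterprise>
  <relationship relation="1">
    <sourcedid>
      <source>xxxxx</source>
      <id>AbDT-1398</id>
    </sourcedid>
    <label/>
  </relationship>
  <relationship relation="1">
    <sourcedid>
      <source>xxxxx</source>
      <id>ZzQ-0042</id>
    </sourcedid>
    <label/>
  </relationship>
  <person>keep me</person>
</enterprise>"""


def remove_relationships(root):
    """Remove every <relationship> that contains a <sourcedid>/<id>,
    whatever text the <id> holds (the "wildcard" part). Returns the count."""
    removed = 0
    # Snapshot parents and children so removal doesn't disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "relationship" and child.find("sourcedid/id") is not None:
                parent.remove(child)
                removed += 1
    return removed


root = ET.fromstring(SAMPLE)
print(remove_relationships(root))  # 2
print(ET.tostring(root, encoding="unicode"))
```

To work on a real file, parse it with `ET.parse(path)`, call `remove_relationships(tree.getroot())`, and save the result with `tree.write(out_path)`. If only groupings with `relation="1"` should go, adding `child.get("relation") == "1"` to the condition narrows the match.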

  • Your question is a bit unclear: are you searching for sections to remove from the file? Commented Jul 9, 2011 at 5:12
  • What is your expected output? Have you tried BeautifulSoup? Commented Jul 9, 2011 at 5:23

1 Answer


If I understand you correctly, you want to remove certain tags (and possibly their contents) from your XML file. Try using lxml to process the XML file. Have a look at these functions from lxml.etree:

Delete all elements with the provided tag names from a tree or subtree. This will remove the elements and their entire subtree, including all their attributes, text content and descendants.
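If lxml isn't available, the deleting behaviour described in the paragraph above can be roughly approximated with the standard library. This is a hypothetical sketch, not lxml's implementation, and the name `strip_elements_sketch` is made up for illustration. Because ElementTree stores the text that follows an element (its "tail") on the element itself, this sketch also drops that trailing text.

```python
import xml.etree.ElementTree as ET


def strip_elements_sketch(root, *tag_names):
    """Delete every element whose tag is in tag_names, together with its
    attributes, text, descendants and tail text. The root is never removed."""
    tags = set(tag_names)
    # Iterate over snapshots so removing children is safe.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag in tags:
                parent.remove(child)


root = ET.fromstring("<a><b>x<c/></b>tail<d>keep</d></a>")
strip_elements_sketch(root, "b")
print(ET.tostring(root, encoding="unicode"))  # <a><d>keep</d></a>
```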

This will remove the elements and their attributes, but not their text/tail content or descendants. Instead, it will merge the text content and children of the element into its parent.
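The merging behaviour described above is more involved, because the removed element's text, children and tail must be spliced into the parent at the right position. A rough standard-library sketch follows; like the previous one it is hypothetical (the helper names are made up) and not lxml's code. It is also the behaviour the asker later says they do not want, so it is mainly useful for contrast. It uses the `:=` operator, so it assumes Python 3.8 or later.

```python
import xml.etree.ElementTree as ET


def _append_text(parent, index, text):
    """Append text at the slot just before parent[index]: onto parent.text
    when index is 0, otherwise onto the tail of the previous sibling."""
    if not text:
        return
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        prev = parent[index - 1]
        prev.tail = (prev.tail or "") + text


def _find_match(root, tags):
    """Return (parent, index, child) for the first element whose tag is in tags."""
    for parent in root.iter():
        for index, child in enumerate(parent):
            if child.tag in tags:
                return parent, index, child
    return None


def strip_tags_sketch(root, *tag_names):
    """Remove elements with the given tags but keep their text, children and
    tail by splicing them into the parent. The root itself is never stripped."""
    tags = set(tag_names)
    # Re-search after every change so nested matches are handled too.
    while (match := _find_match(root, tags)) is not None:
        parent, index, child = match
        parent.remove(child)
        _append_text(parent, index, child.text)
        grandchildren = list(child)
        for offset, grandchild in enumerate(grandchildren):
            parent.insert(index + offset, grandchild)
        # The stripped element's tail follows its last child (or its text).
        _append_text(parent, index + len(grandchildren), child.tail)


root = ET.fromstring("<p>Hello <b>bold <i>and</i> text</b> world</p>")
strip_tags_sketch(root, "b")
print(ET.tostring(root, encoding="unicode"))  # <p>Hello bold <i>and</i> text world</p>
```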

Is this what you are looking for? If so, there is a nice answer on SO you should have a look at.


4 Comments

I don't want to merge the content; I need to delete it from the XML. In my XML example I need to remove everything in the following: <relationship relation="1"> <sourcedid> <source>xxxxx</source> <id>AbDT-1398</id> (this data will be different for each grouping) </sourcedid> <label/> </relationship>. The <id></id> tags will contain unique data, so I need to remove that whole grouping.
@Nmayo: What is a "grouping"? Please clarify what you are asking by editing your question.
Thanks for the help, folks. I was able to figure out how to remove the <relationship></relationship> tags.
@Nmayo: You are welcome! If my suggestions helped you, feel free to accept the answer. If you came up with your own idea, post it as an answer below and accept it, so people with similar issues can benefit from it. :)
