SAMPLE XML FILE

<ArticleSet>
    <Article>
        <ForeName>a</ForeName>
        <LastName>b</LastName>
        <Affiliation>harvard university of science. [email protected]</Affiliation>
        <Keywords>-</Keywords>
    </Article>
    <Article>
        <ForeName>a</ForeName>
        <LastName>b</LastName>
        <Affiliation>-</Affiliation>
        <Keywords>-</Keywords>
    </Article>
    <Article>
        <ForeName>a</ForeName>
        <LastName>b</LastName>
        <Affiliation>harvard university of science. [email protected]</Affiliation>
        <Keywords>-</Keywords>
    </Article>
</ArticleSet>

SAMPLE CODE

from xml.etree import ElementTree as etree
import re

root = etree.parse("sampleinput.xml").getroot()

for article in root.iter("Affiliation"):
    if(article.text != "-"):
        email = re.search(r'[\w\.-]+@[\w\.-]+', article.text)
        c = etree.Element("<Email>")
        c.text = email.group(0)
        etree.write(article,c)

REQUIRED OUTPUT (UPDATED XML FILE)

<?xml version="1.0"?>
<ArticleSet>
    <Article>
        <ForeName>a</ForeName>
        <LastName>b</LastName>
        <Affiliation>harvard university of science. [email protected]</Affiliation>
        <Keywords>-</Keywords>
        <Email>[email protected]</Email>
    </Article>
    <Article>
        <ForeName>a</ForeName>
        <LastName>b</LastName>
        <Affiliation>-</Affiliation>
        <Keywords>-</Keywords>
        <Email>-</Email>
    </Article>
    <Article>
        <ForeName>a</ForeName>
        <LastName>b</LastName>
        <Affiliation>harvard university of science. [email protected]</Affiliation>
        <Keywords>-</Keywords>
        <Email>[email protected]</Email>
    </Article>
</ArticleSet>

I want to extract the email address from each <Affiliation> tag, create a new tag named <Email>, and store the extracted address in it. If <Affiliation> is equal to -, then <Email>-</Email> should be stored in that article instead.

ERROR

Traceback (most recent call last):
  File "C:/Users/Ghost Rider/Documents/Python/addingTagsToXML.py", line 11, in <module>
    etree.write(article,c)
AttributeError: module 'xml.etree.ElementTree' has no attribute 'write'

3 Answers

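You can try this:

import re
import xml.etree.ElementTree as ET

tree = ET.parse('filename.xml')
root = tree.getroot()

for article in root.findall('Article'):
    # Look the element up by name instead of relying on its position
    affiliation = article.find('Affiliation')
    child = ET.Element('Email')
    match = None
    if affiliation is not None and affiliation.text and affiliation.text != '-':
        match = re.search(r'[\w.-]+@[\w.-]+', affiliation.text)
    # Fall back to "-" when there is no affiliation or no address in it
    child.text = match.group() if match else '-'
    article.append(child)

# Overwrite the file with the updated tree
tree.write('filename.xml')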

write is not a function of the xml.etree.ElementTree module; it is a method of the ElementTree class. If you want to use it, correct the import like this:

from xml.etree.ElementTree import ElementTree

Using etree as an alias for ElementTree is also best avoided: etree is already the name of the xml.etree package, so the alias is easy to confuse with it.

Furthermore, I think you misinterpret the purpose of write: it only serializes the finished tree to a file. To modify the tree itself, call methods such as append, extend, or insert on your Element.
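
To make the difference concrete, here is a minimal sketch using only the standard library. The helper name add_email_tags, the inline sample document and the address someone@example.org are made up for illustration; the regular expression is the one from the question. The tree is changed in memory with SubElement (which appends a child), and write is called afterwards on an ElementTree object, only to serialize the result.

```python
import io
import re
import xml.etree.ElementTree as ET

EMAIL_RE = re.compile(r"[\w.-]+@[\w.-]+")


def add_email_tags(root):
    """Append an <Email> child to every <Article> directly under root."""
    # findall() returns a list, so appending children is safe while looping
    for article in root.findall("Article"):
        affiliation = article.findtext("Affiliation", default="-")
        match = EMAIL_RE.search(affiliation or "")
        # SubElement creates the element and appends it to the article
        email = ET.SubElement(article, "Email")
        email.text = match.group(0) if match else "-"
    return root


# Hypothetical sample data, shaped like the XML in the question
sample = """<ArticleSet>
<Article><Affiliation>some university. someone@example.org</Affiliation><Keywords>-</Keywords></Article>
<Article><Affiliation>-</Affiliation><Keywords>-</Keywords></Article>
</ArticleSet>"""

root = add_email_tags(ET.fromstring(sample))
emails = [e.text for e in root.iter("Email")]

# write() is a method of an ElementTree object, not of the module; it only
# serializes. An in-memory buffer stands in for a real file name here.
buffer = io.BytesIO()
ET.ElementTree(root).write(buffer)
```

With a real file you would parse it with ET.parse, pass tree.getroot() to the helper, and then call tree.write with the output path.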

You can use lxml instead of the standard xml library. This code works:

import re
from lxml import etree as et

# Open original file
tree = et.parse('t.xml')
# list() takes a snapshot so that adding elements does not disturb the loop
for affiliation in list(tree.iter('Affiliation')):
    match = None
    if affiliation.text and affiliation.text != '-':
        match = re.search(r'[\w.-]+@[\w.-]+', affiliation.text)
    # Add <Email> as the last child of the enclosing <Article>
    child = et.SubElement(affiliation.getparent(), 'Email')
    # Fall back to "-" when there is no address to extract
    child.text = match.group(0) if match else '-'

# Write back to file
tree.write('t.xml')
