For loop on XML data in Python returning 4x number of elements expected

Question

Using the following code, I load in an XML file that contains email data.

from xml.etree import ElementTree

with open(xmlFile, "r") as f:
    xml = ElementTree.parse(f)

I then initialize all of my variables (subsetted here for brevity):

index = []
sender = []
subject = []
date = []

And then lastly try to loop through the emails:

for node in xml.findall(".//header"): 
  index.append(node.attrib.get('index')) 
  sender.append(node.attrib.get('from')) 
  subject.append(node.attrib.get('subject')) 
  date.append(node.attrib.get('date'))

The problem is that this produces the wrong output. I can't provide the data because it's confidential, but I can share what I believe should be enough to point me in the right direction.

In [127]: nodes = xml.findall(".//header")
In [128]: len(nodes)
Out[128]: 12018

In [129]: len(index)
Out[129]: 48072

In [130]: nodes[0].attrib.viewkeys()
Out[130]: dict_keys(['index', 'from', 'read', 'headerLink', 'messageType', 'contentLink', 'state', 'messageId', 'date', 'folder', 'folderId', 'rawLink', 'subject'])

In [131]: index[0:3]
Out[131]: 
['0',
 '(NYTimes.com News Alert) [email protected]',
 'Breaking News:  At Florida State, Football Eclipses Justice: Records Show Police Often Go Easy on Players']

In [132]: for node in xml.findall(".//header")[0:3]: print(node.attrib.get("index"))
0
1
2

Any thoughts on what I'm missing? I'm pretty new to Python, but not coding, and I can't see where I'm going wrong. Thanks in advance!

Did you run your for loop 4 times in the interactive Python interpreter, without reinitializing index and the other lists?
– Anand S Kumar, Commented Jul 20, 2015 at 17:45
@AnandSKumar No, I didn't. If that were the problem, the first three values in index would still be [0, 1, 2], and the whole set of values would repeat starting at position 12018. As you can see above, there are values in index that should not be there, which implies it's something other than rerunning the loop without reinitializing.
– tblznbits, Commented Jul 20, 2015 at 17:48
Is that your exact code? Are you sure you did not append everything to index by mistake?
– Anand S Kumar, Commented Jul 20, 2015 at 17:50
Are you sure you did not write index = sender = subject = date = [] for convenience?
– Anand S Kumar, Commented Jul 20, 2015 at 17:54

Anand S Kumar · Accepted Answer · 2015-07-20 17:59:26Z

From the comments, it appears that you initialized the lists like this:

index = sender = subject = date = []

That statement creates only one list, and all four names (index, sender, subject, date) refer to that same list. You can confirm this with id(), which returns the same value for every name:

>>> index = sender = subject = date = []
>>> id(index)
8237464
>>> id(sender)
8237464
>>> id(subject)
8237464
>>> id(date)
8237464

Then, when you run:

for node in xml.findall(".//header"): 
  index.append(node.attrib.get('index')) 
  sender.append(node.attrib.get('from')) 
  subject.append(node.attrib.get('subject')) 
  date.append(node.attrib.get('date'))

All four values are appended to that single list on every iteration, because every name refers to it. That is why index holds the index, sender, subject, and date values interleaved, and why its length is 4 × 12018 = 48072.
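The effect can be reproduced without the confidential file. The following is a minimal sketch; the SAMPLE document and its addresses are made up for illustration, and only the attribute names (index, from, subject, date) come from the question.

```python
from xml.etree import ElementTree

# Hypothetical stand-in for the confidential data: three <header> elements.
SAMPLE = """
<emails>
  <header index="0" from="a@example.com" subject="First" date="2015-07-01"/>
  <header index="1" from="b@example.com" subject="Second" date="2015-07-02"/>
  <header index="2" from="c@example.com" subject="Third" date="2015-07-03"/>
</emails>
"""

root = ElementTree.fromstring(SAMPLE.strip())
nodes = root.findall(".//header")

# Chained assignment: four names, but only one list object.
index = sender = subject = date = []

for node in nodes:
    index.append(node.attrib.get("index"))
    sender.append(node.attrib.get("from"))
    subject.append(node.attrib.get("subject"))
    date.append(node.attrib.get("date"))

print(index is sender is subject is date)  # True
print(len(nodes), len(index))              # 3 12
print(index[0:3])                          # ['0', 'a@example.com', 'First']
```

With three headers the shared list ends up with twelve entries, and the first three entries are the index, sender, and subject of the first header, which matches the pattern shown in the question.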

Define each list separately, as in the example in your question, rather than with chained assignment:

index = []
sender = []
subject = []
date = []
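If a one-line initialization is preferred, tuple unpacking is one option, since each [] on the right-hand side builds a distinct list. This is only a sketch with a made-up SAMPLE document; the attribute names are the ones from the question.

```python
from xml.etree import ElementTree

# Hypothetical sample data, for illustration only.
SAMPLE = """
<emails>
  <header index="0" from="a@example.com" subject="First" date="2015-07-01"/>
  <header index="1" from="b@example.com" subject="Second" date="2015-07-02"/>
</emails>
"""

root = ElementTree.fromstring(SAMPLE.strip())

# Each [] creates a new list, so the four names refer to four different objects.
index, sender, subject, date = [], [], [], []

for node in root.findall(".//header"):
    index.append(node.attrib.get("index"))
    sender.append(node.attrib.get("from"))
    subject.append(node.attrib.get("subject"))
    date.append(node.attrib.get("date"))

print(index is sender)  # False
print(index)            # ['0', '1']
print(sender)           # ['a@example.com', 'b@example.com']
```

Each list now has one entry per header, so len(index) equals the number of nodes returned by findall.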

1 Comment

tblznbits Over a year ago

That fixed it. Great explanation as to what's actually happening as well. Helps me learn as well as fixed my problem. Thanks, Anand!
