I'm looping through an XML document and matching usernames from a text file.

The txt looks like:

DPL bot
Nick Number
White whirlwind
Polisci
Flannel

And the program looks like:

import xmltodict, json

with open('testarticles.xml', encoding='latin-1') as xml_file:
    dic_xml = xmltodict.parse(xml_file.read())
    for page in dic_xml['mediawiki']['page']:
        for rev in  page['revision']:
            for user in open("usernames.txt", "r"):
                print(user)

                if 'username' in rev['contributor'] and rev['contributor']['username'] == user:
                    print(user)
                    print(rev['timestamp'])
                    timestamp = rev['timestamp'];

                    try:
                        print(rev['comment'])
                        comment = rev['comment'];
                    except:
                        print("no comment")
                        comment = ''

                    print('\n')
                    with open("User data/" + user + ".json", "a") as outfile:
                        json.dump({"timestamp": timestamp, "comment": comment}, outfile)
                        outfile.write('\n')

The problem is that the program only goes through the if-statement for the last line in the text file. It prints all the users' names before the if-statement. All users have matching posts in the XML-file and by changing to another user at the end line, that user's data is extracted into the json file.

4 Comments
  • Add an else clause and print rev['contributor'] to see what's going on when it fails. Also try: if 'username' in rev['contributor'] and rev['contributor']['username'] == user.strip(): Commented Apr 1, 2016 at 9:18
  • Out of curiosity, does for line in open(...): automatically assume with context and thus close the file when the loop is done? Commented Apr 1, 2016 at 9:20
  • @Torxed Good question. On one hand, I wouldn't think so, since open() doesn't normally work like that. Then again, it's open() plus a for loop, which is a rather "ephemeral" construct that doesn't persist beyond its scope. Also, there's no name to call close() on, since the file object returned by open() is never bound to a variable when used directly in a for loop. It's all speculation though hehe. Commented Apr 1, 2016 at 9:32
  • Repeatedly rereading the user file inside the loop seems like a spectacularly inefficient design, unless you expect the file to change between iterations (and even then, there is probably a better way to do it). Just read it once into a dictionary before opening the XML file. Commented Apr 1, 2016 at 9:40
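The "read it once" suggestion in the last comment could look roughly like the sketch below. It uses a set rather than a dictionary, since only membership tests are needed here. The helper name load_usernames, the utf-8 encoding and the temporary demo file are assumptions for illustration, not part of the original program.

```python
import os
import tempfile


def load_usernames(path):
    """Read the username file once; return a set of stripped, non-empty names."""
    # encoding="utf-8" is an assumption; use whatever the real file is saved as.
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


# Demo with a throwaway file standing in for usernames.txt.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "usernames.txt")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("DPL bot\nNick Number\nFlannel")
    usernames = load_usernames(demo_path)

print(sorted(usernames))  # ['DPL bot', 'Flannel', 'Nick Number']
print("Nick Number" in usernames)  # True
```

Inside the revision loop, the inner for loop over the file could then become a single test such as rev['contributor'].get('username') in usernames.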

1 Answer

Maybe all lines except the last have a newline at the end...

Try this:

for user in open("usernames.txt", "r"):
    user = user.strip()
    if 'username' in rev['contributor'] and rev...

Or use this construct, which sidesteps the debate about whether a bare for ... in open(...) closes the file the way a with statement does :P

with open("usernames.txt", "r") as f:
    for line in f:
        user = line.strip()
        if 'username' in rev['contributor'] and rev...

The main thing is user = user.strip() or user = line.strip()
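
To see why the strip matters, here is a small self-contained sketch. It uses io.StringIO as a stand-in for the real usernames.txt (an assumption, so it runs without any files). Iterating over a file-like object yields each line with its trailing newline still attached, except possibly the last line.

```python
import io

# Stand-in for usernames.txt: three names, no newline after the last one.
fake_file = io.StringIO("DPL bot\nNick Number\nFlannel")

raw_lines = list(fake_file)
print(raw_lines)  # ['DPL bot\n', 'Nick Number\n', 'Flannel']

# Comparing the raw line fails because of the trailing "\n" ...
matches_raw = [line for line in raw_lines if line == "Nick Number"]
# ... while comparing the stripped line succeeds.
matches_stripped = [line.strip() for line in raw_lines if line.strip() == "Nick Number"]

print(matches_raw)       # []
print(matches_stripped)  # ['Nick Number']
```

Only "Flannel" has no newline in this example, which mirrors the symptom in the question: only the last line of the file ever matched.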

When in doubt, look at the binary. That goes for all encoding issues as well since encoding is just a way of transforming ones and zeros to characters according to some translation table/code page.

"\n".encode().hex() == "0a"  # True on Python 3 (on Python 2 this was "\n".encode("hex"))
# so if
user.encode().hex()
# has "0a" at the end, there is definitely a newline after "user"

2 Comments

Perfect, works now! But what about the "newline" at the end?
Cool :) What about the newline? Do you need it later on? In that case, don't redefine user or line by doing user = user.strip() or line = line.strip() but make a separate variable that's only used in the comparison - or simply add the newline back in when you write to a file again (which you're already doing here: outfile.write('\n')). Alternatively, add all usernames to a dictionary instead and only write and dump the whole thing as json once - after the loop (I’d probably do that instead of appending to the file once per user/line)
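
The last suggestion, collecting everything first and dumping once after the loop, might look like the sketch below. The revisions list is hypothetical stand-in data shaped like the rev dictionaries in the question (it is not real output from xmltodict), and the name edits_by_user is made up for illustration.

```python
import json
from collections import defaultdict

# Hypothetical stand-in for the revisions parsed from the XML dump.
revisions = [
    {"contributor": {"username": "Flannel"}, "timestamp": "2016-01-01T00:00:00Z", "comment": "fix typo"},
    {"contributor": {"ip": "127.0.0.1"}, "timestamp": "2016-01-02T00:00:00Z"},
    {"contributor": {"username": "Flannel"}, "timestamp": "2016-01-03T00:00:00Z"},
    {"contributor": {"username": "Someone else"}, "timestamp": "2016-01-04T00:00:00Z"},
]
usernames = {"Flannel", "Polisci"}  # already stripped of newlines

edits_by_user = defaultdict(list)
for rev in revisions:
    # .get() returns None for anonymous contributors, which is never in the set.
    name = rev["contributor"].get("username")
    if name in usernames:
        edits_by_user[name].append(
            {"timestamp": rev["timestamp"], "comment": rev.get("comment", "")}
        )

# Serialize once, after the loop, instead of appending to a file per match.
output = json.dumps(edits_by_user, indent=2)
print(output)
```

In the real program, output could be written to a single file, or each user's list dumped to its own file, once the loop has finished.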
