
This BeautifulSoup parser works as it should when printing data while looping; it outputs the correct things. The final line of code (writing to CSV) says that user2 is not defined, even though it seems to be... Any ideas? (Thanks all! It was an indentation error, now edited. Code works!)

import csv
from bs4 import BeautifulSoup

# Create output file and write headers
f = csv.writer(open('/Users/xx/Downloads/#parsed.csv', "w"), delimiter = '\t')
f.writerow(["date", "username", "tweet"]) #csv column headings
soup = BeautifulSoup(open("/Users/simonlindgren/Downloads/#raw.html")) #input html document 

tweetdata = soup.find_all("div", class_="content") #find anchors of each tweet
#print tweetdata
for tweet in tweetdata:
    username = tweet.find_all(class_="username js-action-profile-name")
    for user in username:
        user2 = user.get_text()
        #print user2
    date = tweet.find_all(class_="_timestamp js-short-timestamp ")
    for d in date:
        date2 = d.get_text()
        tweet = tweet.find_all(class_="js-tweet-text tweet-text")
        for t in tweet:
            tweet2 = t.get_text().encode('utf-8')
            tweet3 = tweet2.replace('\n', ' ')
            tweet4 = tweet3.replace('\"','')

    f.writerow([date2, user2, tweet4])
  • Could you please review the indentation - it's important in Python. Commented Feb 19, 2015 at 14:11
  • A copy of the input html document and expected CSV output would also be helpful. Commented Feb 19, 2015 at 14:38

1 Answer


The problem is that user2 is only assigned inside the loop for user in username:. In Python a name bound in a loop does survive after the loop, but if find_all() returns an empty list for a tweet, the loop body never runs and user2 is never defined (or still holds the value from a previous tweet), hence the NameError. Changing your code to f.writerow([username, date, tweet]) would avoid the NameError, but I suspect it will not produce what you want: those values are still lists of tags with the HTML markup in them (which is why you used get_text() to pull the text out of the tags in the first place).
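To see why the NameError can appear, here is a minimal stdlib-only sketch (the names are hypothetical, chosen to mirror the question): a name assigned in a for loop stays bound after the loop, but only if the loop body actually ran at least once.

```python
# The assignment inside the loop survives the loop...
for user in ["alice"]:
    user2 = user.upper()
print(user2)  # user2 is still bound here

# ...but with an empty iterable the body never executes,
# so the name is never created and using it raises NameError.
raised = False
try:
    for user in []:
        missing = user.upper()
    print(missing)
except NameError:
    raised = True
```

This is exactly what happens if one tweet's find_all() call matches nothing: that iteration never binds the variable.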

Instead, assuming that there is only one username, date and tweet text body per tweet, you could change your code to something like this:

tweetdata = soup.find_all("div", class_="content") #find anchors of each tweet
for tweet in tweetdata:
    # pull out our data
    username = tweet.find_all(class_="username js-action-profile-name")
    date = tweet.find_all(class_="_timestamp js-short-timestamp ")
    text = tweet.find_all(class_="js-tweet-text tweet-text")

    # note: build the tuple with a literal; tuple(a, b, c) is a TypeError
    # because tuple() takes a single iterable argument
    our_data = (username[0].get_text(), date[0].get_text(),
                text[0].get_text().encode('utf-8'))
    print "User: %s - Date: %s - Text: %s" % our_data

    # write to CSV
    f.writerow(our_data)

This avoids using the unnecessary for loops (since each tweet will only have one username, date and text body anyway). If you need to write it out as a list, change our_data from being a tuple to a list.
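Note that csv.writer's writerow() accepts any iterable of values, so the tuple can be written as-is; converting our_data to a list is only necessary if you want to mutate the row first. A minimal stdlib-only sketch using an in-memory buffer (io.StringIO here stands in for the output file in the question):

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf, delimiter='\t')

# Both a tuple and a list work as row arguments.
writer.writerow(("2015-02-19", "someuser", "tweet text"))
writer.writerow(["2015-02-19", "otheruser", "more text"])

rows = buf.getvalue().splitlines()
```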
