I'm using Python 3.4. I want to get the frequency of words from a CSV file, but it shows an error. Here is my code. Can anyone help me solve this problem?

from textblob import TextBlob as tb
import math

words={}
def tfidf(word, blob, bloblist):
    return tf(word, blob) * idf(word, bloblist)

def tf(word, blob):
    return blob.words.count(word) / len(blob.words)

def n_containing(word, bloblist):
    return sum(1 for blob in bloblist if word in blob)

def idf(word, bloblist):
    return math.log(len(bloblist) / (1 + n_containing(words, bloblist)))

bloblist = open('afterstopwords.csv', 'r').read()

for i, blob in enumerate(bloblist):
     print("Top words in document {}".format(i + 1))
     scores = {word: tfidf(word, blob, bloblist) for word in blob.words}
     sorted_words = sorted(scores.items(), key=lambda x: x[1], reverse=True)
     for word, score in sorted_words[:3]:
         print("\tWord: {}, TF-IDF: {}".format(word, round(score, 5)))

And the error is:

 Top words in document 1
 Traceback (most recent call last):
 File "D:\Python34\tfidf.py", line 45, in <module>
    scores = {word: tfidf(word, blob, bloblist) for word in blob.words}
 AttributeError: 'str' object has no attribute 'words'
  • The error message is pretty clear: blob is a string, and a string does not have a words attribute, so you can't do blob.words. Commented May 14, 2015 at 6:08
  • Should I remove blob.words? Commented May 14, 2015 at 13:48
  • I don't know. There seem to be many problems with that code. Why are you importing TextBlob when you don't use it anywhere? Did you mean to use it but forgot? Commented May 15, 2015 at 6:55
  • If I remove the TextBlob import, the error is the same. Commented May 15, 2015 at 8:56
  • I know the error is the same. I was just telling you that I have no clue what you are trying to do, and therefore cannot tell you how to do it. Commented May 15, 2015 at 8:58

1 Answer

In http://stevenloria.com/finding-important-words-in-a-document-using-tf-idf/ the code that builds bloblist is:

bloblist = [document1, document2, document3]

Don't change it. It is preceded by code that creates the documents, like:

document1 = tb("""blablabla""")
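To see why the list matters, here is a minimal sketch that uses only the standard library, with plain word lists standing in for TextBlob objects (so doc.count(word) takes the place of blob.words.count(word)). The helper name load_documents and the sample rows are made up for illustration, and treating each CSV row as one document is an assumption about the file. It shows that looping over the string returned by read() yields single characters, which is where the 'str' object has no attribute 'words' error comes from, and that the scoring runs once every document is its own entry in a list.

```python
import csv
import io
import math

# Sample rows standing in for afterstopwords.csv (made-up data).
raw = "python csv word\npython error\ntextblob word word\n"

# Looping over the string itself yields characters, not documents,
# and a one-character str has no .words attribute.
first_item = next(iter(raw))
print(repr(first_item), hasattr(first_item, "words"))

def load_documents(csv_file):
    """Hypothetical helper: one list of lowercase words per non-empty row."""
    docs = []
    for row in csv.reader(csv_file):
        words = " ".join(row).lower().split()
        if words:
            docs.append(words)
    return docs

def tf(word, doc):
    # Share of the document's words that are `word`.
    return doc.count(word) / len(doc)

def n_containing(word, docs):
    # Number of documents that contain `word` at least once.
    return sum(1 for doc in docs if word in doc)

def idf(word, docs):
    # Pass the single word here; the question's code passes the
    # global `words` dict instead.
    return math.log(len(docs) / (1 + n_containing(word, docs)))

def tfidf(word, doc, docs):
    return tf(word, doc) * idf(word, docs)

docs = load_documents(io.StringIO(raw))
for i, doc in enumerate(docs):
    print("Top words in document {}".format(i + 1))
    scores = {word: tfidf(word, doc, docs) for word in set(doc)}
    sorted_words = sorted(scores.items(), key=lambda x: x[1], reverse=True)
    for word, score in sorted_words[:3]:
        print("\tWord: {}, TF-IDF: {}".format(word, round(score, 5)))
```

With a real file, the io.StringIO(raw) argument would be replaced by a file object opened with open('afterstopwords.csv', newline='').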

Here's what I did. I use my own function, openfile, for opening files in Python; it takes care of the file details.

txt = openfile()
document1 = tb(txt)
bloblist = [document1]

The rest of the original code is unchanged. This works, but I have only been able to get it to finish on small files; it takes much too long on larger files, and the results don't look accurate at all. For word counts I use https://rmtheis.wordpress.com/2012/09/26/count-word-frequency-with-python/ instead.
It ran very quickly on 9999 rows of 50-75 characters each, and it seems accurate too: the results look equivalent to wordcloud results.
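A plain word count of that kind can be sketched with the standard library alone. This is not the code from the linked post, just an assumed equivalent built on collections.Counter and the csv module; the function name word_frequencies and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter

def word_frequencies(csv_file):
    """Count lowercase, whitespace-separated words across all CSV cells."""
    counts = Counter()
    for row in csv.reader(csv_file):
        for cell in row:
            counts.update(cell.lower().split())
    return counts

# Stand-in for open('afterstopwords.csv', newline=''); the rows are made up.
sample = io.StringIO("python csv word\npython error\ntextblob word word\n")
freq = word_frequencies(sample)
for word, count in freq.most_common(3):
    print("{}: {}".format(word, count))
```

Each row is read once and counting is a dictionary update per word, so the work grows linearly with the size of the file.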
