0

I want to stem words, so I imported PorterStemmer from nltk, but an error occurs at run time.

The error is:

TypeError: coercing to Unicode: need string or buffer, file found

My Python code is:

  import nltk;     
  from nltk.stem import PorterStemmer  
  stemmer=PorterStemmer()  
  file = open('C:/Python26/test.txt','r')  
  f=open("root.txt",'w')  
  with open(file,'r',-1) as rf:  
    lines = rf.readlines()  
    for word in lines:  
        root = stemmer.stem(word)  
        f.write(root+"\n")  
    f.close()  

Yes, I tried it and got an error that I couldn't understand. The output was:

1.6.2
Traceback (most recent call last):
  File "C:\Python26\check.py", line 10, in <module>
    with open(file,'r',-1) as rf:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf8 in position 6: ordinal not in range(128)

My code after your recommended change is:

import nltk;
import numpy;
import numpy as np
from StringIO import StringIO
print numpy.__version__
from nltk.stem import PorterStemmer  
stemmer=PorterStemmer()  
file = np.genfromtxt('C:/Python26/test.txt', delimiter=" ")  
f=open("root.txt",'w')  
with open(file,'r',-1) as rf:
    lines = rf.readlines()  
    for word in lines:  
        root = stemmer.stem(word)  
        f.write(root+"\n")  
    f.close()

And my dummy file looks like this:

walking
talked
oranges
books
Src
Src
mAB

3 Answers

2

You have already opened the file, and you are then passing that file object to with open(...), which expects a filename. Remove the file = open('C:/... line and pass the path to with open(...) directly.

P.S. You will be iterating over lines, not words.
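
To illustrate the point about lines versus words, here is a minimal sketch. It assumes the stemmer is passed in as a callable; the helper name stem_words and the stem parameter are hypothetical and not part of NLTK. Each string returned by readlines() still ends with its newline, so the sketch splits every line on whitespace before stemming.

```python
def stem_words(lines, stem):
    """Return the stem of every whitespace-separated word in `lines`.

    `stem` is any callable mapping a word to its stem (hypothetical
    parameter, used so the helper does not depend on a specific library).
    """
    stems = []
    for line in lines:
        # Strings from readlines() keep their trailing newline; split()
        # drops surrounding whitespace and handles several words per line.
        for word in line.split():
            stems.append(stem(word))
    return stems
```

With NLTK installed, the call would look like stem_words(rf.readlines(), PorterStemmer().stem), using the same PorterStemmer import as in the question.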

1

It seems that the problem is with the parameters passed to a function, and I'm guessing it's in the line root = stemmer.stem(word).

Try using the function genfromtxt instead of open():

>>> import numpy as np
>>> from StringIO import StringIO
>>> np.genfromtxt('C:/Python26/test.txt', delimiter=",") #Whatever delimiter your file has.

That should fix the problem.

7 Comments

I tried "Anthon"'s code. It did not raise any error, but it did not stem the words.
@ShaheenGul Did you try mine?
Yes, I tried it and got an error that I couldn't understand. The error was
Have you declared the coding at the start of your document? Put this at the start of your file and see if the error persists: # -*- coding: UTF-8 -*-
What does this mean? I don't understand. Anyway, I did this too, but I got the same result when I added # -*- coding: UTF-8 -*-, since # is used for commenting out text. When I removed it, the error was still encountered.
1

You open the file in line 4 and then pass the resulting file object as the filename to another open() in line 6. Just do:

import nltk
from nltk.stem import PorterStemmer
stemmer = PorterStemmer()
with open("root.txt", 'w') as f:
    with open('C:/Python26/test.txt', 'r', -1) as rf:
        lines = rf.readlines()
        for word in lines:
            root = stemmer.stem(word)
            f.write(root + "\n")
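
As a possible variant of the nested with blocks, the sketch below opens both files with io.open and an explicit encoding, and strips each line before stemming. The function name stem_file, the stem parameter, and the "utf-8" default are assumptions made for illustration; the actual encoding of test.txt is not stated in the question.

```python
import io


def stem_file(src_path, dst_path, stem, encoding="utf-8"):
    """Write the stem of each word in src_path to dst_path, one per line.

    `stem` is any callable mapping a word to its stem, and `encoding`
    is an assumption about how the input file was saved.
    """
    # Both files are opened by path; passing an already opened file
    # object to open() is what raised the TypeError in the question.
    with io.open(src_path, "r", encoding=encoding) as src:
        with io.open(dst_path, "w", encoding=encoding) as dst:
            for line in src:
                # split() removes the trailing newline and blank lines.
                for word in line.split():
                    dst.write(stem(word) + u"\n")
```

With NLTK installed, a call could look like stem_file('C:/Python26/test.txt', 'root.txt', PorterStemmer().stem). If the file was saved in a different encoding, pass its name as the encoding argument.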
