
I am new to numpy and pandas. I am trying to add the words and their indexes to a dataframe. The text string can be of variable length.

text=word_tokenize('this string can be of variable length')
df2 = pd.DataFrame({'index':np.array([]),'word':np.array([])})

for i in text:
    for i, row in df2.iterrows():
        word_val = text[i]
        index_val = text.index(i)
        df2.set_value(i,'word',word_val)
        df2.set_value(i,'index',index_val)
print df2

2 Answers


To create a DataFrame with one row per word of your string (which can be of any length), you can pass the token list directly:

df2 = pd.DataFrame(text, columns=['word'])

nltk's word_tokenize returns a list of words, which can be used as the column data, and pandas generates the index (0, 1, 2, ...) by default.
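As a minimal sketch of the idea, the snippet below builds the DataFrame from a token list. It assumes a plain whitespace split as a stand-in for word_tokenize (so it runs without nltk's tokenizer data), and the 'position' column name is hypothetical, chosen only to show how the word positions could be stored explicitly if they are really needed as data.

```python
import pandas as pd

# Assumption: str.split() stands in for nltk's word_tokenize here; for this
# simple sentence a whitespace split is enough for illustration.
text = 'this string can be of variable length'.split()

# One row per word; pandas assigns the default integer index.
df2 = pd.DataFrame(text, columns=['word'])

# Optional: keep the word positions as a regular column as well.
# 'position' is a hypothetical name, used instead of 'index' to avoid
# clashing with the DataFrame's own index attribute.
df2['position'] = range(len(df2))

print(df2)
```

No loop or row-by-row assignment is needed, because the list length determines the number of rows.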


Just pass the list directly into the DataFrame method:

pd.DataFrame(['i', 'am', 'a', 'fellow'], columns=['word'])
     word
0       i
1      am
2       a
3  fellow

I'm not sure you want a column named 'index': in this case its values would be the same as the index of the DataFrame itself. It's also not good practice to name a column 'index', because you won't be able to access it with the df.column_name syntax (df.index returns the DataFrame's own index, not the column), and your code could be confusing to other people.
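To illustrate the naming clash, here is a small sketch. The column values 10 to 13 are made up so that the column can be told apart from the DataFrame's default index.

```python
import pandas as pd

# Hypothetical data: the 'index' column deliberately differs from the
# default row labels (0..3) so the two are easy to distinguish.
df = pd.DataFrame({'index': [10, 11, 12, 13],
                   'word': ['i', 'am', 'a', 'fellow']})

by_attribute = df.index    # the DataFrame's row labels, not the column
by_brackets = df['index']  # bracket access still reaches the column

print(list(by_attribute))
print(list(by_brackets))
print(list(df.word))       # attribute access works for other column names
```

Bracket access always works, but a column name that does not shadow a DataFrame attribute avoids the ambiguity altogether.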
