creating a dataframe from a variable length text string

Question

I am new to numpy and pandas. I am trying to add the words and their indexes to a dataframe. The text string can be of variable length.

import numpy as np
import pandas as pd
from nltk.tokenize import word_tokenize

text = word_tokenize('this string can be of variable length')
df2 = pd.DataFrame({'index': np.array([]), 'word': np.array([])})

for i in text:
    for i, row in df2.iterrows():
        word_val = text[i]
        index_val = text.index(i)
        df2.set_value(i, 'word', word_val)
        df2.set_value(i, 'index', index_val)
print(df2)

John · Accepted Answer · 2017-04-27 17:25:49Z

1

To create a DataFrame with one row per word of your string (which can be of any length), you can pass the token list directly to the constructor:

df2 = pd.DataFrame(text, columns=['word'])

nltk's "word_tokenize" returns a list of words, which can be used as the column data, and by default pandas takes care of the index.
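As a rough, self-contained sketch of this approach: the snippet below uses plain str.split() as a stand-in for nltk's word_tokenize (an assumption made only so the example runs without nltk installed; the two tokenizers differ on punctuation). The variable name positions is illustrative and not part of the original question.

```python
import pandas as pd

# Assumption: str.split() stands in for nltk's word_tokenize so the
# example has no nltk dependency. Any list of strings works the same way.
text = 'this string can be of variable length'.split()

# One row per word; pandas assigns a default integer index 0..n-1.
df2 = pd.DataFrame(text, columns=['word'])

# The word positions are already available as the DataFrame's index.
positions = df2.index.tolist()

print(df2)
```

Because the index is generated from the length of the list, the same call works for a string with any number of words.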

answered Apr 27, 2017 at 17:25

Grr · Accepted Answer · 2017-04-27 17:02:51Z

0

Just pass the list directly to the DataFrame constructor:

pd.DataFrame(['i', 'am', 'a', 'fellow'], columns=['word'])
     word
0       i
1      am
2       a
3  fellow

I'm not sure you want a column named 'index': in this case its values would be the same as the index of the DataFrame itself. It's also not good practice to name a column 'index', because you won't be able to access it with the df.column_name syntax (df.index returns the DataFrame's own index instead), and your code could confuse other people.
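The caveat about a column named 'index' can be seen in a small sketch. The values 10, 20, 30 and the column name 'position' are made up for illustration; they are not from the question.

```python
import pandas as pd

# A column literally named 'index', with values that differ from the
# row labels so the two are easy to tell apart.
df = pd.DataFrame({'index': [10, 20, 30], 'word': ['i', 'am', 'a']})

attr = df.index    # the DataFrame's own index (a RangeIndex), not the column
col = df['index']  # bracket access is the only way to reach the column

# If the positions really are needed as a column, a different
# (hypothetical) name such as 'position' avoids the clash.
words = ['i', 'am', 'a', 'fellow']
df_pos = pd.DataFrame({'position': list(range(len(words))), 'word': words})

print(attr)
print(col.tolist())
print(df_pos)
```

With the non-clashing name, both df_pos['position'] and df_pos.position refer to the column.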

edited Apr 27, 2017 at 17:02

answered Apr 27, 2017 at 16:57
