Write the output in another file with pandas

Question

I'm using LDA to find topics in a text.

import pandas
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

n_components = 5
n_top_words = 10


def print_top_words(model, feature_names, n_top_words):
    for topic_idx, topic in enumerate(model.components_):
        message = "Topic %d: " % topic_idx
        message += " ".join([feature_names[i]
                         for i in topic.argsort()[:-n_top_words - 1:-1]])
        print(message)
    print()

df = pandas.read_csv('text.csv', encoding = 'utf-8')
text = df['a']
data_samples = text.values.tolist()

# Use tf (raw term count) features for LDA.
tf_vectorizer = CountVectorizer()
tf = tf_vectorizer.fit_transform(data_samples)


lda = LatentDirichletAllocation(n_components=n_components, max_iter=5,
                            learning_method='online',
                            learning_offset=50.,
                            random_state=0)
lda.fit(tf)

print("\nTopics in LDA model:")
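# Note: scikit-learn 1.0+ provides get_feature_names_out(); get_feature_names() was removed in 1.2.
tf_feature_names = tf_vectorizer.get_feature_names()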
print_top_words(lda, tf_feature_names, n_top_words)

I have a good output:

Topics in LDA model:

Topic 0: order not produced well received advance return always wishes

Topic 1: then wood color between pay broken transfer change arrival bad

Topic 2: delivery product possible package store advance date broken very good

Topic 3: misleading product france model broken open book year research association

Topic 4: address delivery change invoice deliver missing please billing advance change

But I would like to write this output to a CSV file with pandas, like this:

Topic 0   Topic 1   Topic 2   ...
order     advance   ...       ...
not       return    ...       ...
produced  always    ...       ...
well      wishes    ...       ...
received  hello     ...       ...

Is it possible?

No, they're just lines, so I'm not sure whether I can use pandas.
– marin, Commented Aug 10, 2018 at 10:10
Make a list of lists and then turn it into a pandas DataFrame, e.g. [['Topic 0', 'order', 'not', 'produced', 'well', 'received'],[]...[]], and use pd.DataFrame(list).T
– Ayush Kesarwani, Commented Aug 10, 2018 at 10:18
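
The suggestion in the comment above can be sketched roughly as follows. The words are copied from the sample output in the question and cut to five per topic; the names `rows` and `table` are illustrative only, and promoting the first row to a header is an extra step the comment does not mention.

```python
import pandas as pd

# One inner list per topic: the label first, then its top words.
rows = [
    ["Topic 0", "order", "not", "produced", "well", "received"],
    ["Topic 1", "then", "wood", "color", "between", "pay"],
]

# Transpose so that each topic becomes a column instead of a row.
table = pd.DataFrame(rows).T

# Promote the first row (the topic labels) to the column header.
table.columns = table.iloc[0].tolist()
table = table.iloc[1:].reset_index(drop=True)

print(table)
```

Without the header step, the topic labels simply end up as the first data row of the transposed frame.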

Ayush Kesarwani · Accepted Answer · 2018-08-10 10:25:34Z

1

import pandas as pd

def print_top_words(model, feature_names, n_top_words):
    out_list = []
    for topic_idx, topic in enumerate(model.components_):
        # No space in "Topic%d:" so that split() keeps the label as one token
        message = "Topic%d: " % topic_idx
        message += " ".join([feature_names[i]
                     for i in topic.argsort()[:-n_top_words - 1:-1]])
        # One list per topic: the label first, then its top words
        out_list.append(message.split())
        print(message)
    print()
    return out_list
...
df_ = print_top_words(lda, tf_feature_names, n_top_words)
# Transpose so that each topic becomes a column
df_ = pd.DataFrame(df_).T
df_.to_csv('filename.csv')
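
One possible variation, offered as a sketch rather than as part of the answer: building the frame from a dict keyed by topic label skips the join-and-split round trip and puts the labels in the CSV header, while `index=False` keeps the row numbers out of the file. The function name `top_words_frame`, the toy `components` matrix, the vocabulary and the file name `topics.csv` are all hypothetical stand-ins; with the question's code you would pass `lda.components_` and the vectorizer's feature names instead.

```python
import numpy as np
import pandas as pd


def top_words_frame(components, feature_names, n_top_words):
    """Return a DataFrame with one column per topic and one row per rank."""
    columns = {}
    for topic_idx, topic in enumerate(components):
        # argsort is ascending, so reverse it and keep the first n_top_words.
        top = np.argsort(topic)[::-1][:n_top_words]
        columns["Topic %d" % topic_idx] = [feature_names[i] for i in top]
    return pd.DataFrame(columns)


# Toy stand-ins (assumptions): 2 topics over a 4-word vocabulary.
components = np.array([[0.1, 0.9, 0.3, 0.5],
                       [0.8, 0.2, 0.6, 0.1]])
feature_names = ["delivery", "order", "product", "return"]

frame = top_words_frame(components, feature_names, n_top_words=3)

# index=False leaves the 0..n-1 row index out of the CSV file.
frame.to_csv("topics.csv", index=False)
```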


ben10 · Accepted Answer · 2018-08-10 10:10:41Z

1

Topics in LDA model:

Topic 0: order not produced well received advance return always wishes

Topic 1: then wood color between pay broken transfer change arrival bad

Topic 2: delivery product possible package store advance date broken very good

Topic 3: misleading product france model broken open book year research association

Topic 4: address delivery change invoice deliver missing please billing advance change

df.to_csv("filename.csv")
