
I'm using LDA to find topics in a text.

import pandas
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

n_components = 5
n_top_words = 10


def print_top_words(model, feature_names, n_top_words):
    for topic_idx, topic in enumerate(model.components_):
        message = "Topic %d: " % topic_idx
        message += " ".join([feature_names[i]
                         for i in topic.argsort()[:-n_top_words - 1:-1]])
        print(message)
    print()

df = pandas.read_csv('text.csv', encoding = 'utf-8')
text = df['a']
data_samples = text.values.tolist()

# Use tf (raw term count) features for LDA.
tf_vectorizer = CountVectorizer()
tf = tf_vectorizer.fit_transform(data_samples)


lda = LatentDirichletAllocation(n_components=n_components, max_iter=5,
                                learning_method='online',
                                learning_offset=50.,
                                random_state=0)
lda.fit(tf)

print("\nTopics in LDA model:")
# get_feature_names() was removed in scikit-learn 1.2;
# get_feature_names_out() is available from scikit-learn 1.0 on.
tf_feature_names = tf_vectorizer.get_feature_names_out()
print_top_words(lda, tf_feature_names, n_top_words)
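The `print_top_words` helper picks each topic's top words with `topic.argsort()[:-n_top_words - 1:-1]`. A small sketch on a made-up weight vector (the words and weights below are hypothetical, not taken from the real model) shows what that slice returns:

```python
import numpy as np

# Hypothetical topic-word weights for one topic over a 6-word vocabulary
topic = np.array([0.1, 2.5, 0.7, 3.0, 1.2, 0.05])
feature_names = ["pay", "order", "color", "delivery", "broken", "wood"]
n_top_words = 3

# argsort() returns indices sorted by ascending weight; the slice
# [:-n_top_words - 1:-1] walks backwards from the end, keeping the
# n_top_words largest weights in descending order.
top_idx = topic.argsort()[:-n_top_words - 1:-1]
top_words = [feature_names[i] for i in top_idx]
print(top_words)  # ['delivery', 'order', 'broken']
```

The same selection can be written as `topic.argsort()[::-1][:n_top_words]`, which some readers find easier to follow.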

I get the following output, which looks good:

Topics in LDA model:

Topic 0: order not produced well received advance return always wishes
Topic 1: then wood color between pay broken transfer change arrival bad
Topic 2: delivery product possible package store advance date broken very good
Topic 3: misleading product france model broken open book year research association
Topic 4: address delivery change invoice deliver missing please billing advance change

But I would like to write this output to a CSV file with pandas, laid out like this:

Topic 0   Topic 1   Topic 2   ...
order     advance   ...       ...
not       return    ...       ...
produced  always    ...       ...
well      wishes    ...       ...
received  hello     ...       ...

Is this possible?

  • Use df.to_csv('file.csv'). Commented Aug 10, 2018 at 10:07
  • How is this output generated? Is it a DataFrame? Commented Aug 10, 2018 at 10:09
  • No, they're just printed lines, so I'm not sure I can use pandas. Commented Aug 10, 2018 at 10:10
  • Can you store this output as a string? Commented Aug 10, 2018 at 10:14
  • Make a list of lists and then turn it into a pandas DataFrame, e.g. [['Topic 0', 'order', 'not', 'produced', 'well', 'received'], [], ..., []], and use pd.DataFrame(list).T. Commented Aug 10, 2018 at 10:18
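The list-of-lists suggestion in the last comment can be sketched as follows. The words are copied from the printed output above (truncated to five per topic); promoting the first row to the header is an added step, assumed here so that the CSV starts with the topic labels:

```python
import pandas as pd

# One inner list per topic: the label first, then that topic's top words
rows = [
    ["Topic 0", "order", "not", "produced", "well", "received"],
    ["Topic 1", "then", "wood", "color", "between", "pay"],
    ["Topic 2", "delivery", "product", "possible", "package", "store"],
]

# pd.DataFrame(rows) has one row per topic; .T makes each topic a column
df_topics = pd.DataFrame(rows).T

# Use the first row (the topic labels) as the header, then drop it from the data
df_topics.columns = df_topics.iloc[0].tolist()
df_topics = df_topics.iloc[1:].reset_index(drop=True)

# With no path, to_csv returns the CSV text instead of writing a file
csv_text = df_topics.to_csv(index=False)
print(csv_text)
```

Passing a filename such as `'file.csv'` to `to_csv` writes the same content to disk.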

2 Answers

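import pandas as pd


def print_top_words(model, feature_names, n_top_words):
    out_list = []
    for topic_idx, topic in enumerate(model.components_):
        message = "Topic%d: " % topic_idx
        message += " ".join([feature_names[i]
                             for i in topic.argsort()[:-n_top_words - 1:-1]])
        # Keep ["Topic0:", word1, word2, ...] as one row per topic
        out_list.append(message.split())
        print(message)
    print()
    return out_list

# ... (rest of the code from the question)
df_ = print_top_words(lda, tf_feature_names, n_top_words)
# Transpose so that each topic becomes a column
df_ = pd.DataFrame(df_).T
# The first row already holds the topic labels, so skip pandas' integer header and index
df_.to_csv('filename.csv', header=False, index=False)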
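A variation on this answer builds the table directly from `model.components_` as a dict of columns, so the header is "Topic 0", "Topic 1", ... without post-processing. This is a sketch: `top_words_frame` is a hypothetical helper name, and a `SimpleNamespace` with made-up weights stands in for the fitted `LatentDirichletAllocation`, since only its `components_` attribute is used:

```python
from types import SimpleNamespace

import numpy as np
import pandas as pd


def top_words_frame(model, feature_names, n_top_words):
    """Return a DataFrame with one column per topic ("Topic 0", ...) and
    one row per rank, holding that topic's heaviest words."""
    columns = {}
    for topic_idx, topic in enumerate(model.components_):
        # Indices of the n_top_words largest weights, in descending order
        top_idx = topic.argsort()[:-n_top_words - 1:-1]
        columns["Topic %d" % topic_idx] = [feature_names[i] for i in top_idx]
    return pd.DataFrame(columns)


# Hypothetical stand-in for a fitted model: 2 topics over a 4-word vocabulary
fake_lda = SimpleNamespace(components_=np.array([
    [5.0, 1.0, 3.0, 0.5],   # topic 0
    [0.2, 4.0, 1.0, 2.0],   # topic 1
]))
names = np.array(["order", "delivery", "broken", "invoice"])

df_topics = top_words_frame(fake_lda, names, n_top_words=3)
df_topics.to_csv("topics.csv", index=False)
print(df_topics)
```

With the real model, the call would be `top_words_frame(lda, tf_feature_names, n_top_words)`. Every column has exactly `n_top_words` entries, so the dict maps cleanly onto equal-length columns.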

Topics in LDA model:

Topic 0: order not produced well received advance return always wishes
Topic 1: then wood color between pay broken transfer change arrival bad
Topic 2: delivery product possible package store advance date broken very good
Topic 3: misleading product france model broken open book year research association
Topic 4: address delivery change invoice deliver missing please billing advance change

df.to_csv("filename.csv")
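`df.to_csv` only helps once the topics are in a DataFrame. If they exist only as printed text (the "store this output as a string" idea from the comments), a sketch like the following can parse the lines back into columns. The captured `output` string is assumed here and reuses the first two topics shown above:

```python
import pandas as pd

# Assumed: the printed LDA output captured as a string
output = """Topics in LDA model:
Topic 0: order not produced well received advance return always wishes
Topic 1: then wood color between pay broken transfer change arrival bad
"""

columns = {}
for line in output.splitlines():
    # Only lines shaped like "Topic N: word word ..." carry topic words
    if line.startswith("Topic ") and ":" in line:
        label, words = line.split(":", 1)
        columns[label] = words.split()

# Topics can list different numbers of words (9 vs 10 here), so wrap each
# column in a Series; pandas pads the shorter ones with NaN
df_topics = pd.DataFrame({k: pd.Series(v) for k, v in columns.items()})
df_topics.to_csv("filename.csv", index=False)
```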
