This question is for learning purposes; it has little practical use.
TextStat is a class with a method, count_words, that reads a text file and returns a word-frequency (word count) dictionary sorted from the most frequent word to the least frequent one.
import pandas as pd
from collections import Counter
from itertools import chain

class TextStat:
    def __init__(self, file_path):
        self.file_path = file_path

    def count_words(self):
        with open(self.file_path, encoding='utf-8') as file:
            text = [line.split() for line in file.readlines()]
        text = list(chain(*text))
        dict_text = dict(Counter(text))
        dict_text = dict(sorted(dict_text.items(), key=lambda item: item[1], reverse=True))
        return dict_text
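For comparison, here is a minimal sketch of the same count_words using Counter.most_common(), which with no argument already returns (word, count) pairs ordered from the highest count to the lowest, with ties kept in first-encountered order. Because sorted() is stable even with reverse=True, this should give the same dictionary as the version above; it also reads the file line by line instead of building an intermediate list of lists.

```python
from collections import Counter


class TextStat:
    def __init__(self, file_path):
        self.file_path = file_path

    def count_words(self):
        counts = Counter()
        with open(self.file_path, encoding='utf-8') as file:
            # Update the counter one line at a time rather than collecting all words first.
            for line in file:
                counts.update(line.split())
        # most_common() with no argument returns (word, count) pairs sorted by
        # count, highest first; equal counts keep the order first encountered.
        return dict(counts.most_common())
```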
EXAMPLE:
text_1 = TextStat('path/to/file')
dict_1 = text_1.count_words()
QUESTION
How could we improve this class by giving it another method, dict_to_df, that takes the output dictionary of count_words without the user calling count_words directly, so that we can get a pandas DataFrame if we want one?
def dict_to_df(dic):
    '''Given a dictionary, return a pandas DataFrame.'''
    # Convert the dict views to lists so pandas receives plain sequences.
    df = pd.DataFrame(data={'WORD': list(dic.keys()), 'FREQ': list(dic.values())})
    return df
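An alternative way to build the same two-column frame, sketched here as an option rather than a required change, is to pass the (word, count) pairs as rows and name the columns explicitly. This keeps each word on the same row as its count by construction, and the row order follows the dictionary's order (most frequent first).

```python
import pandas as pd


def dict_to_df(dic):
    '''Given a word -> count dictionary, return a two-column DataFrame.'''
    # Each (key, value) pair becomes one row, so words and counts stay aligned.
    return pd.DataFrame(list(dic.items()), columns=['WORD', 'FREQ'])
```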
I mean the user should be able to do the following without calling text_1.count_words():
df1 = text_1.dict_to_df(dict_1)
dict_1 should be created inside the class, without the user being involved in creating it.
dict_1 if they didn't call count_words. You can also just define a method that goes straight from a TextStat instance to a data frame by calling count_words itself.

dict_1, and another gives us a pandas DataFrame... however, the pandas DataFrame cannot be created without a dictionary, but the creation of the dictionary should be done by the class, not the user.

dict_to_df(self). Then it can call count_words itself to generate the dictionary. Then the user would just use:
df1 = text_1.dict_to_df()
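The suggestion above can be sketched as follows: dict_to_df takes only self, calls count_words internally, and turns the resulting dictionary into a DataFrame, so the caller never creates or passes dict_1. count_words is kept functionally the same as in the question, only tidied. Note that every call to dict_to_df rereads the file; storing the dictionary in an instance attribute would avoid that, but is not shown here.

```python
from collections import Counter
from itertools import chain

import pandas as pd


class TextStat:
    def __init__(self, file_path):
        self.file_path = file_path

    def count_words(self):
        with open(self.file_path, encoding='utf-8') as file:
            # Flatten the per-line word lists and count them while the file is open.
            counts = Counter(chain.from_iterable(line.split() for line in file))
        # Sort by count, most frequent first.
        return dict(sorted(counts.items(), key=lambda item: item[1], reverse=True))

    def dict_to_df(self):
        '''Build the word-frequency DataFrame for this instance's file.'''
        # The dictionary is created here, inside the class; the caller never handles it.
        dic = self.count_words()
        return pd.DataFrame(list(dic.items()), columns=['WORD', 'FREQ'])
```

Usage then matches the comment: text_1 = TextStat('path/to/file') followed by df1 = text_1.dict_to_df().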