
This question is for learning purposes; it has little practical use.

TextStat is a class with a method count_words that reads a text file and returns a dictionary of word frequencies (word counts), sorted from the most frequent word to the least frequent.

import pandas as pd
from collections import Counter
from itertools import chain

class TextStat:
    def __init__(self, file_path):
        self.file_path = file_path
    
    def count_words(self):
        with open(self.file_path, encoding='utf-8') as file:
            text = [line.split() for line in file.readlines()]
            text = list(chain(*text))
            dict_text = dict(Counter(text))
            dict_text = dict(sorted(dict_text.items(), key=lambda item: item[1], reverse=True))
        return dict_text

EXAMPLE:

text_1 = TextStat('path/to/file')
dict_1 = text_1.count_words()

QUESTION

How could we improve this class by giving it another method, dict_to_df, that uses the dictionary produced by count_words without the user having to call count_words directly, so that we can get a pandas DataFrame if we want one?

def dict_to_df(dic):
    '''Given a dictionary; give a pandas Dataframe.'''
    df = pd.DataFrame(data={'WORD': dic.keys(), 'FREQ': dic.values()})
    return df

I mean the user should be able to do the following without calling text_1.count_words():

df1 = text_1.dict_to_df(dict_1)

dict_1 should be created inside the class, without the user having to create it.

  • Not clear what you are asking. Where did they get dict_1 if they didn't call count_words? You can also just define a method that goes straight from a TextStat instance to a data frame by calling count_words itself. Commented Feb 1, 2021 at 16:24
  • @chpner The class should have two methods; one gives us a dictionary dict_1, and another gives us pandas DataFrame... however, the pandas Dataframe cannot be created without a dictionary, but the creation of the dictionary should be done by the class, not the user. Commented Feb 1, 2021 at 16:27
  • I'm not clear either. Why wouldn't you just make dict_to_df another method within class TextStat with signature dict_to_df(self)? Then it can call count_words itself to generate the dictionary. Then the user would just use: df1 = text_1.dict_to_df() Commented Feb 1, 2021 at 16:32
  • @DarrylG, your answer seems legit, I didn't think about it, being a beginner in python and oop! Commented Feb 1, 2021 at 16:33
  • @AkbarHussein, glad it was useful. Do you see how to make the method dict_to_df? Commented Feb 1, 2021 at 16:34

1 Answer

Code

Reusing the OP's code as much as possible:

import pandas as pd
from collections import Counter
from itertools import chain

class TextStat:
    def __init__(self, file_path):
        self.file_path = file_path
    
    def count_words(self):
        with open(self.file_path, encoding='utf-8') as file:
            text = [line.split() for line in file.readlines()]
            text = list(chain(*text))
            dict_text = dict(Counter(text))
            dict_text = dict(sorted(dict_text.items(), key=lambda item: item[1], reverse=True))
        return dict_text
    
    def dict_to_df(self):
        dic = self.count_words()  # self refers to the object instance, so we
                                  # can call its other methods
        df = pd.DataFrame(data={'WORD': dic.keys(), 'FREQ': dic.values()}) # using your method
        return df

Usage

text_1 = TextStat('path/to/file')
dict_1 = text_1.count_words()            # As dictionary
df1 = text_1.dict_to_df()                # As DataFrame
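
Optional variant

In the usage above the file is read twice, once by count_words and once more inside dict_to_df. If that matters, one possible variation is to keep the counts on the instance after the first read. This is only a sketch and not part of the OP's code: the _counts attribute is a name made up here, Counter.most_common() is swapped in for the manual sort, and pandas is imported inside dict_to_df purely so that count_words can be used without it.

```python
from collections import Counter


class TextStat:
    def __init__(self, file_path):
        self.file_path = file_path
        self._counts = None  # hypothetical cache, filled on the first count_words() call

    def count_words(self):
        """Return a {word: frequency} dict, most frequent word first."""
        if self._counts is None:
            counter = Counter()
            with open(self.file_path, encoding='utf-8') as file:
                for line in file:
                    # update() adds one count per word in the line
                    counter.update(line.split())
            # most_common() yields (word, count) pairs sorted by count, descending
            self._counts = dict(counter.most_common())
        # hand back a copy so callers cannot alter the cached counts
        return dict(self._counts)

    def dict_to_df(self):
        """Return the same counts as a two-column DataFrame."""
        import pandas as pd  # local import: a choice of this sketch, not a requirement
        dic = self.count_words()
        return pd.DataFrame({'WORD': list(dic.keys()), 'FREQ': list(dic.values())})
```

Usage stays the same as above: text_1.count_words() returns the dictionary and text_1.dict_to_df() returns the DataFrame. The trade-off is that later changes to the file are not picked up unless a new TextStat instance is created.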