I have a nested list of objects called "words". Each object is an instance of a class with the attributes conf (float), end (float), start (float) and word (string). I want to remove duplicate objects that have the same "word".

class Word:
    ''' A class representing a word from the JSON format for vosk speech recognition API '''

    def __init__(self, dict):
        '''
        Parameters:
          dict (dict) dictionary from JSON, containing:
            conf (float): degree of confidence, from 0 to 1
            end (float): end time of the pronouncing the word, in seconds
            start (float): start time of the pronouncing the word, in seconds
            word (str): recognized word
        '''

        self.conf = dict["conf"]
        self.end = dict["end"]
        self.start = dict["start"]
        self.word = dict["word"]

    def to_string(self):
        ''' Returns a string describing this instance '''
        return "{:20} from {:.2f} sec to {:.2f} sec, confidence is {:.2f}%".format(
            self.word, self.start, self.end, self.conf*100)


    def compare(self, other):
        if self.word == other.word:
            return True
        else:
            return False

I tried this but couldn't get it working:

 nr_words = []
 c = custom_Word.Word({'conf': 0.0, 'end': 0.00, 'start': 0.00, 'word': ''}) 
 nr_words.append(c)

 for w in words:
     for nr in nr_words:
         if w.compare(nr_words[nr]):
             print("same")
         else:
             print("not same")
             nr_words.append(w.word)
             nr_words.append(w.start)
             nr_words.append(w.end)

Here is the collection of objects. Each object contains data like this:

{'conf': 0.0, 'end': 0.00, 'start': 0.00, 'word': 'hello'} 

{'conf': 0.0, 'end': 1.00, 'start': 0.00, 'word': 'hello'} 

{'conf': 0.0, 'end': 2.00, 'start': 0.00, 'word': 'to'} 

My compare function from the class "Word" works as expected:

words[0].compare(words[1])
True

I also tried it this way:

for i in range(0,len(words)):
    for o in range(0,len(nr_words)):
        if words[i].compare(nr_words[o]):
            print("same")
        else:
            print("not same")
            nr_words.append(w.word)
            nr_words.append(w.start)
            nr_words.append(w.end)

but got the error "AttributeError: 'str' object has no attribute 'word'".

I am not sure what is wrong with the attribute word. Can some good soul guide me on how to remove the duplicate objects by "word"? Thanks in advance!

    Be careful with dict as a parameter name: like list or str, it shadows a Python built-in and is not recommended. Commented Mar 2, 2022 at 12:05

2 Answers

Answer to:

Exactly the opposite list of what you have done now: we have kept only the unique word list, but the repeating word list will contain the words that are seen repeatedly.

frequency = {}

for w in words:
    if frequency.get(w.word, False):
        frequency[w.word].append(w)
    else:
        frequency[w.word] = [w]

repeated_words_list = []
for key in frequency:
    if len(frequency[key]) > 1:
        repeated_words_list.extend(frequency[key])

# 'repeated_words_list' is now a list containing
# all the Word objects whose `word` attribute
# appears 2 times or more.
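
The same grouping can be sketched a little more compactly with collections.defaultdict. This is only a sketch under assumptions: the Word class below is a minimal stand-in for the asker's class, find_repeated is a hypothetical helper name, and the sample data is taken from the dictionaries shown in the question.

```python
from collections import defaultdict


class Word:
    """Minimal stand-in for the asker's Word class (assumption)."""

    def __init__(self, data):
        self.conf = data["conf"]
        self.end = data["end"]
        self.start = data["start"]
        self.word = data["word"]


def find_repeated(words):
    """Return every Word whose `word` text occurs at least twice.

    Hypothetical helper, not part of the original answer.
    """
    groups = defaultdict(list)  # maps word text -> list of Word objects
    for w in words:
        groups[w.word].append(w)

    repeated = []
    for group in groups.values():
        if len(group) > 1:
            repeated.extend(group)
    return repeated


# Sample data copied from the question
words = [
    Word({"conf": 0.0, "end": 0.00, "start": 0.00, "word": "hello"}),
    Word({"conf": 0.0, "end": 1.00, "start": 0.00, "word": "hello"}),
    Word({"conf": 0.0, "end": 2.00, "start": 0.00, "word": "to"}),
]

repeated = find_repeated(words)
print([(w.word, w.end) for w in repeated])
```

Each returned element is still a full Word object, so start, end and conf remain available for every repeated word.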

5 Comments

Thanks for your precious help! I tried running this but got TypeError: dict.get() takes no keyword arguments on this line: "frequency[w] = frequency.get(w, default=0) + 1"
I changed this line to frequency[w] = frequency.get(w.word, 1) + 1. Now it runs fine but produces an incorrect result: a few words that are not repeated at all are present in the output.
Yeah, sorry. I edited my answer (changed to frequency.get(w, 0) + 1).
But it is still producing wrong output: with (w, 0) + 1) it does not produce any output.
Yeah, OK, I see one problem. The frequency dictionary takes the Word objects as keys, but those are unique, so I should use the word attribute of those Word objects for the keys. Answer edited and now correct (tested).
( First answer:
Looking at this code, I guess nr_words is a list.
Could you specify what nr_words represents? Is it the list of the 'already seen' words?

I also see that you print out nr.word, so I suppose that nr_words is a list of Word objects.

But the second for loop iterates over the values of the nr_words list (Word objects), not its indexes.
So when you compare the two Word objects on line 4, I think you should simply pass nr as the other argument to your compare() method, instead of nr_words[nr].
)
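
The difference between iterating over values and over indexes can be sketched as follows. The list contents here are placeholder strings chosen for illustration, not the asker's Word objects, so the exact error text would differ in the original code.

```python
nr_words = ["hello", "to"]  # placeholder values (assumption)

# A plain for loop yields the elements themselves, not their positions.
values = [nr for nr in nr_words]

# Index-based access needs integers, for example from enumerate().
indexed = [nr_words[i] for i, _ in enumerate(nr_words)]

# Using an element as an index fails, because list indexes must be integers.
try:
    nr_words[nr_words[0]]
except TypeError as exc:
    error_message = str(exc)

print(values == indexed)
print(error_message)
```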

EDIT:
Reply to your comment

nr_words is a kind of empty list, so that I can compare it to the dictionaries and append the non-repeating words to nr_words. I also tried passing nr as you said, but got the error AttributeError: 'str' object has no attribute 'word'

The error occurs because, when the two words are not the same, you append w.word, w.start and w.end to the nr_words list (a string, a float and a float respectively). Later iterations then call compare() with those plain values, which have no word attribute. Try appending only the Word object, like so:

Corrected Code:

filtered_list = []

for w in words:
    already_seen = False
    for seen in filtered_list:
        # print(seen.word)
        if w.compare(seen):
            already_seen = True
    if not already_seen:
        filtered_list.append(w)

# now filtered_list is the list
# of all your words without the duplicates
# (based on the `word` attribute)
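
For longer lists, one possible variation is to track the already-seen word strings in a set, which avoids rescanning filtered_list for every word. This is a sketch under assumptions, not the answer's method: the Word class is a minimal stand-in for the asker's class, unique_by_word is a hypothetical helper name, and the sample data is taken from the question.

```python
class Word:
    """Minimal stand-in for the asker's Word class (assumption)."""

    def __init__(self, data):
        self.conf = data["conf"]
        self.end = data["end"]
        self.start = data["start"]
        self.word = data["word"]


def unique_by_word(words):
    """Keep the first Word for each distinct `word` text, preserving order.

    Hypothetical helper, not part of the original answer.
    """
    seen = set()  # word strings encountered so far
    unique = []
    for w in words:
        if w.word not in seen:
            seen.add(w.word)
            unique.append(w)
    return unique


# Sample data copied from the question
words = [
    Word({"conf": 0.0, "end": 0.00, "start": 0.00, "word": "hello"}),
    Word({"conf": 0.0, "end": 1.00, "start": 0.00, "word": "hello"}),
    Word({"conf": 0.0, "end": 2.00, "start": 0.00, "word": "to"}),
]

filtered_list = unique_by_word(words)
print([(w.word, w.end) for w in filtered_list])
```

As in the loop above, the first occurrence of each word is the one that is kept.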

5 Comments

nr_words is a kind of empty list, so that I can compare it to the dictionaries and append the non-repeating words to nr_words. I also tried passing nr as you said, but got the error AttributeError: 'str' object has no attribute 'word'
Wow, amazing, it worked flawlessly, thanks a lot! One more favor, please: how can I get a list of only the repeating words with their other attributes?
You're welcome :) What do you mean by 'repeating words with its other attribute'?
Exactly the opposite list of what you have done now: we have kept only the unique word list, but the repeating word list will contain the words that are seen repeatedly.
Other attributes such as start, end, conf
