
How do I find a list of duplicates in a list of strings? The clean_up function is given:

def clean_up(s):
    """ (str) -> str

    Return a new string based on s in which all letters have been
    converted to lowercase and punctuation characters have been stripped
    from both ends. Inner punctuation is left untouched.

    >>> clean_up('Happy Birthday!!!')
    'happy birthday'
    >>> clean_up("-> It's on your left-hand side.")
    " it's on your left-hand side"
    """

    punctuation = """!"',;:.-?)([]<>*#\n\t\r"""
    result = s.lower().strip(punctuation)
    return result

Here is my duplicate function.

def duplicate(text):
    """ (list of str) -> list of str

    >>> text = ['James Fennimore Cooper\n', 'Peter, Paul, and Mary\n',
                'James Gosling\n']
    >>> duplicate(text)
    ['james']
    """

    cleaned = ''
    non_duplicate = []
    unique = []
    for word in text:
        cleaned += clean_up(word).replace(",", " ") + " "
        words = cleaned.split()
        for word in words:
            if word in unique:

I am stuck here. I can't use a dictionary or any other technique that keeps a count of the frequency of each word in the text. Please help.

1 Answer


You have a problem here:

cleaned += clean_up(word).replace(",", " ") + " "

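This line appends each cleaned phrase (the loop variable is named "word", but it holds a whole string from the list) to a growing string of everything seen so far. Therefore, each time through the outer for loop, you split and recheck all the words from the earlier phrases as well as the new ones.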

Instead, you need to do:

for phrase in text:
    for word in phrase.split(" "):
        word = clean_up(word)

This means you only process each word once. You may then need to add it to one of your lists, depending on whether it's already in either of them. I suggest you call your lists seen and duplicates, to make it clearer what is going on.
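Putting those pieces together, one possible way to finish the function looks like the sketch below. It is only an illustration, not necessarily what your assignment expects: the list names seen and duplicates are the ones suggested above, clean_up is copied from the question, and I have used split() rather than split(" ") so that runs of whitespace don't produce empty strings. No dictionary or frequency count is involved, just two membership tests on lists.

```python
def clean_up(s):
    """Lowercase s and strip punctuation from both ends (copied from the question)."""
    punctuation = """!"',;:.-?)([]<>*#\n\t\r"""
    return s.lower().strip(punctuation)


def duplicate(text):
    """Return the words that occur more than once in text, a list of str."""
    seen = []        # every distinct word encountered so far
    duplicates = []  # words encountered at least twice, each listed once
    for phrase in text:
        for word in phrase.split():
            word = clean_up(word)
            if not word:
                continue  # the token was nothing but punctuation
            if word in seen:
                if word not in duplicates:
                    duplicates.append(word)
            else:
                seen.append(word)
    return duplicates


text = ['James Fennimore Cooper\n', 'Peter, Paul, and Mary\n', 'James Gosling\n']
print(duplicate(text))  # ['james']
```

The result keeps the order in which each word was first found to be repeated. Membership tests on a list are linear, so this is quadratic in the number of distinct words, which should be fine for short texts.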


1 Comment

Good points. To be fair to the OP, though, clean_up only removes leading/trailing commas (and other punctuation). It seems that re.findall(r'\w+', ...) applied to each string would be a more appropriate tokenizer, but that's up to the OP. The OP may also wish to consider using sets rather than lists, if that is allowed.
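For what it's worth, a rough sketch of that alternative is below, assuming sets are permitted by the assignment. The name duplicate_with_sets is made up for illustration. Note that \w+ also splits on inner punctuation, so "it's" becomes "it" and "s", which differs from what clean_up does.

```python
import re


def duplicate_with_sets(text):
    """Return a sorted list of the words that occur more than once in text."""
    seen = set()        # words encountered at least once
    duplicates = set()  # words encountered at least twice
    for phrase in text:
        # \w+ matches runs of letters, digits and underscores
        for word in re.findall(r"\w+", phrase.lower()):
            if word in seen:
                duplicates.add(word)
            else:
                seen.add(word)
    # Sets are unordered, so sort to get a predictable result
    return sorted(duplicates)


text = ['James Fennimore Cooper\n', 'Peter, Paul, and Mary\n', 'James Gosling\n']
print(duplicate_with_sets(text))  # ['james']
```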
