I have code that reads a file of 5000 words, one word per line. parsing is my variable; in this case it equals "economist". If a word from the data file occurs within parsing, it is appended to the output list.
The problem: why are the words 'on' and 'no' appended repeatedly? This happens with some other phrases too, but not with all of them. The words 'on' and 'no' each appear only once in the data file.
Using set removes the repeats, but some words occur more than once in the phrase, so I lose those repeats.
My code for reading the file into data:
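with open("words.txt", 'r') as f:
    data = [line.strip() for line in f]  # one word per line, whitespace stripped
output = []
for each in data:
    # keep every word that occurs as a substring of parsing
    if parsing != "" and each in parsing:
        output.append(each)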
Samples:
phrase = economist
sortedout = ['economist', 'on', 'no', 'on', 'no', 'no', 'no', 'no']
and
phrase = timesonline # with this one 'in' gets repeated, but not 'no'
sortedout = ['online', 'online', 'time', 'line', 'line', 'son', 'in', 'on', 'so', 'me', 'in', 'on', 'so', 'in']
It is a HackerRank challenge. Here is the Data File, which is supposed to be on their local drive, and the Challenge.
When I run [d for d in data if d == "on"], it returns more than one 'on', and it should not.
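If that check returns more than one 'on', the repeats are most likely already present in data, so the loop is only reporting what the list contains. Here is a minimal sketch of how one might confirm this and drop the repeats without losing order. The sample list is made up for illustration and is not the real contents of words.txt.

```python
from collections import Counter

# Hypothetical stand-in for the stripped lines of words.txt.
data = ["economist", "on", "no", "on", "time", "no", "no"]

# Count how often each entry occurs and keep only the repeated ones.
counts = Counter(data)
duplicates = {word: n for word, n in counts.items() if n > 1}
print(duplicates)

# Drop the repeats while keeping first-seen order
# (dicts preserve insertion order in Python 3.7+).
unique_words = list(dict.fromkeys(data))

parsing = "economist"
output = [word for word in unique_words if word in parsing]
print(output)
```

Deduplicating the word list, rather than calling set on the output, keeps the words in file order.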
data? A list with all the words in the document? Seems so, and that you have the other words, in that order, in the text.

output = [d for d in data if d in parsing] if parsing else [] to simplify to the filtering list comprehension, and avoid all the work when parsing is empty (so your parsing != "" test would cause the loop to do nothing anyway). Or, to avoid all the verbosity on one line: output = [] then if parsing: output.extend(d for d in data if d in parsing). By just testing parsing, not parsing != "" or parsing != [], you can switch the type of parsing without needing to change the test; empty sequences are falsy, non-empty ones are truthy.
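The comprehension suggested in the comment can be sketched as a small function. The name matching_words and the sample list are hypothetical and used only for illustration. Note that this filters the list as given, so it does not by itself remove entries that appear more than once in data.

```python
def matching_words(data, parsing):
    """Return the entries of data that are found in parsing (hypothetical helper)."""
    # An empty parsing (e.g. "" or []) is falsy, so the comprehension is skipped.
    return [d for d in data if d in parsing] if parsing else []


# Made-up sample, not the real word file.
data = ["economist", "on", "no", "time"]

print(matching_words(data, "economist"))
print(matching_words(data, ""))
```

When parsing is a string, d in parsing is a substring test. If parsing were a list, the same expression would test for membership instead.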