I have two lists of words to compare. The first list has 2 million words and the second has 150,000 words. I need to use binary search to check whether each word in the first list appears in the second. I first tried linear search:
    for word in words_list:
        if word in dict_list:
            print(word, 1)
        else:
            print(word, 0)
It works, but it is very slow. Then I tried binary search, but it did not work correctly:
    for word in wordlist:
        lb = 0
        ub = len(dict_list)
        mid_index = (lb + ub) // 2
        item_at_mid = dict_list[mid_index]
        if item_at_mid == word:
            print(word)
        if item_at_mid < word:
            lb = mid_index + 1
        else:
            ub = mid_index
In the end I need two lists: one with the words that are in the dictionary and one with the words that are not.
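For reference, the binary search attempt above checks only a single midpoint per word and never repeats the narrowing step, and binary search also needs the list being searched to be sorted. Below is a minimal sketch of one way this could be done with the standard-library `bisect` module, assuming the words are plain strings and that sorting a copy of `dict_list` is acceptable. The helper name `split_by_membership` and the sample data are hypothetical and used only for illustration.

```python
from bisect import bisect_left


def split_by_membership(words, dictionary):
    """Return (found, missing) using binary search over a sorted copy of dictionary."""
    # Binary search only works on sorted data, so sort once up front.
    sorted_dict = sorted(dictionary)
    found, missing = [], []
    for word in words:
        # bisect_left returns the leftmost position where word could be inserted.
        i = bisect_left(sorted_dict, word)
        # The word is present only if that position holds an equal item.
        if i < len(sorted_dict) and sorted_dict[i] == word:
            found.append(word)
        else:
            missing.append(word)
    return found, missing


# Made-up sample data, standing in for the real lists.
words_list = ["apple", "zebra", "kiwi", "apple"]
dict_list = ["kiwi", "apple", "mango"]
found, missing = split_by_membership(words_list, dict_list)
print(found)    # ['apple', 'kiwi', 'apple']
print(missing)  # ['zebra']
```

If binary search is not a hard requirement, converting `dict_list` to a `set` and testing `word in dict_set` is likely simpler, since set membership tests take constant time on average.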
cmp file1 file2