Binary search of words in python list

Question

I have two lists of words to compare. First list is 2 million words, second list is 150,000 words. What I need to do is to apply binary search to see if words of the first list appear in the second. I was trying liner search:

for word in words_list:
    if word in dict_list:
       print(word, 1)
    else:
       print(word, 0)

It works good, but it is very slow. Then I tried binary search but it did not work correctly:

for word in wordlist:
    lb = 0
    ub = len(dict_list)
    mid_index = (lb + ub) // 2
    item_at_mid = dict_list[mid_index]
    if item_at_mid == word:
        print(word)
    if item_at_mid < word:
        lb = mid_index + 1
    else:
        ub = mid_index

In the end I need two list first list of words that are in dictionary and second that are not.

Are you actually printing the values, or is this for demonstation? Printing is super slow compared to putting the elements into two new lists. — Reti43
– Reti43, Commented Apr 5, 2016 at 10:03

Ofer Helman · Accepted Answer · 2016-04-05 10:12:10Z

2

You can use sets:

inter = set(words_list).intersection(dict_list)
for word in words_list:
    if word in inter:
        print(word, 1)
    else:
        print(word, 0)

edited Apr 5, 2016 at 10:12

answered Apr 5, 2016 at 10:05

Ofer Helman

8061 gold badge10 silver badges26 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

Yan Over a year ago

I do not have dictionary I have 2 lists.

Ofer Helman Over a year ago

OK, so just use dict_list.keys() -> dict_list. I thought it's a dict because of the name. edited

Prajwal K R · Accepted Answer · 2016-04-05 10:12:21Z

1

If you want to do a Binary Search:

present = []
absent = []
for word in firstList:
    lb,ub = 0,len(secondList) - 1
    found = False
    while lb <= ub:
        mid = (lb + ub) // 2
        if secondList[mid] == word:
            found = True
            break
        elif secondList[mid] < word:
            lb = mid + 1
        else:
            ub = mid - 1

    if found:
        present.append(word)
    else:
        absent.append(word)

Your binary search code was incorrect.

answered Apr 5, 2016 at 10:12

Prajwal K R

6725 silver badges14 bronze badges

Comments

venpa · Accepted Answer · 2016-04-05 10:16:28Z

1

If you are using binary search your input should have been ordered. One other possibility is convert your words_list and dict_list to set and get the output as follows:

words common to both:

words_list.intersection(dict_list)

words not in one other:

words_list-dict_list
dict_list-words_list

edited Apr 5, 2016 at 10:16

answered Apr 5, 2016 at 10:08

venpa

4,31823 silver badges24 bronze badges

2 Comments

Reti43 Over a year ago

From the wording it is implied he can get away with only making a set out of words_list. Then using intersection() and difference() and passing dict_list as a list will achieve the same result.

venpa Over a year ago

@Reti43: I was thinking same. Later i looked at the comments of previous answer which confirms only the name contains the word dict but actually it is list.

Community · Accepted Answer · 2017-05-23 12:17:04Z

0

A solution is to prefer using set instead of list, because of the O(1) complexity for __contains__ operation, as given in this solution.

If memory is a problem, then using a bloom filter is probably a good tradeoff (no false negative).

Here is a python implementation.

To create and maintain a binary tree, consider using heapq standard module.

edited May 23, 2017 at 12:17

CommunityBot

11 silver badge

answered Apr 5, 2016 at 10:07

aluriak

5,9572 gold badges29 silver badges42 bronze badges

Collectives™ on Stack Overflow

Binary search of words in python list

4 Answers 4

2 Comments

Comments

2 Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

2 Comments

Comments

2 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related