0

I am trying to get a binary search to work in Python. I have a massive, sorted list of passwords. The plan is to get a password input from the user and see if it is in the list. I've decided to implement a binary search because of the size of the list.

Here's my code:

Found = False
Password = user_input("Enter a password: ")


with io.open('final.txt', encoding='latin-1') as myfile:

    data = myfile.readlines()
    low = 0
    high = (int(len(data))+1)
    while (low < high) and not Found:

        mid = int((low+high)/2)

        if data[mid] == Password:
            Found = True
            break
        elif Password < str(data[mid]):
            high = mid - 1
        elif Password > str(data[mid]):
            low = mid + 1

I am guessing it is because of the string comparison? Any ideas? The binary search never returns true, even if I explicitly search something that I know is in the list.

I used this code to sort the password list.

import io

with io.open('result.txt', encoding='latin-1') as myfile:
    data = myfile.readlines()

def partition(data, start, end):
    pivot = data[end]                          # Partition around the last value
    bottom = start-1                           # Start outside the area to be partitioned
    top = end                                  # Ditto

    done = 0
    while not done:                            # Until all elements are partitioned...

        while not done:                        # Until we find an out of place element...
            bottom = bottom+1                  # ... move the bottom up.

            if bottom == top:                  # If we hit the top...
                done = 1                       # ... we are done.
                break

            if data[bottom] > pivot:           # Is the bottom out of place?
                data[top] = data[bottom]       # Then put it at the top...
                break                          # ... and start searching from the top.

        while not done:                        # Until we find an out of place element...
            top = top-1                        # ... move the top down.

            if top == bottom:                  # If we hit the bottom...
                done = 1                       # ... we are done.
                break

            if data[top] < pivot:              # Is the top out of place?
                data[bottom] = data[top]       # Then put it at the bottom...
                break                          # ...and start searching from the bottom.

    data[top] = pivot                          # Put the pivot in its place.
    return top                                 # Return the split point


def quicksort(data, start, end):
    if start < end:                            # If there are two or more elements...
        split = partition(data, start, end)    # ... partition the sublist...
        quicksort(data, start, split-1)
        quicksort(data, split+1, end)


quicksort(data, 0, (int(len(data))-1))

with io.open('final.txt', 'w', encoding='latin-1') as f:
    for s in data:
        f.write(s)

The sorted list looks something like this: whitespace, then symbols, then numbers, then capital letters (alphabetically sorted), then common letters (alphabetically sorted).

3
  • What is the problem ? Commented Feb 5, 2016 at 13:04
  • The binary search never returns true, even if I explicitly search something that I know is in the list. After any search, printing high or low always returns 992352. Commented Feb 5, 2016 at 13:06
  • beyond your algorithmic problem, two caveats : 1) the execution time is 99% reading the file: so the linear search is the best approach here. 2) If you store passwords in memory, set is better than list : with passwords=set(data), Password in passwords solve your problem in 0(1), when your approach is O( ln(n)) . Commented Feb 5, 2016 at 13:58

6 Answers 6

3

Do not write your own binary search, it's a bit tricky to get them right. Use bisect module instead.

from bisect import bisect_left
def binary_search(lst, el):
   # returns lower bound of key `el` in list `lst`
   index = bisect_left(lst, el)
   # check that: (1) the lower bound is not at the end of the list and
   # (2) the element at the index matches `el`
   return index < len(lst) and lst[index] == el

Usage:

test = ["abc", "def", "ghi"]
print(binary_search(test, "def")) # True
print(binary_search(test, "xyz")) # False
Sign up to request clarification or add additional context in comments.

1 Comment

I am well aware that there's ways to do it without re-inventing the wheel. However, I'm doing it as an exercise in algorithmical thinking and was hoping you could help me debug my code.
0

You probably have a new line character at the end of each password after calling readlines, use rstrip() to remove it

    Found = False
    Password = user_input("Enter a password: ")


    with io.open('final.txt', encoding='latin-1') as myfile:

        data = myfile.readlines()
        low = 0
        high = len(data)-1   #no need to cast to int, should be len()-1
        while (low <= high) and not Found:  #less than or equal to

            mid = int((low+high)/2)

            if data[mid].rstrip() == Password:   #Remove newline character before compare
                Found = True
                break
            elif Password < str(data[mid]):
                high = mid - 1
            elif Password > str(data[mid]):
                low = mid + 1

Comments

0

If you only want to search the password in your list then In your code

data = myfile.readlines()

you have already taken all the passwords into the memory. so if you just want to check if a given password is present in your list or not, you can directly check by using

if Password in data:
     print "yes it is present in the list"
else:
    print "Not present in the list"

hope it may help.

Comments

0

This is example of binary search

def binarySearch(alist, item):
        first = 0
        last = len(alist)-1
        found = False

        while first<=last and not found:
            midpoint = (first + last)//2
            if alist[midpoint] == item:
                found = True
            else:
                if item < alist[midpoint]:
                    last = midpoint-1
                else:
                    first = midpoint+1

        return found

mylist1 = [0, 1, 2, 8, 9, 17, 19, 32, 42,]
print(binarySearch(mylist1, 3))
print(binarySearch(mylist1, 13))

mylist2 = [0, 1, 2, 8, 9, 17, 19, 32, 42, 99]
print(binarySearch(mylist2, 2))
print(binarySearch(mylist2, 42))

I got then

False
False
True
True

Yes and I am sure that you need new line character at the end of each password after calling readlines,as Eamon pointed out.

Comments

0

There are two problems .

  1. Your binary search algorithm is wrong .

The repeat condition should be

while (low <= high)

or your can't find the first and the last element .

  1. readlines() will read \n but user_input() does not .

Which causes `Password` == `Password\n' be false forever.

Comments

0

You're skipping parts of your list because of the way you're setting low and high. Because of this, low == high occurs after updating and before checking, causing you to jump out of the loop prematurely.

There are two easy solutions:

Either..

  • set high = mid or low = mid instead of mid -/+ 1, triggering an extra iteration,

or..

  • Check if high == low and data[low] == Password after the loop terminates, as you might still find Password there.

1 Comment

Ah, as @KIDJourney mentioned, changing your loop condition also fixes it.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.