1

First time posting here, so apologies in advance if I am not following best practices. My algorithm is supposed to do the following in a sorted array with possible duplicates.

  1. Return -1 if the element does not exist in the array
  2. Return the smallest index where the element is present.

I have written a binary search algorithm for an array without duplicate. This returns a position of the element or -1. Based on blackbox testing, I know that the non-duplicate version of the binary search works. I have then recursively called that function via another function to search from 0 to position-1 to find the first incidence of the element, if any.

I am currently failing a black box test. I am getting a wrong answer error and not a time out error. I have tried most of the corner cases that I could think of and also ran a brute force test with the naive search algorithm and could not find an issue.

I am looking for some guidance on what might be wrong in the implementation rather than an alternate solution.

The format is as follow: Input:

5 #array size

3 4 7 7 8 #array elements need to be sorted

5 #search query array size

3 7 2 8 4 #query elements

Output

0 2 -1 4 1

My code is shown below:

class BinarySearch:
 
 def __init__(self,input_list,query):
        
    self.array=input_list
    self.length=len(input_list)
    self.query=query
    return
 
 def binary_search(self,low,high):
    '''
    Implementing the binary search algorithm with distinct numbers on a 
    sorted input.
    '''
    
    #trivial case
    if (self.query<self.array[low]) or (self.query>self.array[high-1]):
        
        return -1
        
    elif (low>=high-1) and self.array[low]!=self.query:
        
        return -1
    else:
        m=low+int(np.floor((high-low)/2))
        if self.array[low]==self.query:
            return low
        elif (self.array[m-1]>=self.query):
            return self.binary_search(low,m)
        elif self.array[high-1]==self.query:
            return high-1
        else:    
            return self.binary_search(m,high)
    return
        
class DuplicateBinarySearch(BinarySearch):
     
     def __init__(self,input_list,query):
       
        BinarySearch.__init__(self,input_list,query)
     
     def handle_duplicate(self,position):
        '''
        Function handles the duplicate number problem.
        Input: position where query is identified.
        Output: updated earlier position if it exists else return 
        original position.
        '''
        
        if position==-1:
            return -1
        elif position==0:
            return 0
        elif self.array[position-1]!=self.query:
            return position
        else:
            new_position=self.binary_search(0,position)
            if new_position==-1 or new_position>=position:
                return position
            else:
                return self.handle_duplicate(new_position)
     
     def naive_duplicate(self,position):
        
        old_position=position
        if position==-1:
            return -1
        else:
            while position>=0 and self.array[position]==self.query:
                position-=1
            if position==-1:
                return old_position
                
            else:
                return position+1
     


 if __name__ == '__main__':
    num_keys = int(input())
    input_keys = list(map(int, input().split()))
    assert len(input_keys) == num_keys
    

    num_queries = int(input())
    input_queries = list(map(int, input().split()))
    assert len(input_queries) == num_queries
   

    for q in input_queries:
        item=DuplicateBinarySearch(input_keys,q)
        #res=item.handle_duplicate(item.binary_search(0,item.length))
        #res=item.naive_duplicate(item.binary_search(0,item.length))
        #assert res_check==res
        print(item.handle_duplicate(item.binary_search(0,item.length)), end=' ')
        #print(item.naive_duplicate(item.binary_search(0,item.length)), end=' ')

When I run a naive duplicate algorithm, I get a time out error:

Failed case #56/57: time limit exceeded (Time used: 10.00/5.00, memory used: 42201088/536870912.)

When I run the binary search with duplicate algorithm, I get a wrong answer error on a different test case:

Failed case #24/57: Wrong answer

(Time used: 0.11/5.00, memory used: 42106880/536870912.)

The problem statement is as follows:

Problem Statement

Update:

I could make the code work by making the following change but I have not been able to create a test case to see why the code would fail in the first case.

Original binary search function that works with no duplicates but fails an unknown edge case when a handle_duplicate function calls it recursively. I changed the binary search function to the following:

 def binary_search(self,low,high):
        '''
        Implementing the binary search algorithm with distinct numbers on a sorted input.
        '''
        
        #trivial case
        if (low>=high-1) and self.array[low]!=self.query:
            return -1
        
        elif (self.query<self.array[low]) or (self.query>self.array[high-1]):
            return -1
            
        
        else:
            m=low+(high-low)//2
            if self.array[low]==self.query:
                return low
            elif (self.array[m-1]>=self.query):
                return self.binary_search(low,m)
            elif self.array[m]<=self.query:   
                return self.binary_search(m,high)
            elif self.array[high-1]==self.query:
                return high-1
            else: 
                return -1
13
  • The issue may be that the test is not completing within a time limit. Your algorithm of calling binary search again with low equal 0 is not optimal, since low had already been adjusted in the initial binary search, so it is a waste to start from 0 again. Commented Jun 15, 2022 at 16:00
  • A correctly written binary search should be able to get to the first duplicate right off the bat in the main logic. Commented Jun 15, 2022 at 16:01
  • Thanks @trincot for responding. I am currently on 16/25 in the test suite and I am getting a wrong answer and not time out error. I see your point though that the code can be made faster but my issue is currently getting a wrong answer. Commented Jun 15, 2022 at 16:03
  • @MarkRansom - That is true. I saw a solution in StackOverflow which shows that implementation you are talking about. I was trying to reuse an existing code that uses binary search without duplicate. I am still curious to understand why mine is not working. Commented Jun 15, 2022 at 16:04
  • When you ask a question about code that produces the wrong output, you should provide the information to make it reproducible for us. Commented Jun 15, 2022 at 16:23

1 Answer 1

1

Since you are going to implement binary search with recursive, i would suggest you add a variable 'result' which act as returning value and hold intermediate index which equal to target value.

Here is an example:

def binarySearchRecursive(nums, left, right, target, result):

    """
    This is your exit point. 
    If the target is not found, result will be -1 since it won't change from initial value.
    If the target is found, result will be the index of the first occurrence of the target.
    """
    if left > right:
        return result 

    # Overflow prevention
    mid = left + (right - left) // 2

    if nums[mid] == target:
        # We are not sure if this is the first occurrence of the target.
        # So we will store the index to the result now, and keep checking.
        result = mid 
        # Since we are looking for "first occurrence", we discard right half.
        return binarySearchRecursive(nums, left, mid - 1, target, result) 
    elif target < nums[mid]:
        return binarySearchRecursive(nums, left, mid - 1, target, result)
    else:
        return binarySearchRecursive(nums, mid + 1, right, target, result)

if __name__ == '__main__':

    nums = [2,4,4,4,7,7,9]
    target = 4 

    (left, right) = (0, len(nums)-1)
    result = -1 # Initial value
    index = binarySearchRecursive(nums, left, right, target, result)

    if index != -1:
        print(index)
    else:
        print('Not found') 

From your updated version, I still feel the exit point of your function is a little unintuitive.(Your "trivial case" section)

Since the only condition that your searching should stop, is that you have searched all possible section of the list. That is when the range of searching area is 0, there is no element left to be search and check. In implementation, that is when left < right, or high < low, is true.

The 'result' variable, is initialized as -1 when the function first been called from main. And won't change if there is no match find. And after each successful matching, since we can not be sure if it is the first occurrence, we will just store this index into the result. If there are more 'left matching', then the value will be update. If there is not, then the value will be eventually returned. If the target is not in the list, the return will be -1, as its original initialized value.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.