Optimize Binary Search in sorted array: find number of occurrences

Question

I'm trying to use the smallest number of operations possible to find the number of occurrences of an element in a sorted array, saving even one operation if possible. So far this is the best binary search version I know. I can't use vectors or any other std:: functions.

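int modifiedbinsearch_low(int* arr, int low, int high, int key) {
    if (low == high) return high;

    int mid = low + (high - low) / 2;

    // The recursive result must be returned; falling off the end of a
    // non-void function is undefined behavior.
    if (key > arr[mid]) { return modifiedbinsearch_low(arr, mid + 1, high, key); }
    else { return modifiedbinsearch_low(arr, low, mid, key); }
}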

int modifiedbinsearch_high(int* arr, int low, int high, int key) {
    if (low == high) return high;

    int mid = low + (high - low) / 2;

    // The recursive result must be returned here as well.
    if (key < arr[mid]) { return modifiedbinsearch_high(arr, low, mid, key); }
    else { return modifiedbinsearch_high(arr, mid + 1, high, key); }
}

int low = modifiedbinsearch_low( ...)
int high = modifiedbinsearch_high( ...)

This version collapses both functions into just one, but it takes almost double the time. I suspect the idea is good enough to become the fastest, but that the implementation is wrong.

#include<stdio.h>
int binarysearch(int a[],int n,int k,bool searchfirst){
    int result=-1;
    int low=0,high=n-1;
    while(low<=high){
        int mid=(low+high)/2;
        if(a[mid]==k)  {
            result=mid;
            if(searchfirst)
                high=mid-1;
            else
                low=mid+1;
        }
        else if(k<a[mid])  high=mid-1;
        else low=mid+1;
    }
    return result;
}

int main(){
    int a[]={1,1,1,2,2,3,3,3,6,6,6,6,6,7,7};
    int n=sizeof(a)/sizeof(a[0]);
    int x=6;
    int firstindex=binarysearch(a,n,x,true);
    printf("%d\n",firstindex);
    if(firstindex==-1){
        printf("element not found in the array:\n ");
    }
    else {
        int lastindex=binarysearch(a,n,x,false);
        printf("%d\n",lastindex);
        printf("count is = %d", lastindex-firstindex+1);
    }

}

Shorter version

int binmin(int a[], int start, int end, int val) {
    // Returns the first index in [start, end] whose element is >= val
    // (end + 1 if there is none).
    if (start > end)
        return start;
    int mid = (start + end) / 2;
    if (a[mid] >= val)
        return binmin(a, start, mid - 1, val);
    else
        return binmin(a, mid + 1, end, val);
}

After the search has found firstindex, the upper bound can't be below a+firstindex+1, so the second search should search the array pointed at by a+firstindex+1. And, depending on what you know about the data, a linear search for the upper bound could be faster.
– Pete Becker, Commented May 30, 2022 at 2:48
"So far this is the best binary search version I know." Did you try std::binary_search? In other words, can you beat the professional library writers? Then you have std::lower_bound and std::upper_bound.
– PaulMcKenzie, Commented May 30, 2022 at 2:49
It's a search over millions of elements in the array, but the number of elements found is around 2000.
– Olivia22, Commented May 30, 2022 at 2:49
@PaulMcKenzie: std::binary_search finds an element with the given value. It doesn't find the full range.
– Pete Becker, Commented May 30, 2022 at 2:51
If this is asking for help to optimize the code, I think it might not be a great fit for this site, and it isn't very likely to be helpful to others who aren't willing to spend the time to understand this code and then understand the answers. Maybe this would fit better on codereview.stackexchange.com, but I'm not sure. Check the guide.
– starball ♦, Commented Nov 25, 2022 at 7:51
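
Pete Becker's suggestion above, restricting the second search to the elements after firstindex, might look like the following minimal sketch. It reuses the question's binarysearch (with the midpoint computed as low + (high - low) / 2); countOccurrences is a hypothetical wrapper name, not part of the original code. Note that the second call returns an index relative to the tail, and that -1 there only means the first occurrence was also the last one.

```cpp
// Same search as in the question: index of the first (searchfirst == true)
// or last (searchfirst == false) occurrence of k in a[0..n-1], or -1.
int binarysearch(const int a[], int n, int k, bool searchfirst) {
    int result = -1;
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;
        if (a[mid] == k) {
            result = mid;
            if (searchfirst) high = mid - 1;
            else             low = mid + 1;
        } else if (k < a[mid]) {
            high = mid - 1;
        } else {
            low = mid + 1;
        }
    }
    return result;
}

// Hypothetical wrapper: counts occurrences of k, searching only the tail
// a[first+1 .. n-1] for the last occurrence.
int countOccurrences(const int a[], int n, int k) {
    int first = binarysearch(a, n, k, true);
    if (first == -1) return 0;

    int offset = first + 1;  // the tail starts one past the first occurrence
    int lastInTail = binarysearch(a + offset, n - offset, k, false);

    // lastInTail is relative to the tail; -1 means a[first] was the only match.
    int last = (lastInTail == -1) ? first : offset + lastInTail;
    return last - first + 1;
}
```

When the first occurrence is the final element, the tail has length 0 and a + offset is a one-past-the-end pointer that is never dereferenced.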

selbie · Accepted Answer · 2022-05-30 22:08:25Z

1

Here's a performance issue. In the main while loop, you aren't breaking out of the loop when you find the target value.

while(low<=high){
    int mid=(low+high)/2;
    if(a[mid]==k)  {
        result=mid;  // you need to break out of the loop here
        if(searchfirst)
            high=mid-1;
        else
            low=mid+1;
    }

But that's not the only issue. The searchfirst value shouldn't be what dictates adjusting high or low as you are doing it. In a classic binary search, adjust the high or low parameter based on how a[mid] compares to k. Probably closer to this:

while(low<=high) {
    int mid=(low+high)/2;
    if(a[mid]==k)  {
        result=mid;  // you need to break out of the loop here
        break;
    }
    if(a[mid] > k)
        high=mid-1;
    else
        low=mid+1;
}

You have the right idea: binary search to find the element. But let me suggest a simpler solution: after the initial binary search, just "scan to the left" and "scan to the right" to count duplicate elements.

Let me know what you think of this:

int binarySearch(int* arr, int start, int end, int value) {
    while (start <= end) {
        int mid = (start + end) / 2;
        if (arr[mid] == value) {
            return mid;
        }
        start = (arr[mid] < value) ? (mid + 1) : start;
        end = (arr[mid] > value) ? (mid - 1) : end;
    }
    return -1;
}

int countSide(int* arr, int length, int index, int value, int step) {
    int count = 0;
    while (index >= 0 && index <= (length - 1) && arr[index] == value) {
        count++;
        index += step;
    }
    return count;
}

int main() {
    int a[] = { 1,1,1,2,2,3,3,3,6,6,6,6,6,7,7 };
    int n = sizeof(a) / sizeof(a[0]);
    int x = 6;
    int firstindex = binarySearch(a, 0, n - 1, x);
    printf("%d\n", firstindex);
    if (firstindex == -1) {
        printf("element not found in the array:\n ");
    }
    else {
        int count = countSide(a, n, firstindex, x, -1);
        count += countSide(a, n, firstindex, x, 1);
        count--; // because we counted the middle element twice
        printf("count is = %d\n", count);
    }
}

Updated

Here's a solution that does two binary searches to find the lower and upper bounds of the target value in the array and simply measures the distance between the indices to get the count:

int bound(int* arr, int length, int value, bool isUpperBound) {

    int best = -1;
    int start = 0;
    int end = start + length - 1;

    while (start <= end) {
        int mid = (start + end) / 2;
        if (arr[mid] == value) {
            best = mid;

            if (isUpperBound) {
                start = mid + 1;
            }
            else {
                end = mid - 1;
            }
        }
        else if (arr[mid] < value) {
            start = mid + 1;
        }
        else if (arr[mid] > value) {
            end = mid - 1;
        }
    }
    return best;
}


int main() {
    int a[] = { 1,1,1,2,2,3,3,3,6,6,6,6,6,7,7 };
    int n = sizeof(a) / sizeof(a[0]);
    int x = 6;
    int firstindex = bound(a, n, x, false);
    int lastindex = bound(a, n, x, true);
    printf("%d - %d\n", firstindex, lastindex);
    if (firstindex == -1) {
        printf("element not found in the array:\n ");
    }
    else {
        int count = lastindex-firstindex + 1;
        printf("count is = %d\n", count);
    }
}
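
The bound function above can evaluate up to three comparisons per iteration. Since the question asks about saving every operation, here is a minimal sketch of a variant that uses one element comparison per iteration over a half-open range [lo, hi), in the style of std::lower_bound and std::upper_bound but without calling any std:: function. The names lowerBound, upperBound and countEqual are illustrative assumptions, and whether this is actually faster depends on the compiler and hardware, so it should be measured rather than assumed.

```cpp
// First index in [0, n) whose element is >= key, or n if there is none.
int lowerBound(const int* a, int n, int key) {
    int lo = 0, hi = n;                // half-open search range [lo, hi)
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;  // cannot overflow for 0 <= lo <= hi
        if (a[mid] < key) lo = mid + 1;
        else              hi = mid;
    }
    return lo;
}

// First index in [0, n) whose element is > key, or n if there is none.
int upperBound(const int* a, int n, int key) {
    int lo = 0, hi = n;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (a[mid] <= key) lo = mid + 1;
        else               hi = mid;
    }
    return lo;
}

// Number of elements equal to key; 0 when key is absent, so no
// separate "not found" sentinel is needed.
int countEqual(const int* a, int n, int key) {
    return upperBound(a, n, key) - lowerBound(a, n, key);
}
```

For the sample array and x = 6, lowerBound gives 8 and upperBound gives 13, so the count is 5, matching lastindex - firstindex + 1 above.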

edited May 30, 2022 at 22:08

answered May 30, 2022 at 6:23


8 Comments

Goswin von Brederlow Over a year ago

start + end risks UB. Use std::midpoint.

selbie Over a year ago

@GoswinvonBrederlow - OP says, "I cant use vectors or any other std:: functions". And by risking UB, you mean the possibility of start+end overflowing? That would have to be a VERY large array. Is there another edge case to consider?

Goswin von Brederlow Over a year ago

If using std::midpoint is truly forbidden, then one can easily copy the code for it. And yes, I mean start + end overflowing. A test case of 1 billion items isn't that big. And if you compile for e.g. Arduino, it will be anything above 16384.
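
For reference, the overflow discussed here can be avoided without std::midpoint by computing the midpoint from the difference of the indices, which is the expression the question's first version already uses. A minimal sketch, where midpointIndex is a hypothetical helper name and 0 <= low <= high is an assumed precondition:

```cpp
// Hypothetical helper: midpoint of two array indices.
// Assumes 0 <= low <= high, so (high - low) cannot overflow,
// whereas (low + high) can exceed the range of int.
int midpointIndex(int low, int high) {
    return low + (high - low) / 2;
}
```

In the loops above, this would stand in for (start + end) / 2, since the midpoint is only computed while start <= end holds.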

Olivia22 Over a year ago

@selbie Thanks for the feedback. However, finding the upper and lower bounds is still faster than iterating left and right, especially if the occurrences are in the thousands.

selbie Over a year ago

@Olivia22 - I can see how upper/lower bounds traversal, which is similar to searching for X-1 and X+1 and computing the difference between indices, could be faster for some range of inputs. However, that is largely dependent on the size of the array, the number of iterations to find the target value, and the actual count of repeating elements.
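
To make this trade-off concrete, the sketch below counts loop iterations for both strategies. It is a rough model only: the function names and the steps counter are illustrative assumptions, and iteration counts ignore cache effects and branch prediction, so they are no substitute for timing real data.

```cpp
// Strategy 1: binary search for any occurrence, then scan left and right.
// `steps` receives the number of loop iterations performed.
int countByScan(const int* a, int n, int key, int& steps) {
    steps = 0;
    int lo = 0, hi = n;                        // half-open range [lo, hi)
    while (lo < hi) {
        ++steps;
        int mid = lo + (hi - lo) / 2;
        if (a[mid] == key) {
            int left = mid, right = mid + 1;   // run of equal values: [left, right)
            while (left > 0 && a[left - 1] == key) { --left; ++steps; }
            while (right < n && a[right] == key)   { ++right; ++steps; }
            return right - left;
        }
        if (a[mid] < key) lo = mid + 1;
        else              hi = mid;
    }
    return 0;                                  // key not present
}

// Strategy 2: two binary searches for the lower and upper bounds.
int countByBounds(const int* a, int n, int key, int& steps) {
    steps = 0;
    int lo = 0, hi = n;
    while (lo < hi) {                          // first index with a[i] >= key
        ++steps;
        int mid = lo + (hi - lo) / 2;
        if (a[mid] < key) lo = mid + 1;
        else              hi = mid;
    }
    int first = lo;
    hi = n;                                    // second search starts at `first`
    while (lo < hi) {                          // first index with a[i] > key
        ++steps;
        int mid = lo + (hi - lo) / 2;
        if (a[mid] <= key) lo = mid + 1;
        else               hi = mid;
    }
    return lo - first;
}
```

With the sizes mentioned in the question's comments (millions of elements, about 2000 matches), the scan adds on the order of 2000 iterations, while the two searches need roughly 2 × log2(n), which is about 40 for a million elements. For very short runs the scan can come out ahead.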
