Using binary search to find the duplicate number in an array

Question

The problem:

Given an array of integers nums containing n + 1 integers where each integer is in the range [1, n] inclusive.

There is only one repeated number in nums, return this repeated number.

You must solve the problem without modifying the array nums and uses only constant extra space.

Here is one of the possible solution using binary search

class Solution(object):
    def findDuplicate(self, nums):
        beg, end = 1, len(nums)-1
        
        while beg + 1 <= end:
            mid, count = (beg + end)//2, 0
            for num in nums:
                if num <= mid: count += 1        
            if count <= mid:
                beg = mid + 1
            else:
                end = mid
        return end

Example 1:

Input: nums = [1,3,4,2,2]
Output: 2

Example 2:

Input: nums = [3,1,3,4,2]
Output: 3

Can someone please explain this solution for me? I understand the code but I don't understand the logic behind this. In particular, I do not understand how to construct the if statements (lines 7 - 13). Why and how do you know that when num <= mid then I need to do count += 1 and so on. Many thanks.

No, I got it, it doesn't require array to be sorted. On each iteration we count number of elements that are less than mid and that's why it has O(n log n) complexity. Actually this is not the most optimal solution, you can get O(n) just by xor-ing all the numbers (and rounding up n to the nearest 2^x). — fas
– fas, Commented Oct 24, 2022 at 6:40
for num in nums 100% defeats the purpose of binary search... — Joran Beasley
– Joran Beasley, Commented Oct 24, 2022 at 6:54

Grismar · Accepted Answer · 2022-10-24 07:14:42Z

The solution keeps halving the range of numbers the answer can still be in.

For example, if the function starts with nums == [1, 3, 4, 2, 2], then the duplicate number must be between 1 and 4 inclusive by definition.

By counting how many of the numbers are smaller than or equal to the middle of that range (2), you can decide if the duplicate must be in the upper or lower half of that range. Since the actual number is greater (3 numbers are lesser than or equal to 2, and 3 > 2), the number must be in the lower half.

Repeating the process, knowing that the number must be between 1 and 2 inclusive, only 1 number is less than or equal to the middle of that range (1), which means the number must be in the upper half, and is 2.

Consider a slightly longer series: [1, 2, 5, 6, 3, 4, 3, 7]. Between 1 and 7 lies 3, 4 numbers are less than or equal to 3, so the number must be between 1 and 3. Between 1 and 3 lies 2, 2 numbers are less than or equal to 2, so the number must be over 2, which leaves 3.

The solution will iterate over all n elements of nums a limited number of times, since it keeps halving the search space. It's certainly more efficient than the naive:

    def findDuplicate(self, nums):
        for i, n in enumerate(nums):
            for j, m in enumerate(nums):
                if i != j and n == m:
                    return n

But as user @fas suggests in the comments, this is better:

    def findDuplicate(nums):
        p = 1
        while p < len(nums):
            p <<= 1
        r = 0
        for n in nums:
            r ^= n
        for n in range(len(nums), p):
            r ^= n
        return r

fas · Accepted Answer · 2022-10-24 07:21:17Z

This is how given binary search works. Inside binary search you have implementation of isDuplicateLessOrEqualTo(x):

mid, count = (beg + end)//2, 0
for num in nums:
  if num <= mid: count += 1        
if count <= mid:
  return False  # In this case there are no duplicates less or equal than mid. 
                # Actually count == mid would be enough, count < mid is impossible.
else:
  return True   # In this case there is a duplicate less or equal than mid.

isDuplicateLessOrEqualTo(x) is a non-decreasing function (assume x has a duplicate, then for all i < x it's false and for all i >= x it's true), that's why you can run binary search over it.

Each iteration you run through the array, which gives you overall complexity O(n log n) (where n is size of array).

There's a faster solution. Note that xor(0..(2^n)-1) = 0 for n >= 2, because there are 2^(n-1) ones for each bit position and it's an even number, for example:

0_10 = 00_2
1_10 = 01_2
2_10 = 10_2
3_10 = 11_2
       ^
     2 ones here, 2 is even
        ^
      2 ones here, 2 is even

So by xor-ing all the numbers you'll receive exactly your duplicate number. Let's write it:

class Solution(object):
  def nearestPowerOfTwo(number):
    lowerBoundDegreeOfTwo = number.bit_length()
    lowerBoundDegreeOfTwo = max(lowerBoundDegreeOfTwo, 2)
    return 2 ** lowerBoundDegreeOfTwo

  def findDuplicate(self, nums):
    xorSum = 0
    for i in nums:
      xorSum = xorSum ^ i
    for i in range(len(nums), nearestPowerOfTwo(len(nums) - 1)):
      xorSum = xorSum ^ i
    return xorSum

As you can see that gives us O(n) complexity.

Gal Fleissig · Accepted Answer · 2022-10-27 19:29:18Z

0

If anyone is interested in a different approach (not binary search) to solve this problem:

Sum all elements of the array - we will call it sumArray - the time complexity is O(n).
Sum all numbers from 1 to n (inclusive) - we will call it sumGeneral - this is simply (n * (n+1) / 2) - the time complexity is O(1).
Return the result of sumArray - sumGeneral

In total, the time complexity is O(n) (you cannot do better since you have to look at all elements of the array, potentially the repeated one is at the end), and additional space complexity is O(1).

(If you could use O(n) additional space complexity you could use a hash table)

answered Oct 27, 2022 at 19:29

Gal Fleissig

331 silver badge6 bronze badges

1 Comment

fas Over a year ago

"Sum all numbers from 1 to n ... the time complexity is O(1)." Depends on how you count. You perform calculations on large numbers, e.g. one digit number plus one digit number definitely works faster, that 100k digit number plus 100k digit number.

Collectives™ on Stack Overflow

Using binary search to find the duplicate number in an array

3 Answers 3

Comments

Comments

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

Comments

Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related