4

Another interview question asked me to find, in the best possible running time, the longest subarray of repeated values in a sorted array.

Let the input array be A[1 ... n].
Find the longest contiguous subarray B of A such that:
for x in range(len(B) - 1):
    B[x] == B[x + 1]

I believe the best algorithm is to divide the array in half, then work outwards from the middle, comparing integers with one another to find the longest run of equal integers crossing the middle. Then I would recurse, dividing the array in half and calling the method on each half.
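In Python-ish pseudocode, what I have in mind is roughly this (my reconstruction; `longest_run` and the exact index bookkeeping are just illustrative):

```python
def longest_run(a, lo=0, hi=None):
    # Longest run of equal values in sorted a[lo:hi] (0-based here).
    if hi is None:
        hi = len(a)
    if hi - lo <= 1:
        return a[lo:hi]
    mid = (lo + hi) // 2
    # Expand outwards from the middle while the values match.
    left, right = mid, mid + 1
    while left > lo and a[left - 1] == a[mid]:
        left -= 1
    while right < hi and a[right] == a[mid]:
        right += 1
    crossing = a[left:right]
    # Recurse on the two halves and keep the longest of the three.
    return max(longest_run(a, lo, mid), longest_run(a, mid, hi),
               crossing, key=len)
```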

My interviewer said my algorithm was good, but that my claim that it runs in O(log n) was incorrect; he never got around to telling me the correct answer. So my first question is: what is the Big-O analysis of this algorithm? (Please show as much work as possible — Big-O is not my forte.) And my second question, purely out of curiosity: is there an even more time-efficient algorithm?

  • I'm quite confused by your question. Could you describe in more detail what the interviewer meant? (What's “strain”?) And could you also describe your solution in more detail? (Possibly using pseudocode.) Commented Sep 15, 2012 at 13:54
  • Updated with more detail. I used divide and conquer, basically. Commented Sep 15, 2012 at 14:00
  • Please revise your title so it will be more useful to future users of this site. Commented Sep 15, 2012 at 14:00
  • Yes exactly, thank you. My memory of the exact wording is fuzzy. Commented Sep 15, 2012 at 14:01

4 Answers

4

The best you can do for this problem is an O(n) solution, so your algorithm cannot possibly be both correct and O(lg n).

Consider, for example, the case where the array contains no repeated elements. To determine that, one needs to examine every element, and examining every element is O(n).

Here is a simple algorithm that finds the longest run of a repeated element:

start = end = 0
maxLength = 0
i = 0
while i < len(a):
    # expand the run of equal values containing index i
    lo = i
    while lo > 0 and a[lo - 1] == a[i]:
        lo -= 1
    hi = i + 1
    while hi < len(a) and a[hi] == a[i]:
        hi += 1
    if hi - lo > maxLength:
        start, end = lo, hi
        maxLength = hi - lo
    # any longer run must span maxLength + 1 consecutive indices,
    # so it must contain the next probe position
    i = max(hi, i + maxLength + 1)

return a[start:end]

If you have reason to believe the run will be long, you can seed the initial value of maxLength with a heuristically selected value to speed things along, and then only look for shorter runs if you don't find one (i.e. you end up with end == 0 after the first pass).
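For what it's worth, here is one way to package this as a runnable function, with a `seed` parameter playing the role of that heuristic initial maxLength (`longest_run` and `seed` are names I've made up, and the details may differ slightly from the snippet above):

```python
def longest_run(a, seed=0):
    # Longest run of equal values in the sorted list `a`.
    # `seed` is an optional lower bound on the run length we care about;
    # if no run longer than `seed` exists, the result is an empty slice.
    start = end = 0
    max_length = seed
    i = 0
    while i < len(a):
        # expand the run of equal values containing index i
        lo = i
        while lo > 0 and a[lo - 1] == a[i]:
            lo -= 1
        hi = i + 1
        while hi < len(a) and a[hi] == a[i]:
            hi += 1
        if hi - lo > max_length:
            start, end = lo, hi
            max_length = hi - lo
        # any longer run must span max_length + 1 consecutive indices,
        # so it must contain the next probe position
        i = max(hi, i + max_length + 1)
    return a[start:end]
```

Calling `longest_run([1, 1, 2, 2, 2, 3])` yields `[2, 2, 2]`; with a seed that turns out to be too optimistic, e.g. `longest_run([1, 2, 3], seed=1)`, you get an empty result, which is the "end == 0" signal to retry with a smaller seed.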


4 Comments

There should be a tighter bound than O(n). The OP's algo sounds much more efficient than scanning the array from the first to last element
We are talking about worst-case complexity here. In the worst case (i.e. every element is unique) you must examine every element = O(n).
You forgot to say what is the time complexity of your solution.
@svick It's O(n). In the worst case maxLength stays small and the scan advances only a constant number of positions per step, so it degenerates into a linear scan.
0

I think we all agree that in the worst case scenario, where all of A is unique or where all of A is the same, you have to examine every element in the array to either determine there are no duplicates or determine all the array contains one number. Like the other posters have said, that's going to be O(N). I'm not sure divide & conquer helps you much with algorithmic complexity on this one, though you may be able to simplify the code a bit by using recursion. Divide & conquer really helps cut down on Big O when you can throw away large portions of the input (e.g. Binary Search), but in the case where you potentially have to examine all the input, it's not going to be much different.

I'm assuming the result here is you're just returning the size of the largest B you've found, though you could easily modify this to return B instead.

So on the algorithm front, given that A is sorted, I'm not sure there's going to be any faster/simpler answer than just walking through the array in order. It seems like the simplest answer is to have 2 pointers, one starting at index 0 and one starting at index 1. Compare them and then increment them both; each time they're the same you tick a counter upward to give you the current size of B, and when they differ you reset that counter. You also keep a variable for the max size of a B you've found so far and update it every time you find a bigger B.
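That walk might look like this in Python (a sketch that just returns the size; I reset the counter to 1 rather than 0 so it counts the element itself):

```python
def longest_run_size(a):
    # Size of the longest run of equal values in the sorted list `a`.
    if not a:
        return 0
    best = cur = 1
    for i in range(1, len(a)):
        if a[i] == a[i - 1]:
            cur += 1          # still inside the current run
            best = max(best, cur)
        else:
            cur = 1           # a new run starts at a[i]
    return best
```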

1 Comment

Agree that the worst case, "all unique", costs O(N). But "all same" could be decided immediately from A[1]==A[n]; that's O(1), and I'd call this the best case.
0

In this algorithm, n elements are visited with a constant number of calculations per visited element, so the running time is O(n).

Given sorted array A[1..n]:

max_start = max_end = 1
max_length = 1
start = 1
while start <= n
    end = start + 1
    while end <= n && A[end] == A[start]
        end++
    if end - start > max_length
        max_start = start
        max_end = end - 1
        max_length = end - start
    start = end
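Roughly translated to runnable Python (0-based, returning a half-open (start, end) pair; the function name is mine), for anyone who wants to try it:

```python
def longest_run_bounds(a):
    # (start, end) of the longest run in the sorted list `a`,
    # with `end` exclusive; (0, 0) for an empty list.
    n = len(a)
    max_start = max_end = 0
    max_length = 0
    start = 0
    while start < n:
        end = start + 1
        while end < n and a[end] == a[start]:
            end += 1
        if end - start > max_length:
            max_start, max_end = start, end
            max_length = end - start
        start = end
    return max_start, max_end
```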

2 Comments

You are better off starting from end = start + max_length rather than end = start + 1. It's still O(n) but it is faster most of the time.
You are right. The point in this particular algorithm is simplicity so that it will be easier to see how for every element in the array, there are constant extra operations.
-1

If the longest run of consecutive equal integers has length 1, you'll be scanning through the entire array A of n items. Thus, the complexity is not in terms of n alone, but in terms of len(B).

Not sure if the complexity is O(n/len(B)).

Checking the edge cases:

- When n == len(B), you get an instant result (only checking A[0] and A[n-1])
- When len(B) == 1, you get O(n), checking all elements
- In the normal case, I'm too lazy to write the algo to analyze...

Edit

Given that len(B) is not known in advance, we must take the worst case, i.e. O(n)

4 Comments

This answer is not correct, the way complexity class is calculated is with respect to the number of input elements, and refers to the worst case running time of an algorithm by default, i.e. unless otherwise stated. If it wasn't, I could say that the running time of my (brute force) algorithm that cracks your standards compliant AES encryption is O(1) because it could get very very lucky and test the right key first...
Downvoted for attempting to define a tighter bound? :( The OP's algo is definitely better than a linear search and definitely has a dependency on len(B)
@lol if given that the AES encryption is flawed and tends to reuse keys in a consecutive manner, and you wrote a cracker that exploits that pattern, is your algorithm still O(n)? Or would it depend on how many consecutive reused keys there are?
"if given that..." is assuming a premise, when you write that you have to be careful not to materially change the question. Mathematicians prefix these sorts of statements with "without loss of generality" to clarify that they know what they're doing when they assume premises. What I was saying is that AES cracking is never O(1) or O(n) for that matter... What would n even be...? (please don't answer that)
