4

I recently interviewed at Google. Because of this question my process didn't move forward after 2 rounds.

Suppose you are given an array of numbers. You can be given queries to:

  1. Find the sum of the values between indexes i and j.
  2. Update value at index i to a new given value.
  3. Find the maximum of the values between indexes i and j.
  4. Check whether the subarray between indexes i and j, both inclusive, is in ascending or descending order.

I gave him a solution but it was to check the subarray between the indexes i and j. He asked me to optimize it. I thought of using a hashtable so that if the starting index is same and the ending index is more than the previous found, we store the maximum and whether its in ascending or descending and check only the remaining subarray. But that also didn't optimize it as much as required. I'd love to know how I can optimize the solution so as to make it acceptable.

Constraints:

Everything from [1,10^5]

Thanks :)

3
  • What's the actual question? Queries 1 thru 4 are those you can perform on the array, right? So what problem do you want to solve? Commented Nov 1, 2014 at 10:05
  • I need to give the answer to these queries. As in, the program should output the answer for every query. Commented Nov 1, 2014 at 10:13
  • My guess is that the optimization you are looking for strongly depends on the distribution of query types and indexes in each type. Commented Nov 1, 2014 at 10:43

2 Answers 2

3

All this queries can be answered in O(log N) time per query in the worst case(with O(N) time for preprocessing). You can just build a segment tree and maintain the sum, the maximum and two boolean flags(they indicate whether the range which corresponds to this node is sorted in ascending/descending order or not) for each node. All this values can be recomputed efficiently for an update query because only O(log N) nodes can change(they lie on the path from the root to a leaf which corresponds to the changing element). All other range queries(sum, max, sorted or not) are decomposed into O(log N) nodes(due to the properties of a segment tree), and it is easy to combine the value of two nodes in O(1)(for example, for sum the result of combining 2 nodes is just the sum of values for these nodes).

Here is some pseudo code. It shows what data should be stored in a node and how to combine values of 2 nodes:

class Node {
    bool is_ascending
    bool is_descending
    int sum
    int max
    int leftPos
    int rightPos
}

Node merge(Node left, Node right) {
    res = Node()
    res.leftPos = left.leftPos
    res.rightPos = right.rightPos
    res.sum = left.sum + right.sum
    res.max = max(left.max, right.max)
    res.is_ascending = left.is_ascending and right.is_ascending 
        and array[left.rightPos] < array[right.leftPos]
    res.is_descending = left.is_descending and right.is_descending 
        and array[left.rightPos] > array[right.leftPos]
    return res
 }
Sign up to request clarification or add additional context in comments.

6 Comments

What does 'N' refer to here? For a segment tree, this is the number of segments. But he does not have segments in the first place. He has an array of n elements. If he wanted to cover all possible segments on this array, then N = n*n, giving complexities that are worse than a linear search. (No downvote, just asking - maybe I just misunderstood something here)
@Marco13 N is the size of an array. It is not exactly segment tree for segments. It is something like geeksforgeeks.org/segment-tree-set-1-sum-of-given-range
Then for an array with 10 elements, how many segments (leaf nodes) are there, and what are their leftPos and rightPos fields?
@Marco13 10 leaves with (0, 0), (1, 1) ... (9, 9) leftPos and rightPos fields.
I see (I posted the comment before you added the link), so this looks reasonable, +1 (but I'd still have a look at the Fenwick trees ;-))
|
-1

As andy pointed out in the comments: The queries are quite different in nature, so the "best" solution will probably depend on which query type is executed most frequently.

However, the task

  1. Find the sum of the values between indexes i and j.

can efficiently be solved by performing a scan/prefix sum computation of the array. Imagine an array of int values

index:       0   1   2   3   4   5   6
array:    [  3,  8, 10, -5,  2, 12,  7 ]

Then you compute the Prefix Sum:

index:       0   1   2   3   4   5   6,  7 
prefix:   [  0,  3, 11, 21, 16, 18, 30, 37 ]

Note that this can be computed particularly efficient in parallel. In fact, this is one important building block of many parallel algorithms, as described in the thesis "Vector Models for Data-Parallel Computing" by Guy E. Blelloch (thesis as PDF File).

Additionally, it can be computed in two ways: Either starting with the value from array[0], or starting with 0. This will, of course, affect how the resulting prefix array has to be accessed. Here, I started with 0, and made the resulting array one element longer than the input array. This may also be implemented differently, but in this case, it makes it easier to obey the array limits (although one would still have to clarify in which cases indices should be considered as inclusive or exclusive).

However, given this prefix sum array, one can compute the sum of elements between indices i and j in constant time, by simply subtracting the corresponding values of the prefix sum array:

sum(n=i..j)(array[n]) = (prefix[j+1] - prefix[i])

For example:

sum(n=2..5)(array[n])   = 10 + (-5) + 2 + 12 = 19
prefix[5+1] - prefix[2] = 30 - 11            = 19

For task 2,

  1. Update value at index i to a new given value.

this would mean that the prefix sums would have to be updated. This could be done brute-force, in linear time, by just adding the difference of the old value and the new value to all prefix sums that appear after the modified element (but for this, also see the notes in the last section of this answer)


The tasks 3 and 4

  1. Find the maximum of the values between indexes i and j.
  2. Check whether the subarray between indexes i and j, both inclusive, is in ascending or descending order.

I could imagine that the maximum value could simply be tracked while building the prefix sums, as well as checking whether the values are only ascending or descending. However, when values are updated, this information would have to be re-computed.


In any case, there are some data structures that deal with prefix sums in particular. I think that a Fenwick tree might allow to implement some of the O(n) operations mentioned above in O(logn), but I have not yet looked at this in detail.

4 Comments

Segment tree gives O(log N) time complexity per one query in the worst case without any additional assumptions about query frequency.
@user2040251 The segment tree does not focus on the sums that have to be computed (and I'm not sure how to cover this properly with a segment tree), and the updates that have to be incorporated in these sums. But in general, it will boil down to "some tree data structure", be it a segment tree or a Fenwick tree....
It seems to me that a data structure choice and its usage is actually the key point of this question.
@user2040251 Yes, and I'd have a look at the Fenwick tree there, because it explicitly covers the prefix sum computation and the updates. I don't think that the segment tree is appropriate here, because he does not have segments in the first place. I'll write a short comment about this to your answer....

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.