Perform a mergesort and create a binary tree where each layer is a stage of the mergesort as a collection of sorted arrays.
eg, suppose our range we want to make queries over is:
[3, 2, 5, 5, 9, 6, 10, 1]
Then our tree would look like:
[1, 2, 3, 5, 5, 6, 9, 10]
/ \
[2, 3, 5, 5] [1, 6, 9, 10]
/ \ / \
[2, 3] [5, 5] [6, 9] [1, 10]
/ \ / \ / \ / \
[3] [2] [5] [5] [6] [9][10] [1]
This requires O(NlogN) space to store and takes O(NlogN) time to generate (just mergesort the elements and store the result of each stage of the sort).
Additionally, we want to store the endpoints of the segments represented by each internal node (this can also be computed on the fly by looking at the length of each array, but it's easier to explain if we just imagine that we store these values in the tree together with the arrays).
(1,8)
/ \
(1,4) (5,8)
/ \ / \
(1,2) (3,4) (5,6) (7,8)
/ \ / \ / \ / \
(1,1) (2,2) (3,3) (4,4) (5,5) (6,6) (7,7) (8,8)
Now we can ask for any two ranges:
- Does s cover t (x_s <= x_t and y_s >= y_t) b)
- Does s intersect t (x_s <= y_t and y_s >= x_t)
Now, suppose we're given the bounds b=(3,7) and we want to find a minimal set of nodes which cover this range.
In other words, we want to obtain: (3,4) (5,6) (7,7)
Note this list of bounds will always have O(logN) length since for any number, P, there can be at most two items which have length P and all the lengths are powers of 2.
To find this set in O(logN) time, we start from the root node and traverse the tree at each node asking:
- Does b overlap this range? If yes, include the range in the list and
stop.
- Does b intersect this range? If yes, recurse to each of its
children.
- Otherwise, stop and don't add anything to the list.
Once we've obtained the arrays corresponding to these ranges (in the 3,7 example, the corresponding arrays are [5,5],[6,9],[10]) we can now answer the following question in O(log^2N) time by binary searching each one (there are O(logN) of these sorted arrays and they each have length O(N) in the worst case) and then summing up the indices (quick note; this is the only case I've ever seen where it makes sense to add indices into different arrays together to get a sum):
Given some x, how many elements among all the arrays are less
than x?
We binary search each array for x, and the resulting index is the number of elements less than x.
Now, to find the median of some range, (s, t), we are asking:
What is the element prior to the smallest element x with the property
that (t-s+1)/2 elements in between indices s and t are less than x?
To answer this, we can binary search the original sorted list. Since each lookup in our binary search requires O(log^2N) time, the total time to answer the query is O(log^3N)
Here's the pseudocode:
fn create_tree(elems, lb, ub):
if len(elems) == 1:
return leaf((lb, ub), elems)
mid = midpt(lb, ub)
(left, right) = splitInHalf(elems)
tleft = create_tree(left, lb, midpt-1)
tright = create_tree(right, midpt, ub)
return internal(merge(tleft.elems, tright.elems), lb, ub, tleft, tright)
fn get_cover_arrays(tree,bnds):
if covers(bnds,tree.bnds):
return [tree.elems]
else if intersects(bnds,tree.bnds):
return get_cover_arrays(tree.left,bnds) `concat` get_cover_arrays(tree.right,bnds)
else:
return []
fn numElemsLT(cover_arrays, x):
return sum(idx=binary_search(arr,x) for arr in cover_arrays)
fn getKthInRange(tree, bnds, k):
cover_arrays = get_cover_arrays(tree, bnds)
all_elems = tree.elems
next_idx = binary_search(lambda i: numElemsLT(cover_arrays, all_elems[i]), k)
return all_elems[next_idx - 1]
fn getMedian(tree, (lb,ub)):
return getKthInRange(tree, (lb,ub), (ub-lb+1) / 2)