5

Suppose you have an array A of n items, and you want to find the k items in A closest to the median of A. For example, if A contains the 9 values {7, 14, 10, 12, 2, 11, 29, 3, 4} and k = 5, then the answer would be the values {7, 14, 10, 12, 11}, since the median is 10 and these are the five values in A closest to the value 10. Give an algorithm to solve this problem in O(n) time.

I know that a selection algorithm (deep selection) is the appropriate algorithm for this problem, but I think that would run in O(n*logn) time instead of O(n). Any help would be greatly appreciated :)

3
  • IMO you will have to sort the list, and that will always be bigger than O(n). Commented Nov 4, 2010 at 9:13
  • Your problem is equivalent to being able to find an arbitrary percentile in O(n) time. Finding just the median in O(n) time (i.e. solving your problem for k = 1) is possible but non-trivial. The algorithm could probably be extended to find percentiles. Why do you need this? Is it homework? Commented Nov 4, 2010 at 9:13
  • 1
    Dup: stackoverflow.com/questions/1557678/… ? Commented Nov 4, 2010 at 10:38

3 Answers 3

5

You will first need to find the median, which can be done in O(n) (for example using Hoare's Quickselect algorithm).

Then you will need to implement a sorting algorithm which sorts the elements in the array according to their absolute distance to the median (smallest distances first).

If you were to sort the entire array this way, this would typically take somewhere from O(n * log n) to O(n^2), depending on the algorithm being used. However since you only need the first k values, the complexity can be reduced to O(k * log n) to O(k * n).

Since k is a constant and does not depend on the size of the array, the overall complexity in a worst case scenario will be: O(n) (for finding the median) + O(k * n) (sorting), which is O(n) overall.

Sign up to request clarification or add additional context in comments.

3 Comments

how can you find median in O(n)?
i tried to upvote 4-5 times but SO is preventing me from upvoting .
0

I think you can do this using a variant on quicksort.

You start with a set S of n items and are looking for the "middle" k items. You can think of this as partitioning S into three parts of sizes n - k/2 (the "lower" items), k (the "middle" items), and n - k/2 (the "upper" items).

This gives us a strategy: first remove the lower n - k/2 items from S, leaving S'. Then remove the upper n - k/2 items from S', leaving S'', which is the middle k items of S.

You can easily partition a set this way using "half a quicksort": choose a pivot, partition the set into L and U (lower and upper elements w.r.t. the pivot), then you know the items to discard in the partition must be either all of L and some of U or vice versa: recurse accordingly.

[Thinking further, this may not be exactly what you want if you define "closest to the median" in some other way, but it's a start.]

Comments

0

Assumption: we care about the k values in A that are closest to the median. If we had an A={1,2,2,2,2,2,2,2,2,2,2,2,3}, and k=3, the answer is {2,2,2}. Similarly, if we have A={0,1,2,3,3,4,5,6}, and k=3, answers {2,3,3} and {3,3,4} are equally valid. Furthermore, we are not interested in the indices from which these values came, though I imagine some small tweaks to the algorithm would work.

  1. As Grodrigues states, first find the median in O(n) time. While we're at it, keep track of the largest and smallest number
  2. Next, create an array K, k items long. This array will contain the distance an item is from the median. (note that
  3. Copy the first k items from A into K.
  4. For each item A[i], compare the distance of A[i] from the median to each item of K. If A[i] is closer to the median than the farthest item from the median in K, replace that item. As an optimization, we could also track K's closest and farthest items from the median, so we have a faster comparison to K, or we could keep K sorted, but neither optimization is necessary to operate in O(n) time.

Pseudocode, C++ ish:

/* n = length of array
 * array = A, given in the problem
 * result is a pre-allocated array where the result will be placed 
 * k is the length of result
 *
 * returns
 *   0  for success
 *   -1 for invalid input
 *   1  for other errors
 *
 * Implementation note: optimizations are skipped.
 */
#define SUCCESS        0
#define INVALID_INPUT -1
#define ERROR          1
void find_k_closest(int n, int[] array, int k, int[] result)
{
  // if we're looking for more results than possible,
  // it's impossible to give a valid result.
  if( k > n ) return INVALID_INPUT;


  // populate result with the first k elements of array.
  for( int i=0; i<k; i++ )
  {
    result[i] = array[i];
  }

  // if we're looking for n items of an n length array,
  // we don't need to do any comparisons
  // Up to this point, function is O(k).  Worst case, k==n,
  // and we're O(n)
  if( k==n ) return 0;

  // Assume an O(n) median function
  // Note that we don't bother finding the median if there's an
  // error or if the output is the input.
  int median = median(array);

  // Convert the result array to be distance, not 
  // actual numbers
  for( int i=0; i<k; i++)
  {
    result[i] = result[i]-median;
         // if array[i]=1, median=3, array[i] will be set to 2.
         //             4         3                         -1.
  }

  // Up to this point, function is O(2k+n) = O(n)


  // find the closest items.
  // Outer loop is O(n * order_inner_loop)
  // Inner loop is O(k)
  // Thus outer loop is O(2k*n) = O(n)
  // Note that we start at k, since the first k elements
  // of array are already in result.
  OUTER: for(int i=k; i<n; i++)
  {
    int distance = array[i]-median;
    int abs_distance = abs(distance);

    // find the result farthest from the median
    int idx = 0;
#define FURTHER(a,b) ((abs(a)>abs(b)) ? 1 : 0;
    INNER: for( int i=1; i<k; i++ )
    {
        idx = (FURTHER(result[i],result[i-1])) ? i:i-1;
    }

    // If array[i] is closer to the median than the farthest element of
    // result, replace the farthest element of result with array[i]
    if( abs_distance < result[idx] ){ result[idx] = distance; }
    }
  }
  // Up to this point, function is O(2n)

  // convert result from distance to values
  for( int i=0; i<k; i++)
  {
     result[i] = median - result[i];
         // if array[i]=2 , median=3, array[i] will be set to 1.
         //             -1         3                          4.
  }
}

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.