Assumption: we care about the k values in A that are closest to the median. If A={1,2,2,2,2,2,2,2,2,2,2,2,3} and k=3, the answer is {2,2,2}. Similarly, if A={0,1,2,3,3,4,5,6} and k=3, the answers {2,3,3} and {3,3,4} are equally valid. Furthermore, we are not interested in the indices these values came from, though I imagine some small tweaks to the algorithm would recover them.
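To pin this definition down, here is a brute-force O(n log n) reference that a faster version can be tested against. It is a sketch under my own assumptions: the name `k_closest_reference` is hypothetical, and it works with twice the median so that an even-length array, whose median may fall between two values, needs no fractions. Ties go to the smaller value, so for the second example it returns {2,3,3}.

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// Hypothetical brute-force reference, O(n log n): returns the k values of `a`
// closest to its median, in ascending order. Ties go to the smaller value.
std::vector<int> k_closest_reference(std::vector<int> a, std::size_t k)
{
    if (a.empty() || k == 0) return {};
    k = std::min(k, a.size());

    std::sort(a.begin(), a.end());
    // Twice the median: the sum of the two middle elements (the same
    // element twice when the length is odd), so no fractions are needed.
    const long long twice_median =
        static_cast<long long>(a[(a.size() - 1) / 2]) + a[a.size() / 2];
    auto dist2 = [twice_median](int x) { return std::llabs(2LL * x - twice_median); };

    // stable_sort keeps ascending order among equal distances.
    std::stable_sort(a.begin(), a.end(),
                     [&](int x, int y) { return dist2(x) < dist2(y); });
    a.resize(k);
    std::sort(a.begin(), a.end());
    return a;
}
```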
- As Grodrigues states, first find the median in O(n) time. While we're at it, keep track of the largest and smallest numbers.
- Next, create an array K, k items long. This array will hold each item's distance from the median. (note that
- Copy the first k items from A into K, converted to their distances from the median.
- For each remaining item A[i] (i >= k), compare the distance of A[i] from the median with each item of K. If A[i] is closer to the median than the farthest item in K, replace that item. As an optimization, we could also track K's closest and farthest items, so comparing against K is faster, or we could keep K sorted. Neither optimization is needed for correctness; without them the scan costs O(nk), which is O(n) when k is treated as a constant.
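The "track the farthest item" optimization can be realized with a max-heap keyed on distance from the median: the farthest candidate is always on top, and each replacement costs O(log k), for O(n log k) overall. This is a sketch assuming the median has already been computed and is an integer; `k_closest_heap` is a hypothetical name, and the heap is a swapped-in technique rather than the plain array K described above.

```cpp
#include <algorithm>
#include <cstdlib>
#include <queue>
#include <vector>

// Hypothetical helper: keeps the k candidates in a max-heap ordered by
// distance from `median`, so the farthest candidate is always on top.
// Each element costs O(log k), so the scan is O(n log k).
// Assumes `median` was already computed (e.g. with an O(n) selection).
std::vector<int> k_closest_heap(const std::vector<int>& a, int median, std::size_t k)
{
    // closer(x, y) is true when x is nearer the median than y; with
    // std::priority_queue this puts the farthest element at top().
    auto closer = [median](int x, int y) {
        return std::abs(x - median) < std::abs(y - median);
    };
    std::priority_queue<int, std::vector<int>, decltype(closer)> heap(closer);

    for (int x : a) {
        if (heap.size() < k) {
            heap.push(x);  // fill K with the first k items
        } else if (k > 0 && std::abs(x - median) < std::abs(heap.top() - median)) {
            heap.pop();    // evict the farthest candidate
            heap.push(x);
        }
    }

    std::vector<int> out;
    while (!heap.empty()) {
        out.push_back(heap.top());
        heap.pop();
    }
    std::sort(out.begin(), out.end());  // ascending, for readability
    return out;
}
```

Replacement uses a strict `<`, so on ties the earlier candidate stays in K.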
Pseudocode, C++ ish:
#include <algorithm>  // std::nth_element
#include <cstdlib>    // std::abs
#include <vector>

#define SUCCESS 0
#define INVALID_INPUT -1
#define ERROR 1

// Stand-in for "an O(n) median function": std::nth_element on a copy
// runs in linear time on average. For even n this returns the lower
// median, which is an assumption of this sketch.
int median_of(int n, const int array[])
{
    std::vector<int> copy(array, array + n);
    std::vector<int>::iterator mid = copy.begin() + (n - 1) / 2;
    std::nth_element(copy.begin(), mid, copy.end());
    return *mid;
}

/* n = length of array
 * array = A, given in the problem
 * result is a pre-allocated array where the result will be placed
 * k is the length of result
 *
 * returns
 *   0 for success
 *  -1 for invalid input
 *   1 for other errors
 *
 * Implementation note: optimizations are skipped.
 */
int find_k_closest(int n, const int array[], int k, int result[])
{
    // If we're looking for more results than possible (or a size is
    // negative), it's impossible to give a valid result.
    if( n < 0 || k < 0 || k > n ) return INVALID_INPUT;
    if( k == 0 ) return SUCCESS;  // nothing to find

    // Populate result with the first k elements of array.
    for( int i=0; i<k; i++ )
    {
        result[i] = array[i];
    }

    // If we're looking for n items of an n-length array,
    // we don't need to do any comparisons.
    // Up to this point the function is O(k). Worst case, k==n,
    // and we're O(n).
    if( k==n ) return SUCCESS;

    // Note that we don't bother finding the median if there's an
    // error or if the output is the input.
    int median = median_of(n, array);

    // Convert the result array to signed distances from the median,
    // not actual numbers.
    for( int i=0; i<k; i++ )
    {
        result[i] = result[i] - median;
        // e.g. if result[i]=1 and median=3, result[i] becomes -2.
    }

    // Up to this point the function is O(2k+n) = O(n).
    // Find the closest items. We start at k, since the first k
    // elements of array are already in result.
    // The outer loop runs n-k times and the inner loop is O(k), so this
    // part is O(n*k), which is O(n) only when k is treated as a constant.
    for( int i=k; i<n; i++ )
    {
        int distance = array[i] - median;

        // Find the entry of result farthest from the median.
        int idx = 0;
        for( int j=1; j<k; j++ )
        {
            if( std::abs(result[j]) > std::abs(result[idx]) ) idx = j;
        }

        // If array[i] is closer to the median than the farthest element
        // of result, replace that element with array[i]'s distance.
        if( std::abs(distance) < std::abs(result[idx]) ) result[idx] = distance;
    }

    // Convert result from distances back to values.
    for( int i=0; i<k; i++ )
    {
        result[i] = median + result[i];
        // e.g. if result[i]=-2 and median=3, result[i] becomes 1.
    }
    return SUCCESS;
}
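If k is not small, the O(nk) scan dominates. One alternative, sketched here under my own assumptions, is a second selection pass: after finding the median, run `std::nth_element` again with a comparator on distance from the median, which moves the k closest values to the front. `std::nth_element` is linear on average (not guaranteed worst-case linear), so the whole thing is O(n) on average for any k. `k_closest_select` is a hypothetical name; it works with twice the median so even-length arrays need no fractions.

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// Hypothetical sketch: two selection passes, each average O(n) via
// std::nth_element, independent of k. Returns the k values of `a`
// closest to its median, in unspecified order.
std::vector<int> k_closest_select(std::vector<int> a, std::size_t k)
{
    if (a.empty() || k == 0) return {};
    k = std::min(k, a.size());

    // Pass 1: lower median by selection. For even n the upper median is
    // the smallest element after it. Keep twice the median to stay integral.
    auto lo = a.begin() + (a.size() - 1) / 2;
    std::nth_element(a.begin(), lo, a.end());
    long long twice_median = 2LL * *lo;
    if (a.size() % 2 == 0)
        twice_median = static_cast<long long>(*lo) + *std::min_element(lo + 1, a.end());

    // Pass 2: select by distance, so the k closest values end up first.
    auto dist2 = [twice_median](int x) { return std::llabs(2LL * x - twice_median); };
    std::nth_element(a.begin(), a.begin() + (k - 1), a.end(),
                     [&](int x, int y) { return dist2(x) < dist2(y); });
    a.resize(k);
    return a;
}
```

The values come back in no particular order, and ties (such as 2 versus 4 in the second example) may go either way; sort the result if order matters.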