2

I am programming c on linux and I have a big integer array, how to filter it, say, find values that fit some condition, e.g. value > 1789 && value < 2031. what's the efficient way to do this, do I need to sort this array first?

I've read the answers and thank you all, but I need to do such filtering operation many times on this big array, not only for once. so is iterating it one by one every time the best way?

1
  • 1
    Are the conditions going to be always as simple as this? Commented Mar 7, 2012 at 9:49

6 Answers 6

2

If the only thing you want to do with the array is to get the values that match this criteria, it would be faster just to iterate over the array and check each value for the condition (O(n) vs. O(nlogn)). If however, you are going to perform multiple operations on this array, than it's better to sort it.

Sign up to request clarification or add additional context in comments.

Comments

1

Sort the array first. Then on each query do 2 binary searches. I'm assuming queries will be like -

Find integers x such that a < x < b

First binary search would find the index i of the element such that Array[i-1] <= a < Array[i] and second binary search would find the index j such that Array[j] < b <= Array[j+1]. Then your desired range would be [i, j].

This algorithm's complexity is O(NlogN) in preprocessing and O(N) per query if you want to iterate over all the elements and O(logN) per query if you just want to count the number of filtered element.

Let me know if you need help implementing binary search in C. There is library function named binary_search() in C and lower_bound() and upper_bound() in C++ STL.

1 Comment

You should note that it's O(N) per query if you want to iterate in the worst case. If you're always selecting only intervals with small amount of items, it could be O(log N).
1

You could use a max heap implemented as an array of the same size as the source array. Initialize it with min-1 value and insert values into the max-heap as the numbers come in. The first check would be to see if the number to be inserted is greater than the first element, if it's not, discard it, if it is larger then insert it into the array. To get the list of numbers back, read all numbers in the new array till min-1.

3 Comments

Why would you use O(n log n) algorithm when you can use much simpler one that runs in O(n)?
It takes O(log n), not O(n log n)!!
Inserting n items into any structure can't be ever faster than O(n). And in the case of inserting them into a heap one by one, it is really going to be O(n log n).
0

To filter the array, you'll have to look at each element once. There's no need to look at any element more than once, so a simple linear search of the array for items matching your criteria is going to be as efficient as you can get.

Sorting the array would end up looking at some elements more than once, which is not necessary for your purpose.

2 Comments

what if I need to look up this big array many times with different conditions, is it better to sort it first and then use some lookup algorithms?
Yes, if you need to do a lookup of this kind more than once or twice, sorting it would be a good idea.
0

If you can spare some more memory, then you can scan your array once, get the indices of matching values and store it in another array. This new array will be significantly shorter since it has only indices of values which match a specific pattern! Something like this

int original_array[SOME_SIZE];
int new_array[LESS_THAN_SOME__SIZE];

for ( int i=0,j=0; i<SOME_SIZE; i++)
{
    if ( original_array[i]> LOWER_LIMIT && original_array[i]< HIGHER_LIMIT )
    {
        new_array[j++] = i;
    }
}

You need to do the above once and form now on,

for ( int i=0; i< LESS_THAN_SOME_SIZE; i++ )
{
    if ( original_array[new_array[i]]> LOWER_LIMIT && original_array[new_array[i]]< HIGHER_LIMIT )
    {
        printf("Success! Found Value %d\n", original_array[new_array[i]] )
    }
}

So at the cost of some memory, you can save considerable amount of time. Even if you invest some time in sorting, you have to parse the sorted array every time. This method minimizes the array length as well as the sorting time ( at the cost of extra memory, of course :) )

Comments

-1

Try this library: http://code.google.com/p/boolinq/

It is iterator-based and as fast as can be, there are no any overhead. But it needs C++11 standard. Yor code will be written in declarative-way:

int arr[] = {1,2,3,4,5,6,7,8,9};

auto items = boolinq::from(arr).where([](int a){return a>3 && a<6;});
while (!items.empty())
{
    int item = items.front();
    ...
}

Faster than iterator-based scan can be only multithreaded scan...

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.