1

Is it possible to compute the number of different elements in an array in linear time and constant space? Let us say it's an array of long integers, and you can not allocate an array of length sizeof(long).

P.S. Not homework, just curious. I've got a book that sort of implies that it is possible.

4
  • Can you in-place modify the arrays? Commented Mar 23, 2010 at 17:18
  • In-place modification of the array won't change the linear time, because you can copy or duplicate an array in linear time. Commented Mar 23, 2010 at 17:20
  • @Kevin: But there's a "constant space" requirement. Commented Mar 23, 2010 at 17:45
  • @Kenny: Oops. Sorry about that, I missed that bit. You are correct. Commented Mar 23, 2010 at 18:13

7 Answers 7

3

This is the Element uniqueness problem, for which the lower bound is Ω( n log n ), for comparison-based models. The obvious hashing or bucket sorting solution all requires linear space too, so I'm not sure this is possible.

Sign up to request clarification or add additional context in comments.

5 Comments

I think the bucket solutions require constant space. Actually they require that the number of possible different values is constant. So a bucket (with a boolean) for each value is still constant space.
That's not a bucket, that's a hash. Yes, it is constant for some definition of constant, for example, you can allocate an array the size of the known universe every time. It does not further the conversation though. Of course, you can say to yourself, wait, there are only N possible unique values! So let's just use N buckets instead! Oh wait..
That's how it's done. I'm not saying this is a practical solution. It's a theoretical answer to a theoretical question.
I have a better solution for O(N) solution anyhow, "constant" space. Just precalculate the answers for all possible array size, elements. All you have to do now is look it up in the table!
Actually, that's a valid answer in the theory of computational complexity, given you're dealing with a finite number of different inputs. Btw, hashmaps are not constant in lookup time but linear or log, depending on the implementation.
1

You can't use constant space. You can use O(number of different elements) space; that's what a HashSet does.

4 Comments

The asker probably wants O(log # of distinct elements) space.
@algorithmist: But that takes O(n log m) time, where m is the number of distinct elements. That's not linear.
I didn't even specify an algorithm. How do you know how fast it runs? Usually these kinds of questions are best posed in the unit-cost model, which counts basic operations on (log n)-bit words.
I can make a good guess based upon how many bits of information about the answer you can possibly accumulate per step. n space allows for log n accumulation of information per step, but you don't allow that much, so you're limited to the n log n family of solutions.
1

You can use any sorting algorithm and count the number of different adjacent elements in the array.

Comments

0

I do not think this can be done in linear time. One algorithm to solve in O(n log n) requires first sorting the array (then the comparisons become trivial).

Comments

0

If you are guaranteed that the numbers in the array are bounded above and below, by say a and b, then you could allocate an array of size b - a, and use it to keep track of which numbers have been seen.

i.e., you would move through your input array take each number, and mark a true in your target array at that spot. You would increment a counter of distinct numbers only when you encounter a number whose position in your storage array is false.

Comments

0

Assuming we can partially destroy the input, here's an algorithm for n words of O(log n) bits.

Find the element of order sqrt(n) via linear-time selection. Partition the array using this element as a pivot (O(n)). Using brute force, count the number of different elements in the partition of length sqrt(n). (This is O(sqrt(n)^2) = O(n).) Now use an in-place radix sort on the rest, where each "digit" is log(sqrt(n)) = log(n)/2 bits and we use the first partition to store the digit counts.


If you consider streaming algorithms only ( http://en.wikipedia.org/wiki/Streaming_algorithm ), then it's impossible to get an exact answer with o(n) bits of storage via a communication complexity lower bound ( http://en.wikipedia.org/wiki/Communication_complexity ), but possible to approximate the answer using randomness and little space (Alon, Matias, and Szegedy).

Comments

-1

This can be done with a bucket approach when assuming that there are only a constant number of different values. Make a flag for each value (still constant space). Traverse the list and flag the occured values. If you happen to flag an already flagged value, you've found a duplicate. You have to traverse the buckets for each element in the list. But that's still linear time.

4 Comments

Why would you make the assumption that there are only a constant number of different values?
Assuming we are talking about arrays that contain longs (or bytes or shorts to be more realistic), you have a constant space complexity. There are only 256 different byte values, 65336 values for short, etc...
So, what's your array size for an array with elements of type "string", length K each? What about arrays of arbitrary objects? How about providing a feasible answer?
I was talking about objects with a fixed number of possible values. Strings are not covered by my answer. I return the question: How about an array of objects of infinite size?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.