2

Let's say you have some set of numbers with a known lower bound and unknown upper bound, i.e. 0, 1, 2, 3, ... 78 where 78 is the unknown. Assume for the moment there are no gaps in between numbers. There is a time-expensive function test() that tests if a number is in the set.

What is an efficient way (requiring a low amount of test() calls) to find the highest number in the set?

What if you have the added knowledge that the upper bound is 75 +/- 25?

What if there are random gaps between numbers in the set, i.e. 0, 1, 3, 4, 7, ... 78?

2
  • We'll need more information to answer properly. Your examples all show sorted data. If this is the case, the answer becomes trivial. Also, is Test() the only operation available? A inexpensive GetAtIndex() or GetNext() would also make this much easier. Commented Feb 27, 2009 at 14:46
  • I think the problem people have had is "you have a set" - no you don't have a set - you want to discover the set. You have the lowest member of the set, and a function to test if a integer is in the set. Commented Feb 27, 2009 at 15:08

7 Answers 7

3

For the "no gaps case":

  • I assume that this is a fixed size of number, e.g. a 32 bit int
  • We wish to find x such that test(x) == true, test(x+1) == false, right?

You basically do a binary chop between the lowest known "not in set" (e.g. the biggest 32 bit int) and the highest known "in set" (starting with the known lower bound) by testing the middle value in the range each time and adjusting the boundaries accordingly. This would give an O(log N) solution (in terms of numbers of calls to test()) where X is the size of the potential set, not the actual set. This will be slower than just trying 1, 2, 3... for small sets, but much faster for large ones.

All of this falls down if there can be gaps, at which point I don't think there's any feasible solution beyond "start with the absolute highest possible number and work down until test(x) == true at which point that's the highest number". Any other strategy will fail or be more expensive as far as I can see.

Sign up to request clarification or add additional context in comments.

3 Comments

Jon, take into consideration that you must demand BOTH test(x)==true AND test(y)==false FOR EACH y>x
Sorry, I missed the disclaimer at the beginning :)
@Yuval, JON SKEET IS NEVER WRONG!
1

Your best bet is to simply run through the set with O(n) complexity, which is not bad.

Take into consideration that the set is not sorted (it is a set, after all, and this is the given), each isInSet(n) operation takes O(n) as well, bringing you to O(n^2) for the entire operation, if you choose any algorithm for prodding the set at certain places...

A much better solution, if the set is in your control, would be to simply keep a max value of the set and update it on each insertion to the set. This will be O(1) for all cases.

1 Comment

Then it is not a set anymore... and then you have immediate access to the max value.
1
  1. Set Step to 1
  2. set Upper to Lower + Step
  3. if test(Upper) is true then set Lower to Upper, multiply Step by 2 and go to point 2
  4. at this point you know that Lower is in your set while Upper is not. You can now do a binary search between Lower and Upper to find the limit.

This looks like O(log n * O(test)) complexity.

If you know that Upper is between 50 and 100, Do a binary search between these two values.

If you have random gaps and you know that the upper bound is 100 maximum I suspect you can not do better than starting from there and testing every number one by one until test() finds a value in your set.

If you have random gaps and you do not know an upper limit then you can never be sure you found the upper bound.

3 Comments

Try O(nlogn). Each search takes O(n).
Yuval: O(2 *logN). The two searches kmkaplan describes are each O(log N), and are sequential, not nested.
This is actually the best answer here. You don't need to guess an upper bound.
0

Maybe you should just traverse through it? It would be O(n) complex. I think there is no other way to do this.

Comments

0

Do you know the set size, before hand?

Actually, I guess you probably don't - otherwise the first problem would be trivial.

It would help if you had some idea how big the set was though.

  1. Take a guess at the top value
  2. Test - if in then increment value by some amount
  3. If not in then decrease value by some amount
  4. Once you have upper and lower bounds for largest value, binary search till you find it (to required precision).

For the gaps you've no such ability - you can't even tell when you've found the largest element. (Unless you known the maximum gap size)

2 Comments

How is this any better than a one-shot traverse on the set?
Yuval: because traversing the set isn't offered as an option. A linear search would require traversing the entire (size unknown) domain.
0

If there are no gaps, then you are probably best off with a binary search.

If we use the second assumption, that the top is 75 +/- 25, then are Low end is 50 and high end is 100, and our first test case is 75. If it is present, then the low end is 75 and the high end is 100, and our test case is 87. That should yield results in O( ln N) (where here N would be 50).

If we can't assume a possible upper range, we just have to made educated guess at what it might be. If a value is not found, it becomes the high end. If it is found, it's the low end, and we double it to find the high end.

If there are gaps, the only way I can see of doing it is a linear search -- but even then you'll need a way of knowing when you reached the end, rather that just a big gap.

Comments

0

If your set happens to be the set of prime numbers, let me know when you find the biggest one. I'm sure we can work something out. ;)

But seriously, I'm guessing you know for a fact that the set does indeed have a largest value. Or, you're chopping it to a 32-bit integer.

A couple of suggestions:

1) Think of every case you can that would speed a result of test(x) == false. Then you can go on to the next one. If the time you spend going through all of the ejection cases is far less than going through the full test, then you'll come out ahead. 2) Can you gain any information from each test? For example, does test(x) == false imply that test(x+5679) == false as well?

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.