How to design an efficient algorithm for least upper bound search

Question

Let's say you have some set of numbers with a known lower bound and unknown upper bound, i.e. 0, 1, 2, 3, ... 78 where 78 is the unknown. Assume for the moment there are no gaps in between numbers. There is a time-expensive function test() that tests if a number is in the set.

What is an efficient way (requiring a low amount of test() calls) to find the highest number in the set?

What if you have the added knowledge that the upper bound is 75 +/- 25?

What if there are random gaps between numbers in the set, i.e. 0, 1, 3, 4, 7, ... 78?

We'll need more information to answer properly. Your examples all show sorted data. If this is the case, the answer becomes trivial. Also, is Test() the only operation available? A inexpensive GetAtIndex() or GetNext() would also make this much easier. — James Curran
– James Curran, Commented Feb 27, 2009 at 14:46
I think the problem people have had is "you have a set" - no you don't have a set - you want to discover the set. You have the lowest member of the set, and a function to test if a integer is in the set. — Douglas Leeder
– Douglas Leeder, Commented Feb 27, 2009 at 15:08

Jon Skeet · Accepted Answer · 2009-02-27 14:49:57Z

3

For the "no gaps case":

I assume that this is a fixed size of number, e.g. a 32 bit int
We wish to find x such that test(x) == true, test(x+1) == false, right?

You basically do a binary chop between the lowest known "not in set" (e.g. the biggest 32 bit int) and the highest known "in set" (starting with the known lower bound) by testing the middle value in the range each time and adjusting the boundaries accordingly. This would give an O(log N) solution (in terms of numbers of calls to test()) where X is the size of the potential set, not the actual set. This will be slower than just trying 1, 2, 3... for small sets, but much faster for large ones.

All of this falls down if there can be gaps, at which point I don't think there's any feasible solution beyond "start with the absolute highest possible number and work down until test(x) == true at which point that's the highest number". Any other strategy will fail or be more expensive as far as I can see.

edited Feb 27, 2009 at 14:49

answered Feb 27, 2009 at 14:27

Jon Skeet

1.5m893 gold badges9.3k silver badges9.3k bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

Yuval Adam Over a year ago

Jon, take into consideration that you must demand BOTH test(x)==true AND test(y)==false FOR EACH y>x

Yuval Adam Over a year ago

Sorry, I missed the disclaimer at the beginning :)

Kenan Banks Over a year ago

@Yuval, JON SKEET IS NEVER WRONG!

Yuval Adam · Accepted Answer · 2009-02-27 14:24:05Z

1

Your best bet is to simply run through the set with O(n) complexity, which is not bad.

Take into consideration that the set is not sorted (it is a set, after all, and this is the given), each isInSet(n) operation takes O(n) as well, bringing you to O(n^2) for the entire operation, if you choose any algorithm for prodding the set at certain places...

A much better solution, if the set is in your control, would be to simply keep a max value of the set and update it on each insertion to the set. This will be O(1) for all cases.

answered Feb 27, 2009 at 14:24

Yuval Adam

166k95 gold badges318 silver badges406 bronze badges

1 Comment

Yuval Adam Over a year ago

Then it is not a set anymore... and then you have immediate access to the max value.

kmkaplan · Accepted Answer · 2009-02-27 14:49:11Z

1

Set Step to 1
set Upper to Lower + Step
if test(Upper) is true then set Lower to Upper, multiply Step by 2 and go to point 2
at this point you know that Lower is in your set while Upper is not. You can now do a binary search between Lower and Upper to find the limit.

This looks like O(log n * O(test)) complexity.

If you know that Upper is between 50 and 100, Do a binary search between these two values.

If you have random gaps and you know that the upper bound is 100 maximum I suspect you can not do better than starting from there and testing every number one by one until test() finds a value in your set.

If you have random gaps and you do not know an upper limit then you can never be sure you found the upper bound.

edited Feb 27, 2009 at 14:49

answered Feb 27, 2009 at 14:33

kmkaplan

19k4 gold badges55 silver badges65 bronze badges

3 Comments

Yuval Adam Over a year ago

Try O(nlogn). Each search takes O(n).

James Curran Over a year ago

Yuval: O(2 *logN). The two searches kmkaplan describes are each O(log N), and are sequential, not nested.

Svante Over a year ago

This is actually the best answer here. You don't need to guess an upper bound.

Migol · Accepted Answer · 2009-02-27 14:20:32Z

0

Maybe you should just traverse through it? It would be O(n) complex. I think there is no other way to do this.

answered Feb 27, 2009 at 14:20

Migol

8,4879 gold badges49 silver badges69 bronze badges

Comments

Douglas Leeder · Accepted Answer · 2009-02-27 14:23:51Z

0

Do you know the set size, before hand?

Actually, I guess you probably don't - otherwise the first problem would be trivial.

It would help if you had some idea how big the set was though.

Take a guess at the top value
Test - if in then increment value by some amount
If not in then decrease value by some amount
Once you have upper and lower bounds for largest value, binary search till you find it (to required precision).

For the gaps you've no such ability - you can't even tell when you've found the largest element. (Unless you known the maximum gap size)

answered Feb 27, 2009 at 14:23

Douglas Leeder

53.5k9 gold badges100 silver badges138 bronze badges

2 Comments

Yuval Adam Over a year ago

How is this any better than a one-shot traverse on the set?

James Curran Over a year ago

Yuval: because traversing the set isn't offered as an option. A linear search would require traversing the entire (size unknown) domain.

James Curran · Accepted Answer · 2009-02-27 14:33:33Z

If there are no gaps, then you are probably best off with a binary search.

If we use the second assumption, that the top is 75 +/- 25, then are Low end is 50 and high end is 100, and our first test case is 75. If it is present, then the low end is 75 and the high end is 100, and our test case is 87. That should yield results in O( ln N) (where here N would be 50).

If we can't assume a possible upper range, we just have to made educated guess at what it might be. If a value is not found, it becomes the high end. If it is found, it's the low end, and we double it to find the high end.

If there are gaps, the only way I can see of doing it is a linear search -- but even then you'll need a way of knowing when you reached the end, rather that just a big gap.

John · Accepted Answer · 2009-02-27 14:41:05Z

0

If your set happens to be the set of prime numbers, let me know when you find the biggest one. I'm sure we can work something out. ;)

But seriously, I'm guessing you know for a fact that the set does indeed have a largest value. Or, you're chopping it to a 32-bit integer.

A couple of suggestions:

1) Think of every case you can that would speed a result of test(x) == false. Then you can go on to the next one. If the time you spend going through all of the ejection cases is far less than going through the full test, then you'll come out ahead. 2) Can you gain any information from each test? For example, does test(x) == false imply that test(x+5679) == false as well?

answered Feb 27, 2009 at 14:41

John

16k10 gold badges76 silver badges114 bronze badges

Collectives™ on Stack Overflow

How to design an efficient algorithm for least upper bound search

7 Answers 7

3 Comments

1 Comment

3 Comments

Comments

2 Comments

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

7 Answers 7

3 Comments

1 Comment

3 Comments

Comments

2 Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related