3

I have a need for a data structure that will be able to give preceding and following neighbors for a given int that is part of the structure.

Some criteria I've set for myself:

  • write once, read many times
  • contain 100 to 1000 int
  • be efficient: order of magnitude O(1)
  • be memory efficient (size of the ints + some housekeeping bits ideally)
  • implemented in pure Java (no libraries for this, as I want to learn)
  • items are unique
  • no concurrency requirements
  • ints are ordered externally, that order will most likely not be a natural ordering, and that order must be preserved (ie. there is no contract whatsoever regarding the difference in value between two neighboring ints - any int may be greater or smaller than the int it preceeds in the order).

This is in Java, and is mostly theoretical, as I've started using the solution described below.

Things I've considered:

  • LinkedHashSet: very quick to find an item, order of O(1), and very quick to retrieve next neighbor. No apparent way to get previous neighbor without reverse sorting the set. Boxed Integer objects only.
  • int[]: very easy on memory because no boxing required, very quick to get previous and next neighbor, retrieval of an item is O(n) though because index is not known and array traversal is required, and that is not acceptable.

What I'm using now is a combination of int[] and HashMap:

  • HashMap for retrieving index of a specific int in the int[]
  • int[] for retrieving the neighbors of that int

What I like:

  • neighbor lookup is ideally O(2)
  • int[] does not do boxing
  • performance is theoretically very good

What I dislike:

  • HashMap does boxing twice (key and value)
  • the ints are stored twice (in both the map and the array)
  • theoretical memory use could be improved quite a bit

I'd be curious to hear of better solutions.

12
  • What I'm using now is a combination of int[] and HashMap: can you show what have you tried alread? Commented May 20, 2015 at 12:29
  • 2
    How is retrieval of an int[] item O(n)? Searching for an item is, but retrieving is a constant-time operation. Commented May 20, 2015 at 12:29
  • 2
    Retrieval is O(n) for int[]? Don't think so Commented May 20, 2015 at 12:30
  • Is there a range for the int values which you need to store? Can they by anything between MIN_INT and MAX_INT or less? Commented May 20, 2015 at 12:33
  • Edited for clarification in question: when I talk about int[] and O(n), I do mean search: I have the item only, and not its index in the array, so I need a search first, which requires at most O(n) time. I believe this was clear from the rest of my question though. Commented May 20, 2015 at 12:37

2 Answers 2

1

One solution is to sort the array when you add elements. That way, the previous element is always i-1 and to locate a value, you can use a binary search which is O(log(N)).

The next obvious candidate is a balanced binary tree. For this structure, insert is somewhat expensive but lookup is again O(log(N)).

If the values aren't 32bit, then you can make the lookup faster by having a second array where each value is the index in the first and the index is the value you're looking for.

More options: You could look at bit sets but that again depends on the range which the values can have.

Commons Lang has a hash map which uses primitive int as keys: http://grepcode.com/file/repo1.maven.org/maven2/commons-lang/commons-lang/2.6/org/apache/commons/lang/IntHashMap.java but the type is internal, so you'd have to copy the code to use it.

That means you don't need to autobox anything (unboxing is cheap).

Related:

Sign up to request clarification or add additional context in comments.

Comments

0

ints are ordered externally, that order will most likely not be a natural ordering, and that order must be preserved (ie. there is no contract whatsoever regarding the difference in value between two neighboring ints).

This says "Tree" to me. Like Aaron said, expensive insert but efficient lookup, which is what you want if you have write once, read many.

EDIT: Thinking about this a bit more, if a value can only ever have one child and one parent, and given all your other requirements, I think ArrayList will work just fine. It's simple and very fast, even though it's O(n). But if the data set grows, you'll probably be better off using a Map-List combo.

Keep in mind when working with these structures that the theoretical performance in terms of O() doesn't always correspond to real-word performance. You need to take into account your dataset size and overall environment. One example: ArrayList and HashMap. In theory, List is O(n) for unsorted lookup, while Map is O(1). However, there's a lot of overhead in creating and managing entries for a map, which actually gives worse performance on smaller sets than a List.

Since you say you don't have to worry about memory, I'd stay away from array. The complexity of managing the size isn't worth it on your specified data set size.

3 Comments

"array. The complexity of managing the size isn't worth it on your specified data set size." Okay, but as this is write-once, I would think managing size is not actually an issue (at least not a defining one). Also, perhaps I'm mistaken (again), but doesn't a tree suggest natural ordering?
It can, but it's not required.
@Zaan When you say "write once" I'm see that as writing the value once. If there are 1000 values, that's 1000 writes. Without knowing all of the specifics, it's hard to say for sure what you need. But using raw arrays is generally not worth the trouble.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.