2

I have an array of integers storing some userIDs. I basically want to prevent a user from performing an action twice, so the moment he has done it his userID enters this array.

I wonder whether it is a good idea to sort or not this array. If it is sorted, then you have A={min, ..., max}. Then, if I'm not wrong, checking if an ID is in the array will take log2(|A|) 'steps'. On the other hand, if the array was not sorted then you will need |A|/2 (in average) steps.

So sorting seems better to check if an element exists in the array (log(|A|) vs |A|), but what about 'adding' a new value? Calculating the position of where the new userID should be can be done at the same time you're checking, but then you will have to displace all the elements from that position by 1... or at least that's how I'd do it on C, truth is this is going to be an array in a MongoDB document, so perhaps this is handled in some other most-effective way.

Of course if the array is unsorted then adding a new value will just take one step ("pushing" it to the end).

To me, an adding operation (with previous checking) will take:

  • If sorted: log2(|A|) + |A|/2. The log2 part to check and find the place and the |A|/2 as an average of the displacements needed.
  • If not sorted: |A|/2 + 1. The |A|/2 to check and the +1 to push the new element.

Given that for adding you'll always first check, then the not sorted version appears to have less steps, but truth is I'm not very confident on the +|A|/2 of the sorted version. That's how I would do it in C, but maybe it can work another way...

3
  • Why not use a hash table for O(1) lookup? Commented Nov 14, 2015 at 2:12
  • @JohnColeman I don't know much about hash tables, but as far as I've read the idea is to find a function that maps your data to a position in an array... But then you need to have bounds because you'd need to allocate memory for that array, right? I mean, in this case the function could be just the identity f(x)=x since we're dealing with userIDs which are integers, but if x is unbounded then I just can't create such array... Commented Nov 19, 2015 at 7:48
  • Why reinvent the wheel? Hash tables (often called dictionaries) have become the go-to data structure when you want O(1) lookup and are built-in to most modern languages, with mature open-source implementations for languages such as C which lack native support. Furthermore, you seem to be talking about something involving a database -- isn't the point of a data base to give you O(1) access to large amounts of data? Amazon doesn't maintain a sorted list of titles but can instantaneously locate any title. Why not use the database to keep track of what you want to keep track of? Commented Nov 19, 2015 at 11:50

1 Answer 1

1

O(Log(A)) is definitely better than O(A), but this can be done in O(1). The data structure you are looking for is HashMap, if you are going to do this in C. I haven't worked in C in a very long time so I don't know if it is natively available now. It surely is available in C++. Also there are some libraries which you can use in the worst case.

For MongoDB, my solution may not be the best, but I think that you can create another collection of just the userIDs and index the collection keyed on userIDs. This way when someone tries to do that action, you can query the user status quickest.

Also in MongoDB you can try adding another key called UserDidTheAction to your User's collection. This key's value may be true or false. Index the collection based on userID and probably you will have similar performance as the other solution, but at the cost of modifying your original collection's design (though it's not required to be fixed in MongoDB).

Sign up to request clarification or add additional context in comments.

2 Comments

Thanks for the answer! The idea it that there are some users that can vote (that's the action) some polls. So, in mongo, I was going to create an array in the poll document containing the users that have voted that poll. Maybe it makes more sense to store in the user document the pollIDs he's voted... but anyways, I'd have to end up checking/adding, so what would be the best option for optimality?
@CarlosNavarro: Why not add the {PollID: true/false} to documents in users collection? This way if that pollId is in user's doc, we know that this particular user can vote. And if it is true, we know that the user has voted. Further, add index on PollID. There will be slight but necessary overhead with extra indices.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.