0

I recently found a question that went something like:

"Given an array of strings, return the number of distinct strings in that array."

I came up with this solution:

1. Get number_of_strings, which equals the number of strings in the input array
2. Get number_of_non_redundant, which equals the length of the input array cast as a set
3. Return 2 times number_of_non_redundant - number_of_strings

So, my question is, does this algorithm work for all data sets?

2
  • where did the 2 times non_redundant - num_strings come from? Wouldn't just the length of the set work? Commented Aug 15, 2012 at 17:40
  • 2
    isn't that number_of_non_redundant is already the answer? Commented Aug 15, 2012 at 17:41

4 Answers 4

4

Consider the array of strings ["a", "a", "a", "d", "d", "d"].

number_of_strings is 6; number_of_non_redundant is 2. You propose to return 2 * 2 - 6 = -2. So...no, your algorithm doesn't work for all datasets.

Unless I'm greatly misunderstanding the problem, though, just returning number_of_non_redundant will always work, since it's the definition of what you want to return. :)

Sign up to request clarification or add additional context in comments.

1 Comment

Thank you, this was a very solid answer.
2

As others have pointed out, simply returning number_of_non_redundant seems like the answer to this problem.

Here is a possible solution for determining number_of_non_redundant:

1) Create a Hash Set (Language specific)

2) Iterate through the entire array, on each element of the array check to see if the element exists in the Hash Set, if it doesn't, add it.

3) Return the size of the Hash Set.

Using a Hash Set here offers constant time operations(add, contains).

Additionally I wanted to point out that you can't (at least I am not aware of this in a language) simply cast an array to a set. Casting is a constant time operation. These are two different data structures and in order to take elements from an array and place them in a set, it would require iterating through the array and entering the elements into a set.

Comments

0

What about first sorting the array lexicographically and then loop over it with a flag variable keeping track of changes between element i-th and (i-1)-th?

Comments

0

This algorithm DOES NOT work for all data sets. It might work for specific examples though.

say n = number of non redundant strings 
p = number of strings in original array 

According to you 2n-p = n => n= p

Your algorithm works only when the (number of non redundant strings = length of original array) , which means only when the original array is a set.

Just to give a hint, the ideal way to solve this is hashing if you have enough memory available or you can do it in place using Sorting but would take longer compared to hashing

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.