Does the following algorithm for finding distinct strings work?

Question

I recently found a question that went something like:

"Given an array of strings, return the number of distinct strings in that array."

I came up with this solution:

1. Get number_of_strings, which equals the number of strings in the input array
2. Get number_of_non_redundant, which equals the length of the input array cast as a set
3. Return 2 times number_of_non_redundant - number_of_strings

So, my question is, does this algorithm work for all data sets?

where did the 2 times non_redundant - num_strings come from? Wouldn't just the length of the set work? — Phillip Schmidt
– Phillip Schmidt, Commented Aug 15, 2012 at 17:40

Danica · Accepted Answer · 2012-08-15 17:40:35Z

4

Consider the array of strings ["a", "a", "a", "d", "d", "d"].

number_of_strings is 6; number_of_non_redundant is 2. You propose to return 2 * 2 - 6 = -2. So...no, your algorithm doesn't work for all datasets.

Unless I'm greatly misunderstanding the problem, though, just returning number_of_non_redundant will always work, since it's the definition of what you want to return. :)

answered Aug 15, 2012 at 17:40

Danica

29k6 gold badges94 silver badges128 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

mjgpy3 Over a year ago

Thank you, this was a very solid answer.

Chris Dargis · Accepted Answer · 2012-08-15 18:14:52Z

As others have pointed out, simply returning number_of_non_redundant seems like the answer to this problem.

Here is a possible solution for determining number_of_non_redundant:

1) Create a Hash Set (Language specific)

2) Iterate through the entire array, on each element of the array check to see if the element exists in the Hash Set, if it doesn't, add it.

3) Return the size of the Hash Set.

Using a Hash Set here offers constant time operations(add, contains).

Additionally I wanted to point out that you can't (at least I am not aware of this in a language) simply cast an array to a set. Casting is a constant time operation. These are two different data structures and in order to take elements from an array and place them in a set, it would require iterating through the array and entering the elements into a set.

linello · Accepted Answer · 2012-08-15 17:57:24Z

0

What about first sorting the array lexicographically and then loop over it with a flag variable keeping track of changes between element i-th and (i-1)-th?

answered Aug 15, 2012 at 17:57

linello

8,72119 gold badges70 silver badges113 bronze badges

Comments

Jeff · Accepted Answer · 2012-08-15 18:00:33Z

0

This algorithm DOES NOT work for all data sets. It might work for specific examples though.

say n = number of non redundant strings 
p = number of strings in original array

According to you 2n-p = n => n= p

Your algorithm works only when the (number of non redundant strings = length of original array) , which means only when the original array is a set.

Just to give a hint, the ideal way to solve this is hashing if you have enough memory available or you can do it in place using Sorting but would take longer compared to hashing

answered Aug 15, 2012 at 18:00

Jeff

1931 gold badge1 silver badge8 bronze badges

Collectives™ on Stack Overflow

Does the following algorithm for finding distinct strings work?

4 Answers 4

1 Comment

Comments

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

1 Comment

Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related