1

You are given a string. Develop a function to remove duplicate characters from that string. String could be of any length. Your algorithm must be in space. If you wish you can use constant size extra space which is not dependent any how on string size. Your algorithm must be of complexity of O(n).

My idea was to define an integer array of size of 26 where 0th index would correspond to the letter a and the 25th index for the letter z and initialize all the elements to 0. Thus we will travel the entire string once and and would increment the value at the desired index as and when we encounter a letter.

and then we will travel the string once again and if the value at the desired index is 1 we print out the letter otherwise we do not.

In this way the time complexity is O(n) and the space used is constant irrespective of the length of the string!!

if anyone can come up with ideas of better efficiency,it will be very helpful!!

9
  • guess this is a homework :P any programming language constraint? or pseudo-code is fine? For example if it were PHP you could append to an array and the size would be variable, only using up to as many characters as you have traveled and checking if a char has been checked is a matter of an in_array (for example) Commented Aug 5, 2011 at 6:42
  • 1
    I don't think assuming the string contains only a to z is fair. There are over one million codepoints in Unicode. Commented Aug 5, 2011 at 6:43
  • 1
    @purefan this is not a homework question as you can already see i hav tagged it as an interview qs.. Commented Aug 5, 2011 at 6:46
  • @Ray Toal Ya i will be grateful if you can come up with a better soln whc will include all the codepoints Commented Aug 5, 2011 at 6:47
  • 1
    Okay I will give it a shot but first, what is meant by repeated? Do you mean any character that appears more than once anywhere in the string, or any contiguous runs of two or more occurrences of the same character. The latter is what is removed in the Unix uniq program. Commented Aug 5, 2011 at 6:52

2 Answers 2

1

Your solution definitely fits the criteria of O(n) time. Instead of an array, which would be very, very large if the allowed alphabet is large (Unicode has over a million characters), you could use a plain hash. Here is your algorithm in (unoptimized!) Ruby:

def undup(s) 
  seen = Hash.new(0)
  s.each_char {|c| seen[c] += 1}
  result = ""
  s.each_char {|c| result << c if seen[c] == 1}
  result
end

puts(undup "")
puts(undup "abc")
puts(undup "Olé")
puts(undup "asdasjhdfasjhdfasbfdasdfaghsfdahgsdfahgsdfhgt")

It makes two passes through the string, and since hash lookup is less than linear, you're good.

You can say the Hashtable (like your array) uses constant space, albeit large, because it is bounded above by the size of the alphabet. Even if the size of the alphabet is larger than that of the string, it still counts as constant space.

There are many variations to this problem, many of which are fun. To do it truly in place, you can sort first; this gives O(n log n). There are variations on merge sort where you ignore dups during the merge. In fact, this "no external hashtable" restriction appears in Algorithm: efficient way to remove duplicate integers from an array (also tagged interview question).

Another common interview question starts with a simple string, then they say, okay now a million character string, okay now a string with 100 billion characters, and so on. Things get very interesting when you start considering Big Data.

Anyway, your idea is pretty good. It can generally be tweaked as follows: Use a set, not a dictionary. Go trough the string. For each character, if it is not in the set, add it. If it is, delete it. Sets take up less space, don't need counters, and can be implemented as bitsets if the alphabet is small, and this algorithm does not need two passes.

Python implementation: http://code.activestate.com/recipes/52560-remove-duplicates-from-a-sequence/

Sign up to request clarification or add additional context in comments.

Comments

0

You can also use a bitset instead of the additional array to keep track of found chars. Depending on which characters (a-z or more) are allowed you size the bitset accordingly. This requires less space than an integer array.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.