2

Lets say we have to strings, A and B. The task is to insert any needed letters in the string B in order to end up with the string A.

For example:

A - This is just a simple test 
B - is t a sim te

So if we look at the string A like this:

--is -- ---t a sim--- te--

or:

---- is ---t a sim--- te--

it is clear that we can build string A from the string B, and the output should be in the above written format (both answers are correct).

Can you think of an algorithm that will solve this in the reasonable time? It is quite easy to come up with brute force solution, but I need something a bit more sophisticated than that.

2
  • Can you please provide some context for the problems - i.e. why do you need it? Where did you come across it? Also, what have you tried so far? Commented Apr 3, 2011 at 14:47
  • Is --i- -s ---t a sim--- te-- also a valid solution for your example? Commented Apr 7, 2011 at 22:42

5 Answers 5

3

You could take Levenshtein distance algorithm as a base and extend it to also remember the characters that were added/deleted/substituted. That would run in linear time.

Sign up to request clarification or add additional context in comments.

1 Comment

Levenshtein distance runs in O(n*m) if I am not mistaken, where n and m are the lengths of A and B.
2

You can just find first occurrence of characters of B in A, just start finding occurrence after last index found in A, for example in your case:

A - This is just a simple test 
B - is t a sim te

i: 3rd place in A,
s: 4th place in A,
' ': 5th place in A,
t: 12th place in A, (because last index was 4)
' ': ...
a: ....

That's O(|A|+|B|) and because |A| > |B| it's O(|A|).

After finding indices it's easy to convert B to A, just by adding characters of A between 2 indices to B.

Edit: Also if there is no match this algorithm, works fine (there will be some characters in B which are not in A, from last index).

1 Comment

Why would this greedy solution work optimally? The first occurence might not be the 'correct' one.
0

I would think you could determine if its possible with a nice reg expression. as there may or may not be 1 or more characters between any of the ones suggested in B.

7 Comments

I am not sure this can be easily done with regexs, and I am sure that it would be very slow solution. Thanks for the answer :).
@Klark: actually I think a regexp is pretty good to just check if it is possible to get string A from B. e.g. if /.*is.*t.*a.*sim.*te.*/ has a match, it's possible... Then, yeah to find what chars you need it's another story
Sure a regex would work, for example (.*)? between each character will then give you a search, for 0 or more characters between each character. So it would find that sme would match some and someone
@BugFinder, thats brute force.
No, in most peoples eyes brute force would be to iterate through the sentence and step by step check if matches and what characters would be needed to go in. This is pattern matching, and is a quick "does it or doest it or not fit the pattern".. as digEmAll says though, finding the missing characters would be harder, although of course the reg expression would give you the missing peices.
|
0

I actually think brute force will be the quickest but you have to mutate string A to do it fastests in one pass... just iterate over A change each letter to "-" that doesn't match the letter you are looking for unless it is a space type of deal... that will be constant time and require no extra storage... basically your time will be N. Of course you have to deal with the tokens from "B" so once you find "i" you need to make sure the next letter is "s" so that you find "is" but is still should be quick just don't go back to the I position if the next letter isn't s... that should point you in the correct direction without giving you the solution to your homework...

Comments

0

I think you could do it in linear time and space, like this (pseudocode, completely untested):

String f( String a, String b ) {
  String result;

  // Iterate over 'a' and 'b' in parallel. If both have the same
  // character, add it to the result. Otherwise, if 'a' has a space add a space to the result, otherwise add a dash.
  int idxB = 0;
  for (int idxA = 0; idxA < a.length(); ++idxA ) {
    if (a[idxA] == b[idxB]) {
      result.append(a[idxA]);
      ++idxB;
    } else if (a[idxA] == ' ') {
      result.append(' ');
    } else {
      result.append('-');
    }
  }

  // Tack on dashes to the result until it's as long as 'a'.
  while (result.length() < a.length()) {
    result.append('-');
  }

  return result;
}

5 Comments

this example doesn't quite work as if the first string was "Think this is..." would give improper results... also there isn't confirmation this isn't homework so maybe giving a solution in code was a bad idea? Also would replace the spaces with hyphens so that would be bad too...
@jsobo: I just tested a C++ version of this algorithm, and it worked fine (except that spaces became hyphens, but well - I wrote the algorithm from the top of my head and it was untested :-).
Try string A = "think I This is just a simple test" with the string B up there... you will get "--i-- - ---s... " which isn't correct
@jsobo: If that is not correct, then you seem to have additional requirements that are not mentioned in the question.
@Svante B having "IS" implies that the I and the S are next to each other. Your algorithm doesn't not ensure that will happen. Maybe the question doesn't explicitly state that requirement but it is so strongly implied I think we can safely infer it.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.