string algorithm homework/interview like question

Question

Lets say we have to strings, A and B. The task is to insert any needed letters in the string B in order to end up with the string A.

For example:

A - This is just a simple test 
B - is t a sim te

So if we look at the string A like this:

--is -- ---t a sim--- te--

or:

---- is ---t a sim--- te--

it is clear that we can build string A from the string B, and the output should be in the above written format (both answers are correct).

Can you think of an algorithm that will solve this in the reasonable time? It is quite easy to come up with brute force solution, but I need something a bit more sophisticated than that.

Can you please provide some context for the problems - i.e. why do you need it? Where did you come across it? Also, what have you tried so far? — MAK
– MAK, Commented Apr 3, 2011 at 14:47
Is --i- -s ---t a sim--- te-- also a valid solution for your example? — Svante
– Svante, Commented Apr 7, 2011 at 22:42

Gumbo · Accepted Answer · 2011-04-03 13:04:44Z

3

You could take Levenshtein distance algorithm as a base and extend it to also remember the characters that were added/deleted/substituted. That would run in linear time.

answered Apr 3, 2011 at 13:04

Gumbo

657k112 gold badges792 silver badges852 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

dcn Over a year ago

Levenshtein distance runs in O(n*m) if I am not mistaken, where n and m are the lengths of A and B.

Saeed Amiri · Accepted Answer · 2011-04-03 13:29:06Z

2

You can just find first occurrence of characters of B in A, just start finding occurrence after last index found in A, for example in your case:

A - This is just a simple test 
B - is t a sim te

i: 3rd place in A,
s: 4th place in A,
' ': 5th place in A,
t: 12th place in A, (because last index was 4)
' ': ...
a: ....

That's O(|A|+|B|) and because |A| > |B| it's O(|A|).

After finding indices it's easy to convert B to A, just by adding characters of A between 2 indices to B.

Edit: Also if there is no match this algorithm, works fine (there will be some characters in B which are not in A, from last index).

edited Apr 3, 2011 at 13:29

answered Apr 3, 2011 at 13:23

Saeed Amiri

22.6k5 gold badges49 silver badges86 bronze badges

1 Comment

dcn Over a year ago

Why would this greedy solution work optimally? The first occurence might not be the 'correct' one.

BugFinder · Accepted Answer · 2011-04-03 12:58:57Z

0

I would think you could determine if its possible with a nice reg expression. as there may or may not be 1 or more characters between any of the ones suggested in B.

answered Apr 3, 2011 at 12:58

BugFinder

17.9k4 gold badges42 silver badges58 bronze badges

7 Comments

Klark Over a year ago

I am not sure this can be easily done with regexs, and I am sure that it would be very slow solution. Thanks for the answer :).

digEmAll Over a year ago

@Klark: actually I think a regexp is pretty good to just check if it is possible to get string A from B. e.g. if /.*is.*t.*a.*sim.*te.*/ has a match, it's possible... Then, yeah to find what chars you need it's another story

BugFinder Over a year ago

Sure a regex would work, for example (.*)? between each character will then give you a search, for 0 or more characters between each character. So it would find that sme would match some and someone

Saeed Amiri Over a year ago

@BugFinder, thats brute force.

BugFinder Over a year ago

No, in most peoples eyes brute force would be to iterate through the sentence and step by step check if matches and what characters would be needed to go in. This is pattern matching, and is a quick "does it or doest it or not fit the pattern".. as digEmAll says though, finding the missing characters would be harder, although of course the reg expression would give you the missing peices.

|

John Sobolewski · Accepted Answer · 2011-04-03 13:09:14Z

0

I actually think brute force will be the quickest but you have to mutate string A to do it fastests in one pass... just iterate over A change each letter to "-" that doesn't match the letter you are looking for unless it is a space type of deal... that will be constant time and require no extra storage... basically your time will be N. Of course you have to deal with the tokens from "B" so once you find "i" you need to make sure the next letter is "s" so that you find "is" but is still should be quick just don't go back to the I position if the next letter isn't s... that should point you in the correct direction without giving you the solution to your homework...

answered Apr 3, 2011 at 13:09

John Sobolewski

4,5621 gold badge23 silver badges27 bronze badges

Comments

Frerich Raabe · Accepted Answer · 2011-04-03 13:24:16Z

0

I think you could do it in linear time and space, like this (pseudocode, completely untested):

String f( String a, String b ) {
  String result;

  // Iterate over 'a' and 'b' in parallel. If both have the same
  // character, add it to the result. Otherwise, if 'a' has a space add a space to the result, otherwise add a dash.
  int idxB = 0;
  for (int idxA = 0; idxA < a.length(); ++idxA ) {
    if (a[idxA] == b[idxB]) {
      result.append(a[idxA]);
      ++idxB;
    } else if (a[idxA] == ' ') {
      result.append(' ');
    } else {
      result.append('-');
    }
  }

  // Tack on dashes to the result until it's as long as 'a'.
  while (result.length() < a.length()) {
    result.append('-');
  }

  return result;
}

edited Apr 3, 2011 at 13:24

answered Apr 3, 2011 at 13:09

Frerich Raabe

95.2k20 gold badges127 silver badges216 bronze badges

5 Comments

John Sobolewski Over a year ago

this example doesn't quite work as if the first string was "Think this is..." would give improper results... also there isn't confirmation this isn't homework so maybe giving a solution in code was a bad idea? Also would replace the spaces with hyphens so that would be bad too...

Frerich Raabe Over a year ago

@jsobo: I just tested a C++ version of this algorithm, and it worked fine (except that spaces became hyphens, but well - I wrote the algorithm from the top of my head and it was untested :-).

John Sobolewski Over a year ago

Try string A = "think I This is just a simple test" with the string B up there... you will get "--i-- - ---s... " which isn't correct

Svante Over a year ago

@jsobo: If that is not correct, then you seem to have additional requirements that are not mentioned in the question.

John Sobolewski Over a year ago

@Svante B having "IS" implies that the I and the S are next to each other. Your algorithm doesn't not ensure that will happen. Maybe the question doesn't explicitly state that requirement but it is so strongly implied I think we can safely infer it.

Collectives™ on Stack Overflow

string algorithm homework/interview like question

5 Answers 5

1 Comment

1 Comment

7 Comments

Comments

5 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

5 Answers 5

1 Comment

1 Comment

7 Comments

Comments

5 Comments

Your Answer

Sign up or log in

Post as a guest

Related