
Input: ball ball code

Output should be: ball code

Input: awycodeawy

Output should be: awycode

I tried these, but didn't work:

$q = preg_replace("/\s(\w+\s)\1/i", "$1", $q);
$q = preg_replace("/s(w+s)1/i", "$1", $q);
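The thread does not say this, but one likely reason the first attempt fails is PHP string escaping. Inside a double-quoted PHP string, `\1` is an octal escape for the byte 0x01, so the pattern passed to `preg_replace` contains no backreference at all. `\s` and `\w` are not PHP escapes, so they survive unchanged. A minimal check, assuming nothing beyond stock PHP:

```php
<?php
// Same pattern text, once double-quoted (as in the question) and once single-quoted.
$double = "/\s(\w+\s)\1/i"; // PHP turns "\1" into the single byte chr(1)
$single = '/\s(\w+\s)\1/i'; // single quotes keep the backslash, so PCRE sees \1

var_dump(strlen($single) - strlen($double)); // int(1): "\1" collapsed into one byte
var_dump(strpos($double, chr(1)));           // int(10): position of the stray control byte

// The double-quoted pattern looks for a literal 0x01 byte, so nothing matches
// and the input comes back unchanged.
var_dump(preg_replace($double, '$1', 'ball ball code')); // string(14) "ball ball code"
```

Even with single quotes (or `\\1`), the leading `\s` still requires whitespace before the first word, so `'ball ball code'` would not match either.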
  • What should aabaaabaabaaa become? Commented May 18, 2012 at 15:20
  • Regex does not retain state, right? So why would this be a pure regex solution? Anybody? Commented May 18, 2012 at 15:20
  • @Kristian: They can, to some extent. In particular, anything that just repeats or follows a pattern is well suited for pattern matching. regular-expressions.info/brackets.html Commented May 18, 2012 at 15:23
  • @Mark Byers "aabaaabaabaaa" should be "aababaaba". 2 or more chars is considered. Commented May 18, 2012 at 15:33

2 Answers

5

Here is an attempt at a regex-based solution to the OP's problem, using a positive lookahead.

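$arr = array('ball ball code', 'abcabc bde bde', 'awycodeawy');
// (\w{2,})   a run of at least 2 word characters...
// (?=.*?\1)  ...that occurs again somewhere later in the string (lookahead)
// \W*        plus any non-word characters (e.g. a space) right after it
foreach($arr as $str)
   echo "'$str' => '" . preg_replace('/(\w{2,})(?=.*?\\1)\W*/', '', $str) ."'\n";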

OUTPUT

'ball ball code' => 'ball code'
'abcabc bde bde' => 'abc bde'
'awycodeawy' => 'codeawy'

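As you can see, for the input 'awycodeawy' the result is 'codeawy' instead of 'awycode'. The lookahead can scan forward an arbitrary distance for a later copy, so the pattern can only delete the earlier occurrence. Keeping the first copy and deleting the later one would need a variable-length lookbehind, which is not possible.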
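One common workaround for the missing lookbehind (not part of the original answer) is to reverse the string, run the same lookahead pattern, and reverse the result back. On the reversed text the pattern deletes the copy that comes last in the original, so the first copy survives. `dedupeKeepFirst` is a hypothetical helper name, and because `strrev` works byte by byte this sketch assumes ASCII input.

```php
<?php
// Hypothetical helper: keep the FIRST occurrence by running the answer's
// lookahead pattern on the reversed string. ASCII-only (strrev is byte-wise).
function dedupeKeepFirst(string $str): string
{
    $reversed = strrev($str);
    // Same pattern as the answer: a 2+ character word run that appears again
    // later (in the reversed string), plus any non-word characters after it.
    $cleaned = preg_replace('/(\w{2,})(?=.*?\1)\W*/', '', $reversed);
    return strrev($cleaned);
}

foreach (['ball ball code', 'abcabc bde bde', 'awycodeawy'] as $str) {
    echo "'$str' => '" . dedupeKeepFirst($str) . "'\n";
}
// 'ball ball code' => 'ball code'
// 'abcabc bde bde' => 'abc bde'
// 'awycodeawy' => 'awycode'
```

It still matches raw runs of word characters, so ordinary words that contain a repeated run can be altered too.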


5 Comments

Thank you. May I ask how many system resources this code uses? Does it make sense to use it on a website that runs this code 3 times per minute? (Forgive me for being a noob :))
@noarm: In its present form this code is not resource-intensive, but resource usage does grow with the length of the input string, since the lookahead rescans the rest of the string for every candidate substring. If your input string is huge, it can take a while to scan the whole string for duplicates.
It is not recommended to use this for testing against user sentences; it could damage the original string.
@JuniorMayhé: The original string is just echoed here; it is not being updated.
My mistake. I meant that if someone tries to update the original string, which is a sentence, it would mess up the original. I see it is just being echoed ;-)
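To check whether the cost is acceptable for your own inputs, you can time the pattern on strings of growing length. This is a rough sketch under stated assumptions: the input is synthetic (random letters and spaces, not real user text), timings vary by machine, and `preg_replace` returns `null` if PCRE gives up, for example when `pcre.backtrack_limit` is reached. `timeDedupe` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: run the answer's pattern once and measure it.
function timeDedupe(string $s): array
{
    $start = hrtime(true);                              // nanoseconds (PHP 7.3+)
    $out = preg_replace('/(\w{2,})(?=.*?\1)\W*/', '', $s);
    $ms = (hrtime(true) - $start) / 1e6;                // convert to milliseconds
    // $out is null if PCRE failed; preg_last_error() reports the reason
    return [$out, $ms, preg_last_error()];
}

mt_srand(1);                                            // repeatable synthetic input
$alphabet = 'abcdefghij ';
foreach ([250, 500, 1000, 2000] as $n) {
    $s = '';
    for ($i = 0; $i < $n; $i++) {
        $s .= $alphabet[mt_rand(0, strlen($alphabet) - 1)];
    }
    [$out, $ms, $err] = timeDedupe($s);
    printf("n=%5d  %8.3f ms  %s\n", $n, $ms, $out === null ? "error $err" : 'ok');
}
```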
4
$q = preg_replace("/\b(\w+)\s+\\1\b/i", "$1", $q);
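A small harness (added for illustration) that runs this pattern on the three inputs from the question. The pattern is single-quoted here, which is equivalent to the answer's double-quoted `"\\1"`. It only collapses a word that is immediately repeated after whitespace, which matches the results pointed out in the comments on this answer.

```php
<?php
// The answer's pattern: a whole word, whitespace, then the same word again.
foreach (['ball ball code', 'abcabc bde bde', 'awycodeawy'] as $q) {
    echo "'$q' => '" . preg_replace('/\b(\w+)\s+\1\b/i', '$1', $q) . "'\n";
}
// 'ball ball code' => 'ball code'
// 'abcabc bde bde' => 'abcabc bde'
// 'awycodeawy' => 'awycodeawy'
```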

6 Comments

"abcabc bde bde" becomes "abcabc bde". Can't it be "abc bde"?
@noarm You said you wanted repeated words, not repeated parts of words. If you want it to apply to parts, however, change the \s+ in the middle to a \s*.
@anubhava It was never intended to. Trying to match occurrences later on like that is nearly impossible to do well with regex (without breaking other things, like making book into bok, or mississippi into missippi).
@Amber: Yes, that's the whole point. The problem, as the OP stated it, cannot be solved with a pure regex solution.
The OP gave some examples of what he wants. This answer doesn't work for the second example he gave: awycodeawy -> awycode.
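The variants discussed in these comments can be compared side by side. This illustrative sketch runs the `\s*` version suggested above and an unanchored `(\w+)\1` pattern, the kind of "match repeats anywhere" regex the comments warn about. The pattern labels are made up for the example.

```php
<?php
$patterns = [
    // \s* variant: the two copies may be adjacent, but must span a whole word
    'zero-space' => '/\b(\w+)\s*\1\b/i',
    // unanchored: any run immediately followed by itself, even inside a word
    'unanchored' => '/(\w+)\1/',
];
$inputs = ['abcabc bde bde', 'awycodeawy', 'book', 'mississippi'];

foreach ($patterns as $name => $pattern) {
    foreach ($inputs as $in) {
        printf("%-10s %-16s => %s\n", $name, "'$in'", preg_replace($pattern, '$1', $in));
    }
}
```

With the `\s*` version, 'abcabc bde bde' becomes 'abc bde' but 'awycodeawy' is still left alone; the unanchored pattern turns 'book' into 'bok' and also rewrites 'mississippi' (the exact result depends on the pattern used).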
