
I need a fairly complex regex to accomplish the following:

> replace numbers in a string, e.g. 700, 12.43, with a label (format: {NUMBER:xx})
> ignore: when the number is between {braces}, e.g. {7}, {7th}
> ignore: when any character is attached to the number, e.g. G3, 7x, 1/2
> except: when
          > preceded by $, e.g. $840
          > followed by .!?:, e.g. 33! 45.65?  4...

Taken all together:

Buy 4 {5} G3 Mac computers for 80% at $600 or 2 for 1/2 price: 200... 
dollar. Twice - 2x - as cheap!

Desired output:

Buy {NUMBER:4} {5} G3 Mac computers for 80% at 
$ {NUMBER:600} or {NUMBER:2} for 1/2 price: 
{NUMBER:200}... dollar. Twice - 2x - as cheap!

I now have this:

preg_replace("/(?<!{)(?>[0-9]+(?:\.[0-9]+)?)(?!})/", " {NUMBER:$0} ", $string);

which outputs:

Buy {NUMBER:4} {5} G {NUMBER:3} Mac computers for {NUMBER:80} % at 
$ {NUMBER:600} or {NUMBER:2} for {NUMBER:1} / {NUMBER:2} price: 
{NUMBER:200} ... dollar. Twice - {NUMBER:2} x - as cheap!

In other words, the ignore rules and their exceptions aren't working yet, and I don't know how to implement them properly. Can anyone help me out?

2 Answers


This works for your test cases and follows your rules, assuming that braces are correctly matched and unnested:

$result = preg_replace(
    '/(?<!\{)        # Assert no preceding {
    (?<![^\s$])      # Assert no preceding non-whitespace except $
    \b               # Match start of number
    (\d+(?:\.\d+)?+) # Match number (optional decimal part)
    \b               # Match end of number
    (?![^{}]*\})     # Assert that next brace is not a closing brace
    (?![^\s.!?:,])   # Assert no following non-whitespace except .!?:,
    /x', 
    '{NUMBER:\1}', $string);
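To check the pattern against the sample sentence from the question, here is a minimal sketch. The helper name `labelNumbers` is made up for illustration, and the trailing-punctuation set is written as `.!?:,` because the question lists the colon; otherwise the pattern follows the one in this answer.

```php
<?php
// Hypothetical helper name; the pattern follows the answer in this thread.
function labelNumbers(string $text): string
{
    $pattern = '/(?<!\{)     # no { directly before the number
        (?<![^\s$])          # only whitespace, $ or start of string before it
        \b(\d+(?:\.\d+)?+)\b # the number, decimal part matched possessively
        (?![^{}]*\})         # not inside a {...} pair
        (?![^\s.!?:,])       # only whitespace, . ! ? : , or end of string after it
        /x';
    // preg_replace returns null on a regex error; fall back to the input then.
    return preg_replace($pattern, '{NUMBER:\1}', $text) ?? $text;
}

$input = "Buy 4 {5} G3 Mac computers for 80% at \$600 or 2 for 1/2 price: 200... \n"
       . "dollar. Twice - 2x - as cheap!";
echo labelNumbers($input), "\n";
```

For the sample sentence this labels 4, 600, 2 and 200 and leaves {5}, G3, 80%, 1/2 and 2x untouched. Note that the label follows the $ directly (`${NUMBER:600}`), without the space shown in the question's desired output.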

2 Comments

@Reveller - I added a small correction: I made the optional decimal part match possessively ((...)?+), or the regex would have matched the 1 in 1.5x which you probably wouldn't have wanted, or would you?
Tim, you are correct :) That would not have been what I wanted. thanks!
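The effect of the possessive quantifier discussed in these comments can be seen in a small comparison. This is only an illustrative sketch: both patterns are stripped-down versions of the one in the answer (the brace and $ handling is omitted), and the variable names are made up.

```php
<?php
// Simplified patterns: a number that must be followed by whitespace,
// one of . ! ? : , or the end of the string.
$greedy     = '/\b\d+(?:\.\d+)?(?![^\s.!?:,])/';   // decimal part may be given back
$possessive = '/\b\d+(?:\.\d+)?+(?![^\s.!?:,])/';  // decimal part is kept once matched

foreach (['1.5x', '1.5', '7x'] as $sample) {
    $g = preg_match($greedy, $sample, $m) ? $m[0] : '(no match)';
    $p = preg_match($possessive, $sample, $m) ? $m[0] : '(no match)';
    echo "$sample: greedy=$g possessive=$p\n";
}
```

With the plain `?`, the engine backtracks on `1.5x`, drops the `.5`, and accepts the bare `1` because the following `.` is allowed punctuation. With `?+` the decimal part cannot be given back, so `1.5x` is rejected as a whole.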
1
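$string = "Buy 4 {5} G3 Mac computers for 80% at \$600 or 2 for 1/2 price: 200... \ndollar. Twice - 2x - as cheap!";
// Group 1 keeps the leading delimiter (start of string, whitespace, or $) so it can be put back;
// the lookahead checks the trailing delimiter without consuming it.
$pattern = '/(^|[\s$])(\d+(?:\.\d+)?+)(?=$|[\s.!?:,])/';
//$count = preg_match_all($pattern, $string, $matches);
//echo "$count\n";
//print_r($matches[2]);
echo preg_replace($pattern, '$1{NUMBER:$2}', $string);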

1 Comment

It should be '/[\s|^|\$](\d+(?:\.\d+)?)[\s|$|\.|\!|\?|\:|\,]/' to catch decimals as well.
