I have some HTML that contains multiple pairs of HTML comments, and between each pair is a form. I am trying to use preg_replace to replace each comment pair, together with the form inside it, with a tag of the form [CONTACT_FORM_X], where X is the numeric ID of the form.

$str = 'blah blah blah <!-- CONTACT FORM START [CONTACT_FORM_1] -->some form goes here<!-- CONTACT FORM END 1 --> blah blah blah <!-- CONTACT FORM START [CONTACT_FORM_2] -->another form goes here<!-- CONTACT FORM END 2 -->';

$replace = preg_replace('/<!-- CONTACT FORM START \[CONTACT_FORM_\d\] -->.*<!-- CONTACT FORM END \d -->/', '[CONTACT_FORM_X]', $str);
echo $replace;

So:

<!-- CONTACT FORM START [CONTACT_FORM_1] -->some form goes here<!-- CONTACT FORM END 1 -->

Should be replaced entirely with [CONTACT_FORM_1]

And ..

<!-- CONTACT FORM START [CONTACT_FORM_2] --> another form goes here<!-- CONTACT FORM END 2 -->

Should be replaced entirely with [CONTACT_FORM_2]

If I run my code above I get:

blah blah blah [CONTACT_FORM_X]

So my questions are:

  1. How can I get the value matched by \d and use it in place of the X in my preg_replace replacement string?

  2. My code seems to replace only one of the forms rather than both occurrences. How can I adapt preg_replace to replace every occurrence?

2 Answers

1

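preg_replace already replaces every occurrence (it is global), so no extra flag is needed. The problem is that .* is greedy: it matches everything from the first <!-- CONTACT FORM START [CONTACT_FORM_1] --> up to the last <!-- CONTACT FORM END \d --> in the string, so the whole span is treated as a single match. Adding ? makes the quantifier lazy (.*?), so it stops at the first END comment instead. To capture a value for reuse, wrap it in parentheses, e.g. (\d).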

So try:

.*?<!-- CONTACT FORM START \[CONTACT_FORM_(\d)\] -->.*?<!-- CONTACT FORM END \d -->

or if you want to be sure you are matching the same closing contact form use the backreference:

.*?<!-- CONTACT FORM START \[CONTACT_FORM_(\d)\] -->.*?<!-- CONTACT FORM END \1 -->

Remove the leading .*? if the content before each form should be kept. The intent there is unclear to me: I read "Should be replaced entirely with [CONTACT_FORM_2]" as meaning that the tag is the only content that should remain.
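As a minimal sketch of the difference the leading .*? makes, the snippet below runs the backreference pattern both ways on the sample string from the question. The variable names $withLead and $withoutLead are only illustrative and do not come from the original post.

```php
<?php
$str = 'blah blah blah <!-- CONTACT FORM START [CONTACT_FORM_1] -->some form goes here<!-- CONTACT FORM END 1 --> blah blah blah <!-- CONTACT FORM START [CONTACT_FORM_2] -->another form goes here<!-- CONTACT FORM END 2 -->';

// With the leading .*?, the text before each START comment is consumed as well.
$withLead = preg_replace(
    '/.*?<!-- CONTACT FORM START \[CONTACT_FORM_(\d)\] -->.*?<!-- CONTACT FORM END \1 -->/',
    '[CONTACT_FORM_$1]',
    $str
);

// Without it, only the comment pair and the form between them are replaced.
$withoutLead = preg_replace(
    '/<!-- CONTACT FORM START \[CONTACT_FORM_(\d)\] -->.*?<!-- CONTACT FORM END \1 -->/',
    '[CONTACT_FORM_$1]',
    $str
);

echo $withLead, "\n";    // [CONTACT_FORM_1][CONTACT_FORM_2]
echo $withoutLead, "\n"; // blah blah blah [CONTACT_FORM_1] blah blah blah [CONTACT_FORM_2]
```

If the surrounding text must survive, the second form is probably the one to use.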

Regex demo: https://regex101.com/r/kS2nK6/1

PHP Usage:

<?php
$str = 'blah blah blah <!-- CONTACT FORM START [CONTACT_FORM_1] -->some form goes here<!-- CONTACT FORM END 1 --> blah blah blah <!-- CONTACT FORM START [CONTACT_FORM_2] -->another form goes here<!-- CONTACT FORM END 2 -->';

$replace = preg_replace('/.*?<!-- CONTACT FORM START \[CONTACT_FORM_(\d)\] -->.*?<!-- CONTACT FORM END \d -->/', '[CONTACT_FORM_$1]', $str);
echo $replace;

PHP Demo: https://eval.in/611232
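To make the greedy behaviour described in this answer visible, here is a small sketch that counts matches with preg_match_all for the question's original pattern and for the same pattern with a lazy quantifier. The variable names ($greedy, $lazy, and the counters) are hypothetical and used only for illustration.

```php
<?php
$str = 'blah blah blah <!-- CONTACT FORM START [CONTACT_FORM_1] -->some form goes here<!-- CONTACT FORM END 1 --> blah blah blah <!-- CONTACT FORM START [CONTACT_FORM_2] -->another form goes here<!-- CONTACT FORM END 2 -->';

// Greedy: .* runs to the last END comment, so both forms fall inside one match.
$greedy = '/<!-- CONTACT FORM START \[CONTACT_FORM_\d\] -->.*<!-- CONTACT FORM END \d -->/';

// Lazy: .*? stops at the first END comment, so each form is matched separately.
$lazy = '/<!-- CONTACT FORM START \[CONTACT_FORM_\d\] -->.*?<!-- CONTACT FORM END \d -->/';

// preg_match_all returns the number of full pattern matches.
$greedyCount = preg_match_all($greedy, $str, $greedyMatches);
$lazyCount   = preg_match_all($lazy, $str, $lazyMatches);

echo $greedyCount, "\n"; // 1
echo $lazyCount, "\n";   // 2
```

One match means one replacement, which is why the original code appeared to replace only a single form.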

1

Change your pattern and your replacement string as below:

// .*? (lazy) stops at the first END comment whose number matches the captured ID
$pattern = '/<!-- CONTACT FORM START \[CONTACT_FORM_(\d+)\] -->.*?<!-- CONTACT FORM END \1 -->/';
$replace = preg_replace($pattern, '[CONTACT_FORM_$1]', $str);

How it works

  • Put parentheses around any text you want to reuse later. This is called a capturing group. So I changed \d to (\d+) in your pattern (the + simply allows IDs of two or more digits).
  • To refer back to the first captured group from within the pattern, use \1. Changing CONTACT FORM END \d to CONTACT FORM END \1 requires the match to end at an END comment carrying the same number that START captured. With a plain \d, the greedy .* runs on to the very last CONTACT FORM END in the string, whatever its number. That's why you were getting just one replacement.
  • In the replacement string, use $1 to refer to the first captured group. That's why changing CONTACT_FORM_X to CONTACT_FORM_$1 places the right number in the replacement string.
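The three points above can be sketched on a slightly harder input. The input string here is hypothetical (a two-digit ID and a form spread over several lines), and two details are assumptions added for this sketch rather than part of the answer: the lazy .*? and the s modifier, which lets . match newlines.

```php
<?php
// Hypothetical input: a two-digit ID and a form that spans several lines.
$str = "intro <!-- CONTACT FORM START [CONTACT_FORM_12] -->\n<form>\n</form>\n<!-- CONTACT FORM END 12 --> outro";

// (\d+) captures the ID, \1 requires the same ID in the END comment,
// and the trailing s modifier allows . to match the newlines inside the form.
$pattern = '/<!-- CONTACT FORM START \[CONTACT_FORM_(\d+)\] -->.*?<!-- CONTACT FORM END \1 -->/s';

// $1 in the replacement string inserts the captured ID.
$replace = preg_replace($pattern, '[CONTACT_FORM_$1]', $str);

echo $replace; // intro [CONTACT_FORM_12] outro
```

Without the s modifier, . does not match a newline, so a form that spans several lines would not be matched at all.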

2 Comments

@chris85 you're right, I have. The "Should be replaced entirely" line refers to a substring that doesn't include blah blah blah (check the OP again).
Oh, it's unclear in that case. I'll add a note to my answer as well.
