I have this regular expression:

$str = preg_replace_callback('@((https?://)?([-\w]+\.[-\w\.]+)+\w(:\d+)?(/([-\w/_\.]*(\?\S+)?)?)*)@', function ($matches) { return en($matches[1]); }, $str);

I run it against this HTML:

<iframe src="//example.com/hello.php"></iframe>

The output is:

 <iframe src="//maskedurl.php?l=kdsdhkhdkshdkhsdskhd"></iframe>

How can I make the regex also replace the two slashes (//), so that the output is only:

<iframe src="maskedurl.php?l=kdsdhkhdkshdkhsdskhd"></iframe>

Yes, I know the http: is missing, but that is out of my control.

4 Comments
  • Expect "regex is not ideal for HTML parsing" admonishments. Commented Oct 12, 2015 at 15:53
  • Even then, this regex101 shows that your group is capturing what you expect to capture. Commented Oct 12, 2015 at 15:56
  • @dustmouse He's not using the regexp to parse HTML, he's using it to parse a URL. Commented Oct 12, 2015 at 15:56
  • Two options: (re)move the outer brackets, because there the // is being grouped, which you don't want. Alternatively, return a substring from within the callback (i.e. substr($matches[1], 2)). Commented Oct 12, 2015 at 16:01

1 Answer

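You have to modify the (https?://) part so that it also accepts a bare //. With the original pattern, the optional scheme group cannot match //, so the match only starts at the host name and the two slashes are left in place. The change can be as simple as ((?:https?:)?//) or (https?://|//).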

So in the end you have the following regex:

'@((https?://|//)?([-\w]+\.[-\w\.]+)+\w(:\d+)?(/([-\w/_\.]*(\?\S+)?)?)*)@'

$str = preg_replace_callback('@((https?://|//)?([-\w]+\.[-\w\.]+)+\w(:\d+)?(/([-\w/_\.]*(\?\S+)?)?)*)@', function ($matches){return en($matches[0]);}, $str);
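
To see the difference between the two patterns, here is a minimal sketch. The question does not show en(), so the version below is a hypothetical stand-in that just builds a masked URL from an md5 hash; the real function presumably does something else.

```php
<?php
// Hypothetical stand-in for the asker's en(); the real implementation
// is not shown in the question.
function en(string $url): string
{
    return 'maskedurl.php?l=' . md5($url);
}

$html = '<iframe src="//example.com/hello.php"></iframe>';

// Original pattern: the optional scheme group cannot match a bare "//",
// so the match starts at "example.com" and the slashes stay in the output.
$original = '@((https?://)?([-\w]+\.[-\w\.]+)+\w(:\d+)?(/([-\w/_\.]*(\?\S+)?)?)*)@';

// Modified pattern: the scheme group also accepts "//", so the slashes
// are part of the match and are replaced together with the rest of the URL.
$fixed = '@((https?://|//)?([-\w]+\.[-\w\.]+)+\w(:\d+)?(/([-\w/_\.]*(\?\S+)?)?)*)@';

$callback = function (array $matches): string {
    return en($matches[0]);
};

$before = preg_replace_callback($original, $callback, $html);
$after  = preg_replace_callback($fixed, $callback, $html);

echo $before, PHP_EOL; // src still starts with "//"
echo $after, PHP_EOL;  // src starts directly with the masked URL
```

Because the outer group wraps the whole pattern, $matches[0] and $matches[1] hold the same text here, so either index works in the callback.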

4 Comments

OP isn't worried about the http(s) not being there; he's worried about the callback returning the first two characters (i.e. //).
Then he should use $matches[0]
No, read the question carefully: he doesn't want to return the first two chars. He needs to either not capture the first part of the match, or return substr($matches[1], 2);
His problem is that he wants to mask the URL, but the leading // is ignored and never gets replaced by his regex. So he also has to match the leading // in his regex. If the protocol should be ignored, he could (or has to) do this in his callback function.
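
As a rough sketch of that last suggestion, the callback can strip the scheme or the leading // before handing the URL to en(). The en() below is again a hypothetical stand-in (it hex-encodes its argument so the result is easy to inspect), not the asker's real function.

```php
<?php
// Hypothetical stand-in for the asker's en(); hex-encodes the URL so the
// value that was passed in remains visible in the output.
function en(string $url): string
{
    return 'maskedurl.php?l=' . bin2hex($url);
}

$html = '<iframe src="//example.com/hello.php"></iframe>';

// Pattern from the answer: the scheme group accepts "http://", "https://" or "//".
$pattern = '@((https?://|//)?([-\w]+\.[-\w\.]+)+\w(:\d+)?(/([-\w/_\.]*(\?\S+)?)?)*)@';

$masked = preg_replace_callback($pattern, function (array $matches): string {
    // Drop "http://", "https://" or a bare "//" from the start of the match,
    // so that en() only ever sees host and path.
    $withoutScheme = preg_replace('@^(?:https?:)?//@', '', $matches[0]);
    return en($withoutScheme);
}, $html);

echo $masked, PHP_EOL;
```

This keeps the whole URL, slashes included, inside the match so that it is replaced, while en() receives the same value regardless of which scheme prefix was present.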
