I just wrote a regex pattern for replacing links with HTML anchor tags. Here it is:

~((http\:\/\/|https\:\/\/)([^ ]+)) ~

The reason I ask is that I only finished this regex recently and ran a few tests with some links. It works well, but I want to be sure there are no bugs in the pattern (I'm a regex newbie), and maybe a regex expert could offer an opinion and/or a suggestion.

By the way, if you're wondering about the space at the end: you might think the pattern won't work if the string doesn't end with a space, but my trick is to append a space to the string before the replacement and remove it again once the replacement is done.

P.S.:

I'm not handling validation of the link itself. I just want to find the strings that start with http:// or https:// and end with a space, nothing else, since link validation is a bit complicated.

EDIT:

Some of my code:

<?php

    $patron = "~(https?:\/\/[^\s]+) ~";
    //$patron = "~((http\:\/\/|https\:\/\/)([^ ]+)) ~";
    $reemplazar = '<a href="$1">$1</a> ';
    $cadena = "https://www.youtube.com/watch?v=7it5wioGixA ";

    echo preg_replace($patron, $reemplazar, $cadena);

?>
  • google search for 'regex tester' Commented Dec 16, 2013 at 22:57
  • @Donovan I did, and I used it, now I want to try 'StackOverflow experts tester' :-) Commented Dec 16, 2013 at 22:59
  • That's not really what this site is intended for. You don't actually have a question; you're looking for opinions. Commented Dec 16, 2013 at 23:00
  • Then where should I ask this? And why isn't it a question? It's still a doubt: what if there is something wrong with the pattern and someone can suggest something better? Commented Dec 16, 2013 at 23:02
  • Have a look at PHP's filter_var function - especially the FILTER_VALIDATE_URL option. Commented Dec 16, 2013 at 23:03

2 Answers

I think this can be greatly simplified:

~(https?://\S+) ~

Other than that: Looks okay to me.
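
As a quick check of that claim, here is a minimal sketch that runs both patterns over the same input, using the pad-then-trim trick described in the question. The helper name `linkify_with` and the example.com / example.org URLs are made up for illustration. Note that `[^ ]` and `\S` only agree when the text contains no tabs or newlines, which is the case for this sample.

```php
<?php
// Hypothetical helper: pads the input with one space so the trailing-space
// pattern can match a URL at the very end, then drops that space again.
function linkify_with(string $pattern, string $text): string
{
    $padded = $text . ' ';
    $result = preg_replace($pattern, '<a href="$1">$1</a> ', $padded);
    return substr($result, 0, -1); // remove the single padding space
}

$original   = '~((http\:\/\/|https\:\/\/)([^ ]+)) ~'; // pattern from the question
$simplified = '~(https?://\S+) ~';                     // pattern from this answer
$text = 'see http://example.com and https://example.org/a?b=1';

$a = linkify_with($original, $text);
$b = linkify_with($simplified, $text);

var_dump($a === $b); // do both patterns give the same result for this input?
echo $b, "\n";
```

Both calls should produce the same markup for this input; text containing tabs or line breaks next to a URL is where the two character classes would start to differ.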

2 Comments

No. I'm marking the "s" in "https" as optional via the question mark. [^\s] simply means "all characters except any form of whitespaces."
And [^\s]+ can also be reduced to \S+

Along the same lines, your pattern can be shortened to:

~https?://[^\s"'>]+~    # don't forget to escape the quote you use.
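
A small sketch of how that pattern might be used from PHP, assuming it is written in a single-quoted string (so the single quote inside the character class has to be escaped). The sample text and the example.com host are placeholders, not taken from the question.

```php
<?php
// Illustrative input only: one bare URL and one URL inside an HTML attribute.
$text = 'Docs: https://example.com/a?b=1 and <img src="http://example.com/x.png">';

// Single-quoted PHP string: \s reaches the regex engine unchanged,
// while \' is PHP's escape for the quote character itself.
$pattern = '~https?://[^\s"\'>]+~';

preg_match_all($pattern, $text, $matches);
print_r($matches[0]); // every URL found, without needing a trailing space
```

Because the character class stops at whitespace, quotes and `>`, no trailing space is needed after the URL, and a URL inside an attribute value ends at the closing quote.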

To change URLs to links:

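$html = preg_replace_callback('~\b(?:(https?://)|www\.)[^]\s"\')<]++~',
    function ($m) {
        // Group 1 is only set when the match already starts with a scheme;
        // prepend one only for matches that begin with "www.".
        $url = empty($m[1]) ? 'http://' . $m[0] : $m[0];
        if (filter_var($url, FILTER_VALIDATE_URL)) {
            return '<a href="' . $url . '">' . $m[0] . '</a>';
        }
        return $m[0]; // leave anything that fails validation untouched
    }, $html);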

Old answer:

To change URLs inside links:

A better way to extract all href attributes from all "a" tags is to use the DOM.

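$doc = new DOMDocument();
@$doc->loadHTML($htmlString); // @ suppresses warnings about malformed HTML
// href is an attribute, not a tag, so select the <a> elements
$links = $doc->getElementsByTagName('a');
// a DOMNodeList cannot be iterated by reference, so no & here
foreach ($links as $link) {
    $href = $link->getAttribute('href');
    $link->setAttribute('href', 'what you want');
}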

3 Comments

I don't want to extract the href of anchors, I want to convert links to anchors.
@Neo: In that case it will be more complicated, since URLs don't always have a protocol!
The main requirement for the link is that it contains at least the protocol, so it's very simple.
