
I am using a regex to replace all email addresses in a string with an <a> link to make them clickable. This works perfectly, except when the email address is preceded by two words of a certain minimum length joined by a dash. Only then do I get an empty string as the result.

<?php

$search = '#(^|[ \n\r\t])(([a-z0-9\-_]+(\.?))+@([a-z0-9\-]+(\.?))+[a-z]{2,5})#si';
$replace = '\\1<a href="mailto:\\2">\\2</a>';

$string = "tttteeee-sssstttt [email protected]";
echo preg_replace($search, $replace, $string);
// Output: "" (empty)

$string = "te-st [email protected]";
echo preg_replace($search, $replace, $string);
// Output: "te-st <a href="mailto:[email protected]">[email protected]</a>" (as expected)

$string = "[email protected] tttteeee-sssstttt";
echo preg_replace($search, $replace, $string);
// Output: "<a href="mailto:[email protected]">[email protected]</a> tttteeee-sssstttt" (as expected)

?>

I have tried everything, but I really can't find the problem. A solution would be removing the first dash in the regex (before the @ sign), but that way email addresses with a dash before the @ wouldn't be highlighted.

  • Before my brain tries to parse your regex: you do know {1,} == +, right? Commented Jul 17, 2011 at 22:50
  • Yup. I got this regex online, so it's not my own style. I'll fix it now, + is much clearer! Commented Jul 17, 2011 at 22:53
  • WTF? The string becomes NULL here.... Commented Jul 17, 2011 at 22:57

2 Answers

OK, minimal test case: #([a-z-]+\.?)+@#, which hits the backtrack limit (check preg_last_error()). Because the \. is optional, the engine cannot tell where one repetition of the group ends and the next begins, so it has to try every way of dividing the characters between the inner and the outer +, which is a lot of work. The default pcre.backtrack_limit of 100000 is not enough; setting it to 1000000 is.
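
A minimal sketch of how the failure can be detected, assuming PCRE's JIT is switched off so the interpreter's backtrack counter applies. The 30-letter input is hypothetical and chosen only to make the limit easy to hit; the pattern is the minimal one above.

```php
<?php
// Assumption: disable the PCRE JIT so the classic backtrack counter is used.
ini_set('pcre.jit', '0');
// The limit reported above as too low for the original input.
ini_set('pcre.backtrack_limit', '100000');

// Minimal pattern from the answer: a repeated group whose only separator is optional.
$pattern = '#([a-z-]+\.?)+@#';

// Hypothetical input: 30 letters that can be divided between the inner and the
// outer "+" in a huge number of ways, followed by an "@" the group never reaches.
$subject = str_repeat('a', 30) . ' @';

$result = preg_replace($pattern, '[match]', $subject);

if ($result === null) {
    // preg_replace() returns NULL on error, which echo prints as an empty string.
    $hitLimit = preg_last_error() === PREG_BACKTRACK_LIMIT_ERROR;
    echo $hitLimit ? "backtrack limit reached\n" : "other PCRE error\n";
} else {
    echo $result, "\n";
}
```

Checking for NULL before echoing is what distinguishes "nothing matched" (the input comes back unchanged) from "the engine gave up".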

To solve this, make it easier on the regex engine: the first ([a-z0-9\-_]+(\.?))+ should become ([a-z0-9\-_]+(\.[a-z0-9\-_]+)*), which is a lot easier to resolve internally. And as a bonus, unlike the accepted answer, this still doesn't allow consecutive dots.
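
As a rough illustration, here is the question's pattern with only the local part rewritten as described. The address john-doe@example.com is a made-up placeholder, and the domain part of the pattern is left exactly as the question had it.

```php
<?php
// Replacement string from the question; groups 1 and 2 keep their meaning.
$replace = '\\1<a href="mailto:\\2">\\2</a>';

// Local part rewritten: one run of allowed characters, then zero or more
// ".run" groups, so each dot must be followed by at least one character.
$search = '#(^|[ \n\r\t])(([a-z0-9\-_]+(\.[a-z0-9\-_]+)*)@([a-z0-9\-]+(\.?))+[a-z]{2,5})#si';

// Hypothetical address after the prefix that previously gave an empty result.
$string = "tttteeee-sssstttt john-doe@example.com";
$fixed = preg_replace($search, $replace, $string);
echo $fixed, "\n";

// Consecutive dots in the local part do not match, so this string is left alone.
$unchanged = preg_replace($search, $replace, "a..b@example.com");
echo $unchanged, "\n";
```

The first call should now link the address instead of returning NULL, because a run of characters can only be split at a literal dot.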


8 Comments

Thanks Wrikken. Good solution. I was surprised to note that (^|[ \r\n\t])(([A-Z0-9_\-]+\.?)+@([A-Z0-9_\-]+\.?)+\.[A-Z]{2,5})($|[ \r\n\t]) worked with eregi_replace, which is deprecated. So the working solution in its entirety is: #(^|\b)(([a-z0-9\-_]+(\.[a-z0-9\-_]+)*)@([a-z0-9\-_]+(\.[a-z0-9\-_]+)*)\.[A-Z]{2,5})($|\b)#i. @Jonathon, I think you should accept Wrikken's answer instead, as I had no idea why it wasn't working before and I think this answer is more helpful.
Nice addition of \b too, and actually, I think we can forgo the ^ & $ in that case, starting & ending it with just \b instead of (^|\b) & ($|\b). And .museum is 6 characters at the moment; as long as ICANN isn't yet handing out TLDs (they start next year with custom ones, and I do not know whether they limit the length), having \.[A-Z]{2,6} should temporarily do the trick.
And just to be a bitch about it: how would you mail the good people at http://пример.испытание/?
Good point, and thanks for pointing that out. So we'd end up with $search = '#\b((([a-z0-9\-_]+(\.[a-z0-9\-_]+)*)@([a-z0-9\-_]+(\.[a-z0-9\-_]+)*)\.[A-Z]{2,6}))\b#i'; But because the groups have changed around, $replace should also be changed to: $replace = '<a href="mailto:\\1">\\1</a>';
Lol, I don't even want to go there. People will probably stick mostly to ascii for their email addresses and domain names for a few more years... I hope.
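
For reference, the pattern and replacement the comments above converge on can be tried like this. It is only a sketch: the sample address is a made-up placeholder, and the pattern is still ASCII-only with a TLD capped at 6 letters, as discussed.

```php
<?php
// Pattern and replacement as given in the comment; group 1 is now the whole address.
$search = '#\b((([a-z0-9\-_]+(\.[a-z0-9\-_]+)*)@([a-z0-9\-_]+(\.[a-z0-9\-_]+)*)\.[A-Z]{2,6}))\b#i';
$replace = '<a href="mailto:\\1">\\1</a>';

// Same kind of input that broke the original pattern, with a hypothetical address.
$string = "tttteeee-sssstttt john-doe@example.com";
$linked = preg_replace($search, $replace, $string);
echo $linked, "\n";
```

Since the leading whitespace group is gone, the replacement no longer needs to re-emit \\1 in front of the link.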

Try using this for your search string instead:

$search = '#(^|\b)([A-Z0-9_\-.]+@[A-Z0-9_\-.]+\.[A-Z]{2,5})($|\b)#i';

3 Comments

Works great! Could you explain what was wrong with my search regex?
Hey, I just made a change to account for the fact that email addresses can contain dots.
You made the change that they can contain consecutive dots.
