Regex in PHP not working

Question

My regex is:

$regex = '/(?<=Α: )(([\w-\.]+)@((?:[\w]+\.)+)([a-zA-Z]{2,4}))/';

My content among others is:

Q: Email Address 
A: name@example.com

Rad Software Regular Expression Designer says that it should work.

Various online sites return the correct results.

If I remove the (?<=Α: ) lookbehind the regex returns all emails correctly.

When I run it from php it returns no matches.

What's going on?

I've also used the specific type of regex (ie (?<=Email: ) with different content. It works just fine in that case.

Which functions are you using to parse the regex? preg_, eregi_?
– phpisuber01, Commented Apr 25, 2013 at 19:45
The A in your regular expression has some kind of diacritical on it, the A in the content is a normal letter.
– Barmar, Commented Apr 25, 2013 at 20:00
@Barmar The diacritical doesn't display in my browsers... it looks like an A character.
– AbsoluteƵERØ, Commented Apr 25, 2013 at 20:30
@AbsoluteƵERØ It's not actually a diacritical. When I look at the letters side by side, this one is just taller. When I paste it into Emacs, it says it has code #x51371.
– Barmar, Commented Apr 25, 2013 at 20:31

anubhava · Accepted Answer · 2013-04-25 20:06:35Z

1

You are most likely not using the DOTALL flag s here, which makes the dot match newlines as well in your regex:

$str = <<< EOF
Q: Email Address 
A: name@example.com
EOF;
if (preg_match_all('/(?<=A: )(([\w-\.]+)@((?:[\w]+\.)+)([a-zA-Z]{2,4}))/s', 
                   $str, $arr))
   print_r($arr);

OUTPUT:

Array
(
    [0] => Array
        (
            [0] => name@example.com
        )

    [1] => Array
        (
            [0] => name@example.com
        )

    [2] => Array
        (
            [0] => name
        )

    [3] => Array
        (
            [0] => example.
        )

    [4] => Array
        (
            [0] => com
        )

)
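For reference, here is a minimal sketch of what the s modifier changes. The sample text and the short Q.A pattern are made up for illustration and are not taken from the question:

```php
<?php
// Hypothetical two-line sample, not the content from the question.
$text = "Q\nA";

// Without the s modifier, an unescaped dot does not match a newline.
$withoutFlag = preg_match('/Q.A/', $text);
var_dump($withoutFlag); // int(0)

// With the s modifier (PCRE_DOTALL), the dot matches a newline as well.
$withFlag = preg_match('/Q.A/s', $text);
var_dump($withFlag); // int(1)
```

The modifier only affects an unescaped dot. An escaped `\.` matches a literal period with or without it.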

answered Apr 25, 2013 at 20:06

anubhava


AbsoluteƵERØ · Accepted Answer · 2013-04-25 20:20:41Z

1

This is my newer monster script for verifying whether an e-mail "validates" or not. You can feed it strange things and break it, but in production it handles 99.99999999% of the problems I've encountered. The false positives that remain come mostly from typos.

<?php

$pattern = '!^[^@\s]+@[^.@\s]+\.[^@\s]+$!';

$examples = array(
  '[email protected]',
  '[email protected]',
  '[email protected]',
  '[email protected]',
  'bad.email@google',
  '@google.com',
  'my@[email protected]',
  'my [email protected]',
);


foreach($examples as $test_mail){
    if(preg_match($pattern,$test_mail)){
      echo ("$test_mail - passes\n");   
    } else {
      echo ("$test_mail - fails\n");                
    }
}

?>

Output

[email protected] - passes
[email protected] - passes
[email protected] - passes
[email protected] - fails
bad.email@google - fails
@google.com - fails
my@[email protected] - fails
my [email protected] - fails

Unless there's a reason for the look-behind, you can match all of the emails in the string with preg_match_all(). Since you're working with a string, you would modify the regex slightly:

$string_only_pattern = '!\s([^@\s]+@[^.@\s]+\.[^@\s]+)\s!s';

$mystring = '
[email protected] - passes
[email protected] - passes
[email protected] - passes
[email protected] - fails
bad.email@google - fails
@google.com - fails
my@[email protected] - fails
my [email protected] - fails
';

preg_match_all($string_only_pattern,$mystring,$matches);

print_r ($matches[1]);

Output from string only

Array
(
    [0] => [email protected]
    [1] => [email protected]
    [2] => [email protected]
    [3] => [email protected]
)

edited Apr 25, 2013 at 20:20

answered Apr 25, 2013 at 20:05

AbsoluteƵERØ


2 Comments

Barmar Over a year ago

What does this have to do with the question, which is about the A: lookbehind?

AbsoluteƵERØ Over a year ago

@Barmar I was getting to it. Sorry for the delay... I only write 160wpm.

Barmar · Accepted Answer · 2013-04-25 20:12:33Z

0

The problem is that your regular expression contains Α, which looks like a Latin A but is a different character (the Greek capital alpha), while the content contains a plain Latin A. So the lookbehind doesn't match.

I changed the regex to:

$regex = '/(?<=A: )(([\w-\.]+)@((?:[\w]+\.)+)([a-zA-Z]{2,4}))/';

and it works.
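A rough way to reproduce this, as a sketch only: the sample text is hypothetical, the pattern is a simplified version of the one in the question, and the `"\u{0391}"` escape assumes PHP 7.0 or later. The character class is written `[\w.-]` because PCRE2-based PHP versions (7.3 and later) may reject a hyphen placed between `\w` and another character as an invalid range.

```php
<?php
// Hypothetical sample text, modeled on the content in the question.
$content = "Q: Email Address \nA: name@example.com";

$greek = "\u{0391}"; // GREEK CAPITAL LETTER ALPHA, looks like a Latin A
$latin = "A";        // LATIN CAPITAL LETTER A

// Shared remainder of the pattern; group 1 captures the whole address.
$tail = ': )([\w.-]+@(?:\w+\.)+[a-zA-Z]{2,4})/';

// Same pattern twice, differing only in the letter inside the lookbehind.
$greekHits = preg_match('/(?<=' . $greek . $tail, $content, $m1);
$latinHits = preg_match('/(?<=' . $latin . $tail, $content, $m2);

var_dump($greekHits); // int(0): the lookbehind never finds a Greek alpha
var_dump($latinHits); // int(1)
echo $m2[1], "\n";    // name@example.com
```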

answered Apr 25, 2013 at 20:12

Barmar


1 Comment

jimmy Over a year ago

You have got to be kidding me.... An hour spent trying to find out what's wrong and in the end I'm looking for a Greek 'A' when I want an English one...

Community · Accepted Answer · 2017-05-23 10:30:16Z

0

Outside of your regex issue itself, you should really consider not writing your own e-mail address regex parser. See the Stack Overflow post "Using a regular expression to validate an email address" for why. The upshot: the RFC is long and demanding on your regex abilities.
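If the goal is only to check whether a single string looks like an e-mail address, one alternative to a hand-written pattern is PHP's built-in filter_var() with FILTER_VALIDATE_EMAIL. This is a sketch with made-up inputs, and the built-in filter is not a full RFC implementation either:

```php
<?php
// Hypothetical sample inputs.
$candidates = array(
    'name@example.com',
    '@example.com',
    'two words@example.com',
);

$results = array();
foreach ($candidates as $candidate) {
    // filter_var() returns the filtered string on success and false on failure.
    $results[$candidate] = filter_var($candidate, FILTER_VALIDATE_EMAIL) !== false;
    echo $candidate, ' - ', $results[$candidate] ? 'passes' : 'fails', "\n";
}
```

Note that this validates one candidate at a time. Extracting addresses out of a larger text still needs some kind of pattern to find the candidates first.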

edited May 23, 2017 at 10:30


answered Apr 25, 2013 at 19:47

user559633

1 Comment

jimmy Over a year ago

Yes, I've seen the monster of a regex needed to correctly extract RFC email addresses!

Casimir et Hippolyte · Accepted Answer · 2013-04-25 20:39:52Z

0

The A char in your subject is the "normal" char with the code 65 (unicode or ascii). But the A you use in the lookbehind of your pattern has the code 913 (unicode). They look similar but are different.
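One way to make such a look-alike visible, sketched under a few assumptions: the variable names are made up, the `"\u{0391}"` escape needs PHP 7.0 or later, and the trick relies on json_encode() escaping non-ASCII characters as \uXXXX by default.

```php
<?php
$subjectLetter = "A";        // Latin capital A, as in the subject
$patternLetter = "\u{0391}"; // Greek capital alpha, as in the lookbehind

// The two strings are not equal, even though they render alike.
var_dump($subjectLetter === $patternLetter); // bool(false)

echo ord($subjectLetter), "\n";    // 65
echo strlen($patternLetter), "\n"; // 2, the letter takes two bytes in UTF-8

// json_encode() turns the non-ASCII letter into a visible escape sequence.
$escaped = json_encode("(?<=" . $patternLetter . ": )");
echo $escaped, "\n";               // "(?<=\u0391: )"

// 0391 in hexadecimal is the code 913 mentioned above.
echo hexdec("0391"), "\n";         // 913
```

Running a suspicious pattern through json_encode() like this is a quick check when a regex that "looks right" refuses to match.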

answered Apr 25, 2013 at 20:39

Casimir et Hippolyte
