My regex is:

$regex = '/(?<=Α: )(([\w-\.]+)@((?:[\w]+\.)+)([a-zA-Z]{2,4}))/';

My content includes, among other things:

Q: Email Address 
A: [email protected]

Rad Software Regular Expression Designer says that it should work.

Various online sites return the correct results.

If I remove the (?<=Α: ) lookbehind the regex returns all emails correctly.

When I run it from PHP, it returns no matches.

What's going on?

I've also used the same kind of lookbehind, e.g. (?<=Email: ), with different content. It works just fine in that case.

  • Which functions are you using to parse the regex? preg_, eregi_? Commented Apr 25, 2013 at 19:45
  • @phpisuber01 preg_match(). Commented Apr 25, 2013 at 19:52
  • The A in your regular expression has some kind of diacritical on it, the A in the content is a normal letter. Commented Apr 25, 2013 at 20:00
  • @Barmar The diacritical doesn't display in my browsers... it looks like an A character. Commented Apr 25, 2013 at 20:30
  • @AbsoluteƵERØ It's not actually a diacritical. When I look at the letters side by side, this one is just taller. When I paste it into Emacs, it says it has code #x51371. Commented Apr 25, 2013 at 20:31

5 Answers

You are most likely not using the DOTALL flag s here, which makes the dot match newlines as well in your regex:

$str = <<< EOF
Q: Email Address 
A: [email protected]
EOF;
if (preg_match_all('/(?<=A: )(([\w-\.]+)@((?:[\w]+\.)+)([a-zA-Z]{2,4}))/s', 
                   $str, $arr))
   print_r($arr);

OUTPUT:

Array
(
    [0] => Array
        (
            [0] => [email protected]
        )

    [1] => Array
        (
            [0] => [email protected]
        )

    [2] => Array
        (
            [0] => name
        )

    [3] => Array
        (
            [0] => example.
        )

    [4] => Array
        (
            [0] => com
        )

)

This is my newer monster script for checking whether an e-mail address "validates" or not. You can feed it strange things and break it, but in production it handles 99.99999999% of the problems I've encountered. Most of the false positives that remain come from typos.

<?php

$pattern = '!^[^@\s]+@[^.@\s]+\.[^@\s]+$!';

$examples = array(
  '[email protected]',
  '[email protected]',
  '[email protected]',
  '[email protected]',
  'bad.email@google',
  '@google.com',
  'my@[email protected]',
  'my [email protected]',
);


foreach($examples as $test_mail){
    if(preg_match($pattern,$test_mail)){
      echo ("$test_mail - passes\n");   
    } else {
      echo ("$test_mail - fails\n");                
    }
}

?>

Output

[email protected] - passes
[email protected] - passes
[email protected] - passes
[email protected] - fails
bad.email@google - fails
@google.com - fails
my@[email protected] - fails
my [email protected] - fails

Unless there's a reason for the look-behind, you can match all of the emails in the string with preg_match_all(). Since you're working with a longer string rather than a single address, you would modify the regex slightly:

$string_only_pattern = '!\s([^@\s]+@[^.@\s]+\.[^@\s]+)\s!s';

$mystring = '
[email protected] - passes
[email protected] - passes
[email protected] - passes
[email protected] - fails
bad.email@google - fails
@google.com - fails
my@[email protected] - fails
my [email protected] - fails
';

preg_match_all($string_only_pattern,$mystring,$matches);

print_r ($matches[1]);

Output from string only

Array
(
    [0] => [email protected]
    [1] => [email protected]
    [2] => [email protected]
    [3] => [email protected]
)

2 Comments

What does this have to do with the question, which is about the A: lookbehind?
@Barmar I was getting to it. Sorry for the delay... I only write 160wpm.

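The problem is that your regular expression contains Α, which is not the same character as the A in the content. The two look almost identical, but the one in the regex is the Greek capital alpha and the one in the content is the Latin letter. So the lookbehind doesn't match.

I changed the regex to: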

$regex = '/(?<=A: )(([\w-\.]+)@((?:[\w]+\.)+)([a-zA-Z]{2,4}))/';

and it works.
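
To make the difference concrete, here is a minimal sketch that runs both lookbehinds against the same content. The address name@example.com is a placeholder (the real address is obscured in the post), the email part of the pattern is a simplified variant with the hyphen moved to the end of the character class, and the Greek letter is written as the escape \u{0391} so that it cannot be confused with the Latin one in an editor.

```php
<?php
// Placeholder content; the address is an assumption, not the asker's data.
$subject = "Q: Email Address\nA: name@example.com";

// Simplified email part, an assumption for illustration only.
$emailPart = '[\w.-]+@(?:\w+\.)+[a-zA-Z]{2,4}';

// Lookbehind with the Greek capital alpha (U+0391), written as an escape.
$greek = "/(?<=\u{0391}: )" . $emailPart . '/';
// Lookbehind with the Latin capital A (U+0041).
$latin = '/(?<=A: )' . $emailPart . '/';

$greekCount = preg_match($greek, $subject);
$latinCount = preg_match($latin, $subject, $m);

var_dump($greekCount); // int(0): the Greek letter never occurs in the content
var_dump($latinCount); // int(1)
echo $m[0], "\n";      // name@example.com
```

The two patterns differ only in that one character, which is why the problem is so hard to spot by eye.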

1 Comment

You have got to be kidding me.... An hour spent trying to find out what's wrong and in the end I'm looking for a Greek 'A' when I want an English one...

Apart from the regex issue itself, you should really consider not writing your own e-mail address regex parser. See the Stack Overflow post "Using a regular expression to validate an email address" for why. The upshot: the RFC is long and demanding on your regex abilities.
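
One alternative to a hand-written pattern, sketched here rather than recommended as the definitive answer, is PHP's built-in filter_var() with FILTER_VALIDATE_EMAIL. The wrapper name isValidEmail and the sample addresses are hypothetical. Note that the built-in filter does not accept everything the RFCs allow either.

```php
<?php
// Hypothetical wrapper around PHP's built-in email filter.
function isValidEmail(string $candidate): bool
{
    // filter_var() returns the filtered string on success and false on failure.
    return filter_var($candidate, FILTER_VALIDATE_EMAIL) !== false;
}

// Placeholder inputs, chosen for illustration.
$candidates = ['name@example.com', '@example.com', 'my name@example.com', 'no-at-sign'];

foreach ($candidates as $candidate) {
    echo $candidate, ' - ', isValidEmail($candidate) ? 'passes' : 'fails', "\n";
}
```

This validates a single candidate string. Pulling addresses out of longer text still needs a pattern to find the candidates first.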

1 Comment

Yes, I've seen the monster of a regex needed to correctly extract RFC email addresses!

The A in your subject is the "normal" character with code 65 (in both ASCII and Unicode). The Α you use in the lookbehind of your pattern has code 913 (Unicode, Greek capital alpha). They look similar but are different characters.
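
A look-alike character like this can be found mechanically by listing every non-ASCII character in the pattern together with its code point. The following is a minimal sketch: the helper name findNonAscii is hypothetical, it assumes the pattern is UTF-8 encoded, and it assumes the mbstring extension is available for mb_ord() (PHP 7.2 or later).

```php
<?php
// Hypothetical helper: report each non-ASCII character with its code point
// and byte offset. Assumes UTF-8 input and the mbstring extension.
function findNonAscii(string $text): array
{
    $found = [];
    // The u modifier makes PCRE read the subject as UTF-8, so one match
    // is one whole character rather than a single byte.
    preg_match_all('/[^\x00-\x7F]/u', $text, $matches, PREG_OFFSET_CAPTURE);
    foreach ($matches[0] as [$char, $byteOffset]) {
        $found[] = sprintf('U+%04X at byte %d', mb_ord($char, 'UTF-8'), $byteOffset);
    }
    return $found;
}

// Greek capital alpha written as an escape so it is unambiguous here.
$suspect = "/(?<=\u{0391}: )\\w+/";
$clean   = '/(?<=A: )\w+/';

print_r(findNonAscii($suspect)); // one entry: U+0391 at byte 5
print_r(findNonAscii($clean));   // empty array
```

U+0391 is 913 in decimal, the code mentioned above. A pattern that is meant to be pure ASCII should produce an empty list.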
