Regular expression in Python: findall

Question

re.findall(r'[\w]+@+[\w.]','blahh [email protected] yipee')

returns ['ggg@g']

Why doesn't it return ['[email protected]'] or at least ['ggg@google']?

vks · Accepted Answer · 2015-02-27 05:05:57Z


\w+@+[\w.]+
         ^^

Your pattern is missing a quantifier after the final character class [\w.], so it matches only one character after the @.

It should be

`re.findall(r'[\w]+@+[\w.]+','blahh [email protected] yipee')`

Also, if there can be only one @, you can drop the + that follows it, giving \w+@[\w.]+

Output: ['[email protected]']
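As a quick check of the fix above, here is a minimal sketch that runs the original and corrected patterns side by side. The address in the question is obscured in this copy of the page, so the input string below is an assumption (inferred from the `ggg@google` and `google`, `com` fragments quoted in the thread).

```python
import re

# Assumption: the obscured address in the question is "ggg@google.com".
text = "blahh ggg@google.com yipee"

# Original pattern: the last character class has no quantifier,
# so exactly one character is consumed after the "@".
original = re.findall(r"[\w]+@+[\w.]", text)

# Corrected pattern: the trailing "+" lets the class repeat.
fixed = re.findall(r"[\w]+@+[\w.]+", text)

# Simplified form that allows only a single "@".
simplified = re.findall(r"\w+@[\w.]+", text)

print(original)    # ['ggg@g']
print(fixed)       # ['ggg@google.com']
print(simplified)  # ['ggg@google.com']
```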


Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]

Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
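To make "as many times as possible, giving back as needed" concrete, here is a small sketch using capture groups. The test strings are made up for illustration and are not from the question.

```python
import re

# "\w+" first grabs all of "google", then gives back one character
# so that the trailing "\w" still has something to match.
plus_groups = re.fullmatch(r"(\w+)(\w)", "google").groups()

# "\w*" can give back all the way down to zero characters.
star_groups = re.fullmatch(r"(\w*)(\w)", "g").groups()

# "\w+" cannot go below one character, so a one-letter string fails.
plus_on_single = re.fullmatch(r"(\w+)(\w)", "g")

print(plus_groups)     # ('googl', 'e')
print(star_groups)     # ('', 'g')
print(plus_on_single)  # None
```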


ajay_t · Accepted Answer · 2015-02-27 05:03:18Z


In [\w]+@+[\w.], the final [\w.] has no quantifier, so it matches exactly one character after the @. That is why the match takes the g after the @ and stops. To match the rest of the address, repeat the character class with * or +.

* = zero or more occurrences, e.g. ggg@google.com, ggg@
+ = one or more occurrences, e.g. ggg@g, [email protected]
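The difference between the two quantifiers shows up on the bare `ggg@` case. A minimal sketch, using `re.fullmatch` so the whole candidate string has to match (the candidate strings are illustrative):

```python
import re

star = re.compile(r"\w+@[\w.]*")  # zero or more characters after "@"
plus = re.compile(r"\w+@[\w.]+")  # at least one character after "@"

candidates = ["ggg@", "ggg@g", "ggg@google.com"]

# Map each candidate to (matches with "*", matches with "+").
results = {
    c: (bool(star.fullmatch(c)), bool(plus.fullmatch(c)))
    for c in candidates
}

for candidate, (with_star, with_plus) in results.items():
    print(f"{candidate!r}: star={with_star}, plus={with_plus}")
```

Only `ggg@` is treated differently: `*` accepts it, `+` rejects it. For finding addresses, `+` is usually the safer choice.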


heemayl · Accepted Answer · 2015-02-27 08:22:53Z

Let's break down re.findall(r'[\w]+@+[\w.]','blahh [email protected] yipee'):

First, [\w] matches any single word character (a letter, digit, or underscore), so in this string it matches every character except the spaces, the "@", and the ".".

Then [\w]+ matches one or more successive word characters, which leaves us with blahh, ggg, google, com and yipee.

Now [\w]+@ requires a "@" right after one of those runs, but only ggg has a "@" immediately after it, so only ggg@ is matched.

Next, [\w]+@+ matches "@" one or more times. Since there is only one "@" after ggg, the match stays the same, i.e. ggg@.

Finally, [\w]+@+[\w.] allows exactly one more character after that: a word character or a literal ".". Here ggg@ is followed by g, so that g is added, making the match ggg@g. Nothing in the pattern lets it continue past that one character.

So we get ['ggg@g'] as the result.
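The walkthrough above can be reproduced by running `findall` on each successive prefix of the pattern. This is only a sketch, and it assumes the obscured address in the question is `ggg@google.com`.

```python
import re

# Assumption: the obscured address in the question is "ggg@google.com".
text = "blahh ggg@google.com yipee"

# Each entry adds one more piece of the original pattern.
steps = [r"[\w]+", r"[\w]+@", r"[\w]+@+", r"[\w]+@+[\w.]"]

results = {pattern: re.findall(pattern, text) for pattern in steps}

for pattern, matches in results.items():
    print(f"{pattern:<14} -> {matches}")
# [\w]+          -> ['blahh', 'ggg', 'google', 'com', 'yipee']
# [\w]+@         -> ['ggg@']
# [\w]+@+        -> ['ggg@']
# [\w]+@+[\w.]   -> ['ggg@g']
```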

To print ['[email protected]'] try this:

re.findall(r'\w+@\w+\.\w+','blahh [email protected] yipee')
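For comparison, here is a hedged sketch of how this stricter pattern behaves next to the `\w+@[\w.]+` form from the other answer. The input strings are assumptions made up for illustration; neither pattern is a full email validator.

```python
import re

strict = r"\w+@\w+\.\w+"  # exactly one dot-separated pair after "@"
loose = r"\w+@[\w.]+"     # any run of word characters and dots after "@"

# Assumed input, standing in for the obscured address in the question.
basic = "blahh ggg@google.com yipee"
# Hypothetical variations: a sentence-ending period and a subdomain.
trailing_dot = "write to ggg@google.com."
subdomain = "blahh ggg@mail.google.com yipee"

print(re.findall(strict, basic))         # ['ggg@google.com']
print(re.findall(strict, trailing_dot))  # ['ggg@google.com']
print(re.findall(loose, trailing_dot))   # ['ggg@google.com.']
print(re.findall(strict, subdomain))     # ['ggg@mail.google']
print(re.findall(loose, subdomain))      # ['ggg@mail.google.com']
```

The strict form ignores a trailing period but cuts off extra domain labels. The loose form keeps every label but also swallows the trailing period. Which one fits depends on the text being searched.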
