I tried this regular expression to retrieve an email address. As I have little experience with regular expressions, I would like to ask whether you can see what is wrong with it, since it outputs one word twice:

regexp = "(\\w+)(\\(at\\))((\\w+\\.)+)([a-z]{2,3})";

Given the input "madrugada(at)yahoo.co.uk", the result it produces is [email protected] .

pattern = Pattern.compile(regexp);
m = pattern.matcher(my_input);
while (m.find()) {
    for (int i = 0; i <= m.groupCount(); i++) {
        // the output is: madrugada (at) yahoo co co uk
        System.out.println(m.group(i));
    }
}

Thank you

3 Answers

You have an extra set of parentheses in your regex. When you loop through the capture groups, both the outer group and the group nested inside it are returned, so part of the domain shows up twice in the output.

Try this

regexp = "(\\w+)(\\(at\\))(\\w+\\.)+([a-z]{2,3})";

Edit: An alternate RegEx that uses non-capturing groups seems like it would solve the problem.

regexp = "(\\w+)(\\(at\\))((?:\\w+\\.)+)([a-z]{2,3})";
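To make the difference concrete, here is a minimal sketch that lists the capture groups for the pattern from the question and for the non-capturing variant, using the input from the question. The class name and the `groupsOfFirstMatch` helper are made up for illustration; only `java.util.regex` is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaptureGroupComparison {

    // Hypothetical helper: returns groups 1..groupCount() of the first match,
    // or an empty list if the pattern does not match at all.
    static List<String> groupsOfFirstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> groups = new ArrayList<>();
        if (m.find()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                groups.add(m.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        String input = "madrugada(at)yahoo.co.uk";

        // Pattern from the question: group 4 is nested inside group 3.
        String nested = "(\\w+)(\\(at\\))((\\w+\\.)+)([a-z]{2,3})";
        // Same pattern with the inner group made non-capturing.
        String nonCapturing = "(\\w+)(\\(at\\))((?:\\w+\\.)+)([a-z]{2,3})";

        // prints [madrugada, (at), yahoo.co., co., uk]
        System.out.println(groupsOfFirstMatch(nested, input));
        // prints [madrugada, (at), yahoo.co., uk]
        System.out.println(groupsOfFirstMatch(nonCapturing, input));
    }
}
```

With the nested version, the inner group holds only the last repetition ("co."), which is the repeated word; with `(?:...)` that group is no longer reported.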

5 Comments

Now it outputs something like [email protected] (so it only takes the last word before .uk).
I believe this one may also work: "\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*"
Given the RegEx in the original question, it seems it might be better to pull only the matching groups that you know you want (1, 2, 3, and 5) and skip that inner capture group (4) that duplicates the final part of the non TLD portion of the email address.
Thank you John, this is what I did before, but I wanted something more elegant :)
@Madrugada I just had a flash of insight. If we convert the inner domain name capture group into a non-capturing group, that should solve the problem. Try this RegEx: "(\\w+)(\\(at\\))((?:\\w+\\.)+)([a-z]{2,3})";

import java.util.regex.*;

String a = "madrugada(at)yahoo.co.in.ro.uk";
// first domain label, then any middle labels (non-capturing repeat), then the final ".xx" or ".xxx" part
String regexp = "(\\w+)(\\(at\\))(\\w+)((?:\\.\\w+)*)(\\.[a-z]{2,3})";
Pattern pattern = Pattern.compile(regexp);
Matcher m = pattern.matcher(a);
while (m.find()) {
    for (int i = 0; i <= m.groupCount(); i++) {
        System.out.println(m.group(i));
    }
}

produces following output:

madrugada(at)yahoo.co.in.ro.uk
madrugada
(at)
yahoo
.co.in.ro
.uk

EDIT:

Updated the above to use a non-capturing group. It did not work before because, although the repeated group matched several .\w+ segments, a repeated capturing group keeps only its last iteration. I also changed the quantifier on the non-capturing group to * to accommodate madrugada(at)yahoo.uk
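As a quick check of the updated pattern against both a short and a long domain, here is a rough sketch. It assumes the goal is to rebuild a normal address from the pieces; the class name and the `toEmail` helper are invented for this example and are not part of any library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DomainSplitDemo {

    // Pattern from this answer: first label, optional middle labels, final ".xx"/".xxx" part.
    static final Pattern ADDRESS =
            Pattern.compile("(\\w+)(\\(at\\))(\\w+)((?:\\.\\w+)*)(\\.[a-z]{2,3})");

    // Hypothetical helper: rebuilds "user@domain" from the first match, or returns null.
    static String toEmail(String input) {
        Matcher m = ADDRESS.matcher(input);
        if (!m.find()) {
            return null;
        }
        // Group 4 is the empty string when there are no middle labels (e.g. yahoo.uk).
        return m.group(1) + "@" + m.group(3) + m.group(4) + m.group(5);
    }

    public static void main(String[] args) {
        // prints madrugada@yahoo.uk
        System.out.println(toEmail("madrugada(at)yahoo.uk"));
        // prints madrugada@yahoo.co.in.ro.uk
        System.out.println(toEmail("madrugada(at)yahoo.co.in.ro.uk"));
    }
}
```

Because the whole repeated part sits inside one capturing group, group 4 returns every middle label at once instead of only the last one.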

4 Comments

I am sorry, this is worse :( output: [email protected]
I've edited the regex in the post: the first match now excludes the ".", and the following matches each start with ".". It seems to have worked for me.
It works indeed. Did you change the expression in the meantime? Thanks a lot!
But it doesn't work for this one: madrugada(at)yahoo.co.in.ro.uk . I don't get it... it seems correct :(

You also don't really want to include m.group(0), as it contains the whole segment that matched your overall RE.

for (int i=1;i<=m.groupCount();i++) {
  System.out.println(m.group(i));
}
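For completeness, a small self-contained sketch of that loop, starting at index 1 so the whole match is left out. It borrows the non-capturing pattern from another answer here; the class and method names are only illustrative.

```java
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupZeroDemo {

    // Joins groups 1..groupCount() of the first match with spaces; returns "" if nothing matches.
    static String partsWithoutWholeMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringJoiner parts = new StringJoiner(" ");
        if (m.find()) {
            // Start at 1: group(0) is the entire matched text, not one of the parts.
            for (int i = 1; i <= m.groupCount(); i++) {
                parts.add(m.group(i));
            }
        }
        return parts.toString();
    }

    public static void main(String[] args) {
        String regexp = "(\\w+)(\\(at\\))((?:\\w+\\.)+)([a-z]{2,3})";
        // prints madrugada (at) yahoo.co. uk
        System.out.println(partsWithoutWholeMatch(regexp, "madrugada(at)yahoo.co.uk"));
    }
}
```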
