regex using python language

Question

I have a txt file containing email addresses along with other lines that are not valid emails. I am trying to print only the valid email addresses, but when I use the code below, nothing is printed. This is the content of the txt file:

[email protected]   
[email protected]

lalalalal

In this case, only the two email addresses should be printed.

import re

my_file = open('emails.txt', 'r+')

Add the re.M flag: re.findall(r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+$", my_file.read(), re.M)
– Wiktor Stribiżew, Commented Mar 5, 2019 at 11:58
It is very similar to this question: stackoverflow.com/q/6186938/4636715, except that you are specifically looking for email addresses. Since your problem is not the regex you've built, it can be considered a dupe.
– vahdet, Commented Mar 5, 2019 at 12:02
@vahdet It is not similar to that question. Here, the whole line must match a pattern.
– Wiktor Stribiżew, Commented Mar 5, 2019 at 12:03
Nothing is printed because the for loop is iterating over the file, which has already been read to the end by .read(). Why aren't you iterating over items instead?
– user3089519, Commented Mar 5, 2019 at 12:06
You are looking for matches, storing them in items, and in the very next line you are overwriting items.
– Klaus D., Commented Mar 5, 2019 at 12:08
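
The point raised in the comments about .read() can be checked with a small sketch. It uses io.StringIO as a stand-in for the opened emails.txt, and the addresses in it are hypothetical placeholders, not the ones from the question.

```python
import io

# Hypothetical stand-in for emails.txt; io.StringIO behaves like an open text file.
sample = io.StringIO("alice@example.com\nbob@example.org\n\nlalalalal\n")

content = sample.read()           # consumes the stream; the position is now at the end
lines_after_read = list(sample)   # iterating now yields nothing

sample.seek(0)                    # rewind to the start
lines_after_seek = list(sample)   # iteration works again

print(lines_after_read)
print(len(lines_after_seek))
```

So a for loop over the file object placed after .read() has nothing left to iterate over, unless the file is rewound first or the loop runs over the list of matches instead.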

Wiktor Stribiżew · Accepted Answer · 2019-03-05 12:06:32Z

You can fix your code by adding the re.M flag:

re.findall(r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+$", my_file.read(), re.M)

Since you read in the whole file with my_file.read(), ^ and $ need to match at the start and end of each line rather than of the whole string, and the re.M flag makes them do that.
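
To illustrate the effect of the flag, here is a minimal sketch that runs the same pattern with and without re.M. The sample text and its addresses are made up for the example and only mimic the layout of the file in the question.

```python
import re

EMAIL_PATTERN = r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+$"

# Hypothetical file content: two email lines, an empty line, and a non-email line.
text = "alice@example.com\nbob@example.org\n\nlalalalal\n"

# Without re.M, ^ and $ are tied to the start and end of the whole string.
without_flag = re.findall(EMAIL_PATTERN, text)

# With re.M, ^ and $ also match at the start and end of every line.
with_flag = re.findall(EMAIL_PATTERN, text, re.M)

print(without_flag)  # []
print(with_flag)     # ['alice@example.com', 'bob@example.org']
```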

Also, you may read the file line by line and only get those lines that fully match your pattern:

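import re

items = []
# Only the $ anchor is needed: re.match already anchors at the start of the string.
email_rx = re.compile(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+$")
with open('emails.txt', 'r+') as my_file:
    for line in my_file:
        if email_rx.match(line):
            items.append(line)  # the line keeps its trailing newline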

Note that only the $ anchor is necessary, as re.match already anchors the match at the start of the string.

Note that the lines may have CRLF endings. In that case, either rstrip each line before testing it against the regex and appending it to items, or add a \s* pattern before the $ anchor.
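
The rstrip variant could look roughly like the sketch below. The helper name valid_email_lines and the sample lines are assumptions made for illustration, and re.fullmatch is used here in place of re.match plus a $ anchor, which should be equivalent once the line has been stripped.

```python
import re

EMAIL_RX = re.compile(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+")

def valid_email_lines(lines):
    """Return the stripped lines that consist entirely of an email-like string."""
    items = []
    for line in lines:
        candidate = line.rstrip()           # drops "\r\n", "\n", and trailing spaces
        if EMAIL_RX.fullmatch(candidate):   # the whole candidate must match
            items.append(candidate)
    return items

# Hypothetical lines, including trailing spaces and a CRLF ending.
raw_lines = [
    "alice@example.com   \r\n",
    "bob@example.org\n",
    "\n",
    "lalalalal\n",
    "see carol@example.net for details\n",
]
print(valid_email_lines(raw_lines))  # ['alice@example.com', 'bob@example.org']
```

With a real file, the same function can be fed the open file object directly, since iterating over it yields lines.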

thavan · Accepted Answer · 2019-03-05 12:11:25Z

import re
my_file = open('emails.txt', 'r+')
items = re.findall(r"([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)", my_file.read())
# Iterate over the matches (not the file) and use a separate loop variable
for item in items:
    print(item)

Two problems:

Iterate with for item in items instead of iterating over the file.
Remove ^ and $ from your pattern.

1 Comment

Wiktor Stribiżew Over a year ago

If you remove the anchors, email-like substrings that do not equal the whole line will get extracted, too. OP used the anchors for a reason.
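
A quick sketch of the difference described in this comment, using made-up sample text in which one line only contains an address as part of a longer sentence:

```python
import re

PATTERN = r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+"

# Hypothetical content: the second line merely contains an address.
text = "alice@example.com\ncontact: bob@example.org today\nlalalalal\n"

# Without anchors, any email-like substring is extracted.
unanchored = re.findall(PATTERN, text)

# With ^ and $ plus re.M, only lines that are entirely an email match.
anchored = re.findall(r"^" + PATTERN + r"$", text, re.M)

print(unanchored)  # ['alice@example.com', 'bob@example.org']
print(anchored)    # ['alice@example.com']
```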

Shahir Ansari · Accepted Answer · 2019-03-06 05:33:00Z

This should print every line in the file that contains an email:

import re
reg = r'[A-Za-z0-9.]+@[A-Za-z0-9]+[.][a-z]+'
with open('emails.txt', 'r') as f1:
    for email in f1:
        # re.search finds the pattern anywhere in the line
        if re.search(reg, email):
            print(email, end='')  # the line already ends with a newline

And this should print only the lines that consist entirely of an email:

import re
reg = r'[A-Za-z0-9.]+@[A-Za-z0-9]+[.][a-z]+'
with open('emails.txt', 'r') as f1:
    for email in f1:
        matches = re.findall(reg, email)
        # Keep the line only if the first match spans the whole line (newline excluded)
        if matches and len(matches[0]) == len(email.replace("\n", "")):
            print(email, end='')

edited Mar 6, 2019 at 5:33

answered Mar 5, 2019 at 12:40

3 Comments

Wiktor Stribiżew Over a year ago

OP only wants those emails that are equal to whole lines.

Shahir Ansari Over a year ago

Check the second part of the code, which will get the lines that contain only a whole email.

Wiktor Stribiżew Over a year ago

There is a more straight-forward approach, see my answer.
