
I have a txt file with various email addresses and other lines that are not valid emails. I am trying to print only the valid email addresses, but when I use the code below, nothing is printed. This is the content of the txt file:

[email protected]   
[email protected]

lalalalal

In this case, only the two email addresses should be printed.

import re

my_file = open('emails.txt', 'r+')
  • Add re.M flag, re.findall(r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+$", my_file.read(), re.M) Commented Mar 5, 2019 at 11:58
  • It is very similar to this question: stackoverflow.com/q/6186938/4636715 except you specifically look for email addresses. But as your point is not the regex you've built, it can be considered as a dupe. Commented Mar 5, 2019 at 12:02
  • @vahdet It is not similar to that question. Here, the whole line must match a pattern. Commented Mar 5, 2019 at 12:03
  • Nothing is printed because the for loop iterates over the file, which has already been read to the end by .read(). Why aren't you iterating over items instead? Commented Mar 5, 2019 at 12:06
  • You are looking for matches and storing them in items, and on the very next line you are overwriting items. Commented Mar 5, 2019 at 12:08
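The point about the file position can be checked without a real file. The following is a minimal sketch that uses io.StringIO as a stand-in for the opened file; the sample addresses are made up for illustration and are not the ones from the question.

```python
import io

# Hypothetical stand-in for open('emails.txt'); the content is invented.
fake_file = io.StringIO("alice@example.com\nlalalalal\n")

contents = fake_file.read()          # consumes everything, position is now at the end
lines_after_read = list(fake_file)   # iterating now yields nothing

fake_file.seek(0)                    # rewind to the beginning
lines_after_seek = list(fake_file)   # iteration yields the lines again

print(lines_after_read)
print(lines_after_seek)
```

So a loop over the file object placed after .read() has nothing left to iterate over, which is consistent with nothing being printed.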

3 Answers


You can fix your code by adding the re.M flag:

re.findall(r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+$", my_file.read(), re.M)

Since you read in the whole file with my_file.read(), ^ and $ need to match at the start and end of each line rather than of the whole string, and the re.M flag makes them do that.
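To see the effect of the flag, here is a small sketch that runs the same pattern with and without re.M. The text variable is a hypothetical stand-in for my_file.read(), and the addresses in it are invented.

```python
import re

EMAIL_PATTERN = r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+$"

# Hypothetical stand-in for my_file.read(); the addresses are made up.
text = "alice@example.com\nbob@example.org\n\nlalalalal\n"

# Without re.M, ^ and $ only anchor at the start and end of the whole string.
without_flag = re.findall(EMAIL_PATTERN, text)

# With re.M, ^ and $ also anchor at the start and end of every line.
with_flag = re.findall(EMAIL_PATTERN, text, re.M)

print(without_flag)  # []
print(with_flag)     # ['alice@example.com', 'bob@example.org']
```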

Also, you may read the file line by line and only get those lines that fully match your pattern:

items = []
email_rx = re.compile(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+$")
with open('emails.txt', 'r+') as my_file:
    for line in my_file:
        if email_rx.match(line):
            items.append(line)

Note that only the $ anchor is necessary here, because re.match already anchors the match at the start of the string.

Note that your lines may have CRLF endings or trailing whitespace. In that case, either rstrip each line before testing it against the regex and appending it to items, or add \s* at the end of the pattern, right before the $ anchor.
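A minimal sketch of the rstrip variant follows. The helper name whole_line_emails and the sample lines are hypothetical, and the function takes any iterable of lines (an open file object would work the same way) so that it can be tried without a file on disk.

```python
import re

email_rx = re.compile(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+$")

def whole_line_emails(lines):
    """Return the lines that consist solely of an email-like string."""
    items = []
    for line in lines:
        candidate = line.rstrip()  # drop trailing spaces, \r and \n
        if email_rx.match(candidate):
            items.append(candidate)
    return items

# Invented sample lines, including trailing spaces and a CRLF ending.
sample_lines = [
    "alice@example.com   \r\n",
    "bob@example.org\n",
    "\n",
    "lalalalal\n",
    "see bob@example.org\n",
]
result = whole_line_emails(sample_lines)
print(result)  # ['alice@example.com', 'bob@example.org']
```

Appending the stripped candidate rather than the raw line also keeps the stored addresses free of trailing newlines.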


import re

# Read the whole file, then iterate over the matches rather than the file object.
with open('emails.txt', 'r') as my_file:
    items = re.findall(r"([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+)", my_file.read())
for item in items:
    print(item)

Two problems:

  1. Iterate with for item in items instead of iterating over the file.
  2. Remove ^ and $ from your pattern.

1 Comment

If you remove the anchors, email-like substrings that are not equal to the whole line will get extracted, too. OP used the anchors for a reason.
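The difference is easy to reproduce. In this sketch the input text is invented: one line contains an address embedded in other words, and one line is an address on its own.

```python
import re

UNANCHORED = r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+"
ANCHORED = r"^" + UNANCHORED + r"$"

# Invented sample: the first line only contains an address as a substring.
text = "write to bob@example.org, please\nalice@example.com\n"

substring_hits = re.findall(UNANCHORED, text)
whole_line_hits = re.findall(ANCHORED, text, re.M)

print(substring_hits)   # ['bob@example.org', 'alice@example.com']
print(whole_line_hits)  # ['alice@example.com']
```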

This should print every line in the file that contains an email:

import re
reg = '[A-Za-z0-9.]+@[A-Za-z0-9]+[.][a-z]+'
with open('emails.txt', 'r') as f1:
    for email in f1:
        # Print every line that contains an email-like substring.
        if re.search(reg, email):
            print(email.rstrip('\n'))

And this should get only the lines that consist entirely of an email:

import re
reg = '[A-Za-z0-9.]+@[A-Za-z0-9]+[.][a-z]+'
with open('emails.txt', 'r') as f1:
    for email in f1:
        line = email.rstrip('\n')
        # Keep the line only if the pattern matches it from start to end.
        if re.fullmatch(reg, line):
            print(line)

3 Comments

OP only wants those emails that are equal to whole lines.
Check the second part of the code, which will get only the lines that consist entirely of an email.
There is a more straightforward approach, see my answer.
