2

Here is the output of my code:

Tue Dec 17 04:34:03 +0000 2013,Email me for tickets email me at [email protected],1708824644
Tue Dec 17 04:33:58 +0000 2013,@musclepotential ok man. you can email [email protected],25016561

I want to find the email address in the ,<text>, (the text between the commas) and then reprint just the email.

Example:

Tue Dec 17 04:34:03 +0000 2013, [email protected],1708824644
Tue Dec 17 04:33:58 +0000 2013, [email protected],25016561

I know I can use the regex below to get just the email, but then I lose the other data.

import re

string = str(messages)
regex = r"\w+@\w+\.com"
match = re.findall(regex, string)

2 Comments

  • What does the input look like? Commented Dec 17, 2013 at 4:52
  • I'm pretty sure that \w+ isn't good enough. What about [email protected]? Commented Dec 17, 2013 at 4:52
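
To make the concern in that last comment concrete, here is a small sketch that runs the question's pattern against a few address shapes. The addresses are hypothetical placeholders (the ones in the question are obscured), and the names narrow, samples and found are made up for illustration.

```python
import re

# The question's pattern, written as a raw string.
narrow = re.compile(r"\w+@\w+\.com")

# Hypothetical placeholder addresses, not taken from the question.
samples = [
    "first.last@example.com",  # dot in the local part
    "ann@mail.example.com",    # subdomain before .com
    "bob@example.org",         # non-.com domain
]

# Map each sample to whatever the narrow pattern finds in it.
found = {s: narrow.findall(s) for s in samples}
for address, hits in found.items():
    print(address, "->", hits)
```

The first address comes back truncated to last@example.com, because \w does not match the dot, and the other two are not found at all.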

4 Answers

2

Based on your examples, use this pattern: ,.*?(\S+),
The capture group holds the last run of non-whitespace characters before the final comma.
This solution does not depend on the email pattern itself, which is one of the most sought-after patterns and can vary a lot, such as [email protected]
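
As a rough sketch of how that pattern could be applied in Python, assuming one record per string: the sample line uses a placeholder address, and keep_last_token is a hypothetical helper name, not something from the question.

```python
import re

# Hypothetical sample line; the address is a placeholder.
line = ("Tue Dec 17 04:34:03 +0000 2013,"
        "Email me for tickets email me at alice@example.com,1708824644")

# Lazy .*? skips the message text; (\S+) captures the last
# whitespace-free token before the final comma.
pattern = re.compile(r",.*?(\S+),")

def keep_last_token(text):
    """Replace ',<message>,' with ', <last token>,' (first match only)."""
    return pattern.sub(r", \1,", text, count=1)

result = keep_last_token(line)
print(result)

# The pattern does not check for an '@', so any last word is kept.
no_email = keep_last_token("date,hello world,1")
print(no_email)
```

The second call shows the limitation: the pattern keeps whatever token comes last, whether or not it is an email address.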

1 Comment

Note that this only works if the email address is between commas, and it captures the last word of ANYTHING between commas.
1

After your current code, try this:

new_string = string.split(',')
new_string[1] = match[0]
output_string = ', '.join(new_string)
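
Note that string.split(',') assumes the message itself contains no commas. A possible per-line variant, sketched here under that concern, splits only on the first and last comma. The EMAIL pattern is a deliberately loose assumption, reduce_line is a hypothetical name, and the address is a placeholder.

```python
import re

# Loose email pattern; an assumption for illustration, not a full validator.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def reduce_line(line):
    """Return 'date, emails, id' for one 'date,message,id' record."""
    # Split on the first and the last comma only, so commas inside
    # the message do not shift the fields.
    date, rest = line.split(",", 1)
    message, ident = rest.rsplit(",", 1)
    emails = EMAIL.findall(message)
    return ", ".join([date, " ".join(emails), ident])

# Hypothetical sample with a comma inside the message text.
sample = ("Tue Dec 17 04:33:58 +0000 2013,"
          "@musclepotential ok, man. you can email bob@example.org,25016561")
reduced = reduce_line(sample)
print(reduced)
```

This keeps the ', ' joiner from the answer, so the output has a space after each comma.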

1

This might work well...

string = str(messages)
regex = r"(?<=,).*?(?=\S+,\d+$)"
output_str = re.sub(regex, "", string)
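
If the string holds several lines, $ only matches at the very end by default, so only the last record would be reduced. A minimal sketch that adds re.MULTILINE, assuming newline-separated records; the addresses are placeholders and strip_message is a hypothetical name.

```python
import re

# Hypothetical two-line input; the addresses are placeholders.
text = (
    "Tue Dec 17 04:34:03 +0000 2013,"
    "Email me for tickets email me at alice@example.com,1708824644\n"
    "Tue Dec 17 04:33:58 +0000 2013,"
    "@musclepotential ok man. you can email bob@example.org,25016561"
)

# Same pattern as above; re.MULTILINE lets $ match at the end of each line.
strip_message = re.compile(r"(?<=,).*?(?=\S+,\d+$)", re.MULTILINE)

# Delete everything between the first comma and the last token of the message.
cleaned = strip_message.sub("", text)
print(cleaned)
```

Like the first answer's pattern, this keeps the last token before the final comma without checking that it is an email address.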

0

The answers above rely on your text closely matching your examples. This code is a little more flexible: it matches any number of emails in each message. I did not document it thoroughly, but here is how it works.

harvest_emails takes a string of newline-separated lines, each comma-separated as in your examples (date,message_string,identifier), and returns a generator that yields 3-tuples of (date, comma-separated emails, identifier). It pulls any number of emails from the message text and matches any email of the form [email protected] | [email protected] | [email protected] where x is any non-empty run of non-whitespace characters.

import re

def harvest_emails(target):
    """Take a string, split it into lines, then yield each line as a tuple:
    (datecode, emails, identifier)
    """
    for line in target.splitlines():
        t = line.split(",")
        yield (
            t[0].strip(),
            ','.join(
                # Rejoin the message fields with spaces so that
                # neighbouring fields do not fuse into one token.
                re.findall(r"\S+@\S+\.(?:com|org|net)",
                           ' '.join(t[1:-1]).strip(), re.I)),
            t[-1].strip())

>>> messages = """04:34:03 +0000 2013,Email me for tickets email me at [email protected],1708824644
Tue Dec 17 04:33:58 +0000 2013,@musclepotential ok, man. you can email [email protected],25016561
Tue Dec 17 04:34:03 +0000 2013, [email protected], [email protected],1708824644
Tue Dec 17 04:33:58 +0000 2013, [email protected],25016561"""
>>> data = list()
>>> for line in harvest_emails(messages):
...     d = dict()
...     d["date"], d["emails"], d["id"] = line[0], line[1].split(','), line[2]
...     data.append(d)
...
>>> for value in data:
...     print(value)
...
{'emails': ['[email protected]'], 'date': '04:34:03 +0000 2013', 'id': '1708824644'}
{'emails': ['[email protected]'], 'date': 'Tue Dec 17 04:33:58 +0000 2013', 'id': '25016561'}
{'emails': ['[email protected]', '[email protected]'], 'date': 'Tue Dec 17 04:34:03 +0000 2013', 'id': '1708824644'}
{'emails': ['[email protected]'], 'date': 'Tue Dec 17 04:33:58 +0000 2013', 'id': '25016561'}
