
I have strings in the following format: name1 <[email protected]>. How can I use a regex to extract only the name1 part? Also, how could I do this for a string with multiple names and emails, say name1 <[email protected]>, name2 <[email protected]>?

  • Are the emails actually surrounded by < >? Commented Feb 25, 2021 at 18:16
  • @DeepSpace yes, they are. Commented Feb 25, 2021 at 18:21

3 Answers

3

Try using split:

In [164]: s = 'name1 <[email protected]>, name2 <[email protected]>'
In [166]: [i.split()[0] for i in s.split(',')]
Out[166]: ['name1', 'name2']

If you have just one name:

In [161]: s = 'name1 <[email protected]>'
In [163]: s.split()[0]
Out[163]: 'name1'
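Splitting on whitespace keeps only the first word, so a display name that contains a space would be cut short. A minimal sketch of one way around that, assuming every email is wrapped in < >, is to take everything before the '<' instead. The helper name extract_names and the example.com addresses are made up for illustration.

```python
def extract_names(text):
    """Return the text before each '<' in a comma-separated list of entries."""
    names = []
    for chunk in text.split(','):
        # partition returns (before, separator, after); separator is '' if '<' is absent
        name, sep, _ = chunk.partition('<')
        if sep:
            names.append(name.strip())
    return names


print(extract_names('name1 <a@example.com>, name2   <b@example.com>'))
# ['name1', 'name2']
print(extract_names('John Smith <john@example.com>'))
# ['John Smith']
```

This still splits on every comma, so a name written as first, last <...> would come back as just 'last'.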

7 Comments

I thought to do this too, but it only works if there are not additional spaces, which I doubt is guaranteed; indeed a regex about < would be much cleaner!
I don't think the emails are surrounded by <>. It's just a way of representing them.
@ti7 str.split will behave the same if there are multiple spaces. 'name1    <someemail>'.split() returns the same output as 'name1 <someemail>'.split()
Yes, str.split by default handles whitespaces.
obviously, but it will not work if the structure is like first, last <[email protected]> (which "real" emails frequently are) - the OP did not state this, but it'll be the case for any real-world collection
2

You can start with (\w+)\s<.*?>(?:,\s)? (see on regex101.com), which relies on the fact that emails are surrounded by < >, and customize it as you see fit.

Note that this regex does not specifically look for emails, just for text surrounded by < >.

Don't fall down the rabbit hole of trying to specifically match emails.

import re

regex = re.compile(r'(\w+)\s<.*?>(?:,\s)?')
string = 'name1 <[email protected]>, name2 <[email protected]>'

print(regex.findall(string))

outputs

['name1', 'name2']
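To illustrate the note above that the pattern only looks for text inside < >, and one way it might be customized, here is a short sketch. The second pattern (multiword) is a hypothetical variant, not part of the original answer, and the example.com addresses are placeholders.

```python
import re

# Pattern from the answer: a word, one whitespace character, then <...>
regex = re.compile(r'(\w+)\s<.*?>(?:,\s)?')

# The bracketed text does not have to be an email address
loose = regex.findall('name1 <not an email>, name2 <b@example.com>')
print(loose)
# ['name1', 'name2']

# Hypothetical variant: allow spaces inside the name and any amount
# of whitespace before '<'
multiword = re.compile(r'([^<>,\s][^<>,]*?)\s*<[^<>]*>')
full_names = multiword.findall('John Smith <john@example.com>, name2  <b@example.com>')
print(full_names)
# ['John Smith', 'name2']
```

The variant still treats a comma as the end of a name, so it would not handle first, last <...> correctly.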

1 Comment

This actually works better for me, thank you.
2
import re

name = re.search(r'(?<! <)\w+', 'name1 <[email protected]>')

print(name.group(0))

>>> name1

Explanation:

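(?<!...) is a negative lookbehind assertion: the pattern matches at a position only if the text immediately before that position is not .... With ' <' in place of the ..., \w+ cannot start right after the ' <' that opens the email, and re.search returns the first match in the string, which is the name.

re.search(r'(?<!...)\w+', string_to_search)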

https://docs.python.org/3/library/re.html
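The difference between a negative lookbehind and a positive lookahead is easier to see with re.findall, which returns every match rather than only the first. This is a small sketch with a placeholder address; the lookahead pattern is an alternative shown for comparison, not the pattern used above.

```python
import re

text = 'name1 <user@example.com>'  # placeholder address

# Negative lookbehind: only rejects a word that starts right after ' <',
# so the later pieces of the address still match
lookbehind_hits = re.findall(r'(?<! <)\w+', text)
print(lookbehind_hits)
# ['name1', 'ser', 'example', 'com']

# Positive lookahead: keeps only a word followed by optional whitespace and '<'
lookahead_hits = re.findall(r'\w+(?=\s*<)', text)
print(lookahead_hits)
# ['name1']
```

With re.search both patterns return name1 for this input, because the name is the first match in the string.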


Edit:

To search strings with multiple names and emails:

import re

# Positive lookahead: a word followed by optional whitespace and '<'
regex = r"\w+(?=\s*<)"

multi_name = "name1 <[email protected]>, name2 <[email protected]>"

# finditer yields one match object per name
for match in re.finditer(regex, multi_name):
    print(match.group())

>>> name1
>>> name2
