
I have a string s = "Name: John, Name: Abby, Name: Kate". How do I extract everything between "Name: " and ","? I'd like to end up with a list a = ['John', 'Abby', 'Kate'].

Thanks!

3 Answers


No need for a regex:

>>> s = "Name: John, Name: Abby, Name: Kate"
>>> [x[len('Name: '):] for x in s.split(', ')]
['John', 'Abby', 'Kate']

Or even:

>>> prefix = 'Name: '
>>> s[len(prefix):].split(', ' + prefix)
['John', 'Abby', 'Kate']

Now if you still think a regex is more appropriate:

>>> import re
>>> re.findall(r'Name:\s+([^,]*)', s)
['John', 'Abby', 'Kate']


The interesting question is how you would choose among the many ways to do this in Python. The answer using "split" is nice if you're confident that the format will be exact. If you would like some protection from minor format changes, a regular expression might be useful. You should think through what parts of the format are most likely to be stable, and capture those in your regular expression, while leaving flexibility for the others. Here is an example that assumes that the names are alphabetic, and that the word "Name" and the colon are stable:

import re
s = "Name: John, Name: Abby, Name: Kate"
# Raw string so that \s reaches the regex engine unchanged.
names = [i.group(1) for i in re.finditer(r"Name:\s+([A-Za-z]*)", s)]
print(names)

You might instead want to allow for hyphens or other characters inside a name; you can do so by changing the text inside [A-Za-z].
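As a rough sketch of that idea, the character class could be widened to accept hyphens and apostrophes. The sample string and the choice of extra characters are assumptions made for illustration; the uneven spacing is there to show that `\s+` tolerates it.

```python
import re

# Hypothetical input: a hyphenated name, an apostrophe, and uneven spacing.
s = "Name: Mary-Jane, Name:   O'Brien, Name: Kate"

# Letters plus apostrophe and hyphen; the hyphen is placed last in the
# class so it is read literally rather than as a range.
pattern = r"Name:\s+([A-Za-z'-]+)"

names = re.findall(pattern, s)
print(names)  # ['Mary-Jane', "O'Brien", 'Kate']
```

Which characters to allow depends on the data you expect, so treat the class above as a starting point rather than a complete definition of a name.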

A good page about Python regular expressions with lots of examples is http://docs.python.org/howto/regex.html.

2 Comments

The list comprehension is exactly equivalent to re.findall(r"Name:\s+([A-Za-z]*)", s).
Good point. I considered using findall. I personally find myself using finditer more often, because the job is to go through and do something to each found element, so I chose to use finditer in the example, even though the list comprehension here is a bit weird.
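To illustrate the distinction drawn in these comments, here is a small sketch of a case where finditer offers something findall does not: access to each match object, for example to report where a name was found. The (name, position) tuple layout is just one possible choice.

```python
import re

s = "Name: John, Name: Abby, Name: Kate"

# finditer yields match objects, so the start offset of each captured
# name is available alongside its text.
found = [(m.group(1), m.start(1))
         for m in re.finditer(r"Name:\s+([A-Za-z]*)", s)]
print(found)  # [('John', 6), ('Abby', 18), ('Kate', 30)]
```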

A few more ways to do it:

>>> s
'Name: John, Name: Abby, Name: Kate'

Method 1:

>>> [x.strip(' ,') for x in s.split("Name:")[1:]]
['John', 'Abby', 'Kate']

Method 2:

>>> [x.rsplit(":",1)[-1].strip() for x in s.split(",")]
['John', 'Abby', 'Kate']

Method 3:

>>> import re
>>> [x.strip() for x in re.findall(":([^,]*)",s)]
['John', 'Abby', 'Kate']

Method 4:

>>> [x.strip() for x in s.replace('Name:','').split(',')]
['John', 'Abby', 'Kate']

Also note that I consistently applied strip, which makes sense if there can be multiple spaces between the 'Name:' token and the actual name.

Methods 2 and 3 generalize more easily, since they rely only on the colon and the comma rather than on the literal 'Name' label.
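One way to read that last remark is that the regex from Method 3 can be adapted to strings carrying more than one label. A minimal sketch of such a variation, where the helper name extract_values and the mixed-label sample string are made up for illustration:

```python
import re

def extract_values(text, label):
    """Return the values that follow 'label:' in a comma-separated string."""
    # re.escape guards against labels that contain regex metacharacters.
    pattern = re.escape(label) + r":\s*([^,]*)"
    return [value.strip() for value in re.findall(pattern, text)]

# Hypothetical input mixing two labels.
s = "Name: John, Age: 31, Name: Abby, Age: 27"
print(extract_values(s, "Name"))  # ['John', 'Abby']
print(extract_values(s, "Age"))   # ['31', '27']
```

This still assumes that values never contain a comma; if they can, the format itself needs a quoting rule before any of these approaches will work reliably.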
