python string to list - list comprehension

Question

The input is a string and the output is a list in which each element is the corresponding word. A word is defined as a sequence of letters and/or numbers. For example, Ilove is a word and 45tgfd is a word, but 54fss. isn't a word because it contains a '.'.

Let us assume that commas come only after a word.

For example, 'Donald John Trump, born June 14, 1946, is the 45th' should become ['Donald', 'John', 'Trump', 'born', 'June', '14', '1946', 'is', 'the', '45th'].
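The word definition and the comma rule can be pinned down as a per-token check. This is a minimal sketch, assuming "letters and numbers" means Unicode letters and digits (so `_` does not count) and that at most one comma may follow a word; `WORD_TOKEN` and `is_word` are hypothetical names, not part of the question or either answer:

```python
import re

# [^\W_] means "a word character that is not an underscore", i.e. a letter or digit.
# The optional ',' at the end encodes the assumption that commas only follow words.
WORD_TOKEN = re.compile(r'[^\W_]+,?')

def is_word(token):
    """True if token is one or more letters/digits, optionally followed by one comma."""
    return WORD_TOKEN.fullmatch(token) is not None

for token in ['Ilove', '45tgfd', 'Trump,', '54fss.']:
    print(token, is_word(token))
```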

I tried [x.rstrip(',') for x in line.split() if x.rstrip(',').isalpha() or x.rstrip(',').isdigit()], where line is the original string, but it became messy and wrong: it couldn't detect '45th', because a token that mixes digits and letters passes neither isdigit nor isalpha.
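Running that attempt on the example sentence makes the failure concrete: every token that is purely letters or purely digits survives, and only the mixed token is lost.

```python
line = 'Donald John Trump, born June 14, 1946, is the 45th'

# The attempted filter: keep a token only if it is all letters OR all digits.
words = [x.rstrip(',') for x in line.split()
         if x.rstrip(',').isalpha() or x.rstrip(',').isdigit()]
print(words)
# ['Donald', 'John', 'Trump', 'born', 'June', '14', '1946', 'is', 'the']
# '45th' is neither all letters nor all digits, so both checks fail and it is dropped.
```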

Any ideas?

For dealing with "54fss.", is the expected result that you ignore the whole thing, or just ignore the period on the end? In other words, is it only commas that get treated specially?
– Alex von Brandenfels, Commented Apr 27, 2017 at 23:11

juanpa.arrivillaga · Accepted Answer · 2017-04-27 22:55:38Z

2

You are looking for str.isalnum:

>>> line = 'Donald John Trump, born June 14, 1946, is the 45th'
>>> [x for x in (s.rstrip(',') for s in line.split()) if x.isalnum()]
['Donald', 'John', 'Trump', 'born', 'June', '14', '1946', 'is', 'the', '45th']

Notice, too, that I'm not calling rstrip redundantly: the generator expression inside the comprehension strips each token exactly once, and it also lets me make only a single pass over line.split().
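On Python 3.8 or later, an assignment expression gives the same one-rstrip-per-token behaviour without the nested generator. This is an alternative spelling offered as a sketch, not the answer's own code, and `extract_words` is a hypothetical helper name:

```python
def extract_words(line):
    # (w := ...) binds the stripped token so rstrip runs once per token;
    # isalnum() is False for '' as well as for any punctuation, so a lone ','
    # or a token like '54fss.' is dropped.
    return [w for s in line.split() if (w := s.rstrip(',')).isalnum()]

print(extract_words('Donald John Trump, born June 14, 1946, is the 45th'))
# ['Donald', 'John', 'Trump', 'born', 'June', '14', '1946', 'is', 'the', '45th']
```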

edited Apr 27, 2017 at 22:55

answered Apr 27, 2017 at 22:53

juanpa.arrivillaga


2 Comments

mechanical_meat Over a year ago

New one to me, +1

juanpa.arrivillaga Over a year ago

@bernie yeah, Python string methods have a lot of hidden gems.

mechanical_meat · Accepted Answer · 2017-04-27 23:38:01Z

1

>>> import re

>>> s = 'Donald John Trump, born June 14, 1946, is the 45th'
>>> [i.strip(',') for i in re.split(r'\s+',s) if not re.search(r'^[\.]|\w+\.\w+|[\.]$',i)]
['Donald', 'John', 'Trump', 'born', 'June', '14', '1946', 'is', 'the', '45th']

>>> s2 = 'tes.t .test test. another word'
>>> [i.strip(',') for i in re.split(r'\s+',s2) if not re.search(r'^[\.]|\w+\.\w+|[\.]$',i)]
['another', 'word']
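The regex above packs three alternatives into one pattern. Unpacked into a small function, with `[\.]` written as the equivalent `\.` and `str.split()` standing in for `re.split(r'\s+', ...)` (`PERIOD` and `words_without_periods` are hypothetical names), the sketch below also makes the filter's limits visible: tokens with other punctuation, or with two adjacent periods inside, are still kept. That is the gap raised in the comments on this answer.

```python
import re

# ^\.       period at the start of the token     ('.test')
# \w+\.\w+  period between two word characters   ('tes.t')
# \.$       period at the end of the token       ('test.')
PERIOD = re.compile(r'^\.|\w+\.\w+|\.$')

def words_without_periods(text):
    # str.split() splits on runs of whitespace and, unlike re.split(r'\s+', ...),
    # never yields empty strings for leading or trailing whitespace.
    return [tok.strip(',') for tok in text.split() if not PERIOD.search(tok)]

print(words_without_periods('tes.t .test test. another word'))  # ['another', 'word']
print(words_without_periods('a..b x! ok'))                        # ['a..b', 'x!', 'ok']
```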

edited Apr 27, 2017 at 23:38

answered Apr 27, 2017 at 22:51

mechanical_meat


3 Comments

juanpa.arrivillaga Over a year ago

Using re is likely the best approach.

Alex von Brandenfels Over a year ago

"if not i.endswith('.')" is not sufficient here. Presumably the OP wouldn't want words with a period in the middle of them either, and endswith won't catch those. Plus, it sounds like he would want to ignore words ending with non-alphanumeric characters other than periods.

mechanical_meat Over a year ago

@Sweater-Baron: point taken. I updated the answer to account for periods. If he wants to ignore other non-alphanumeric characters, he can edit the question.
