
The input is a string and the output is a list in which each element is one word from the string. A word is defined as a sequence of letters and/or digits. For example, Ilove is a word and 45tgfd is a word, but 54fss. isn't a word because it contains a period.

Let us assume that commas come only after a word.

For example - 'Donald John Trump, born June 14, 1946, is the 45th' should become ['Donald', 'John', 'Trump', 'born', 'June', '14', '1946', 'is', 'the', '45th']

I tried doing it with [x.rstrip(',') for x in line.split() if x.rstrip(',').isalpha() or x.rstrip(',').isdigit()], where line is the original string, but it became messy and wrong: it couldn't detect '45th', because neither isdigit nor isalpha returns True for a string that mixes digits and letters.

Any ideas?

1
  • For dealing with "54fss.", is the expected result that you ignore the whole thing, or just ignore the period on the end? In other words, is it only commas that get treated specially? Commented Apr 27, 2017 at 23:11

2 Answers

2

You are looking for str.isalnum:

>>> [x for x in (s.rstrip(',') for s in line.split()) if x.isalnum()]
['Donald', 'John', 'Trump', 'born', 'June', '14', '1946', 'is', 'the', '45th']
>>>

Notice, too, that I'm not calling rstrip redundantly: the generator expression inside the comprehension strips each token once, which also lets me make only a single pass over line.split().
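For reuse, the same idea could be wrapped in a small function. This is only a sketch: the name extract_words is made up for illustration, and it assumes, as the question does, that commas appear only at the end of a token. Note that str.isalnum also accepts non-ASCII letters and digits, so 'café' would count as a word here.

```python
def extract_words(line):
    """Return the whitespace-separated tokens of `line` that consist
    only of letters and/or digits, ignoring trailing commas."""
    # Strip trailing commas once per token (generator, so a single pass).
    stripped = (token.rstrip(',') for token in line.split())
    # Keep only tokens that are purely alphanumeric; '' is rejected too.
    return [word for word in stripped if word.isalnum()]


line = 'Donald John Trump, born June 14, 1946, is the 45th'
print(extract_words(line))
# ['Donald', 'John', 'Trump', 'born', 'June', '14', '1946', 'is', 'the', '45th']

print(extract_words('54fss. is dropped, 45tgfd stays'))
# ['is', 'dropped', '45tgfd', 'stays']
```

A token such as '54fss.' is dropped entirely rather than trimmed, which matches the question's statement that it isn't a word.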


2 Comments

New one to me, +1
@bernie yeah, Python string methods have a lot of hidden gems.
1
>>> import re

>>> s = 'Donald John Trump, born June 14, 1946, is the 45th'
>>> [i.strip(',') for i in re.split(r'\s+',s) if not re.search(r'^[\.]|\w+\.\w+|[\.]$',i)]
['Donald', 'John', 'Trump', 'born', 'June', '14', '1946', 'is', 'the', '45th']

>>> s2 = 'tes.t .test test. another word'
>>> [i.strip(',') for i in re.split(r'\s+',s2) if not re.search(r'^[\.]|\w+\.\w+|[\.]$',i)]
['another', 'word']
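If the goal is to reject any token containing a non-alphanumeric character, not just a period, one possible variation is to match each whole token against a character class with re.fullmatch. This is a sketch rather than part of the answer above: the names WORD and extract_words_re are hypothetical, and the pattern assumes that "letters and numbers" means ASCII only.

```python
import re

# Assumption: a word is one or more ASCII letters or digits, nothing else.
WORD = re.compile(r'[A-Za-z0-9]+')


def extract_words_re(line):
    """Return tokens of `line` made up solely of ASCII letters/digits,
    after removing any trailing commas."""
    candidates = (token.rstrip(',') for token in line.split())
    # fullmatch succeeds only if the entire token fits the pattern.
    return [c for c in candidates if WORD.fullmatch(c)]


s = 'Donald John Trump, born June 14, 1946, is the 45th'
print(extract_words_re(s))
# ['Donald', 'John', 'Trump', 'born', 'June', '14', '1946', 'is', 'the', '45th']

s2 = 'tes.t .test test. another word'
print(extract_words_re(s2))
# ['another', 'word']
```

Because the check is a whitelist, tokens such as 'foo!' or 'a-b' are excluded as well, without having to list each unwanted character.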

3 Comments

Using re is likely the best approach.
"if not i.endswith('.')" is not sufficient here. Presumably OP wouldn't want words with a period in the middle of them either, and endswith won't pick that up. Plus it sounds like he would want to ignore words ending with other non-alphanumeric characters besides periods.
@Sweater-Baron: point taken. Updated answer to account for periods. If he wants to ignore other non-alphanumeric characters he can edit the question.
