7

I have a list of strings s as follows:

s = ['Hello', 'world', '!', 'How', 'are', 'you', '?', 'Have', 'a', 'good', 'day', '.']

I want to split this list into sublists. Whenever one of the tokens '?', '!', '.' or '\n' occurs, the current sublist should end there and a new one should begin, as follows:

final = [['Hello', 'world', '!'],
         ['How', 'are', 'you', '?'],
         ['Have', 'a', 'good', 'day', '.']]

I tried this:

x = 0
for i in range(len(s)):
    if s[i] in ('!','?','.','\n'):
         final = s[x: x+i]
    x = i+1

final should store my output, but I am not getting the result I expect. Any suggestions?

5 Answers

2

You were not that far away:

x=0
final=[]
for i in range(len(s)):
    if s[i] in ('!','?','.','\n'):
        final.append(s[x:i+1])
        x=i+1

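There was only a small indexing problem (the slice should be s[x:i+1], and x should advance only when a delimiter is found), and final has to be a list that collects all the partial lists.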

2 Comments

I recommend using enumerate to get the index instead. In general, it is also better to use a set for membership testing, because a set lookup takes constant time on average, whereas a tuple or list requires a linear search.
@Copperfield: Indeed enumerate is nicer for indexing, but would change the structure of the code. I wanted to stay as close as possible to the original structure.
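For readers curious what the suggestion in these comments might look like, here is a minimal sketch that combines enumerate with a set of delimiters. The names DELIMITERS and split_sentences are illustrative assumptions and do not come from the answer above.

```python
# Hypothetical names: DELIMITERS and split_sentences are chosen for illustration.
DELIMITERS = {'!', '?', '.', '\n'}  # a set gives average constant-time lookups


def split_sentences(words):
    """Split words into sublists, each ending at a delimiter token."""
    result = []
    start = 0  # index where the current sublist begins
    for i, word in enumerate(words):
        if word in DELIMITERS:
            result.append(words[start:i + 1])  # include the delimiter itself
            start = i + 1  # the next sublist starts right after it
    return result


s = ['Hello', 'world', '!', 'How', 'are', 'you', '?', 'Have', 'a', 'good', 'day', '.']
print(split_sentences(s))
```

Like the answer's loop, this sketch drops any words that follow the last delimiter.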
1

You could use the following:

s = ['Hello', 'world', '!', 'How', 'are', 'you', '?', 'Have', 'a', 'good', 'day', '.']
letters = ['!', '?', '.']

idxes = [idx for idx, val in enumerate(s) if val in letters]
idxes = [-1] + idxes
answer = [s[idxes[i]+1:idxes[i+1]+1] for i in range(len(idxes[:-1]))]
print(answer)

Output

[['Hello', 'world', '!'], ['How', 'are', 'you', '?'], ['Have', 'a', 'good', 'day', '.']]

This uses a list comprehension with the built-in enumerate function to extract the indices (idxes) of s where a punctuation mark occurs. After prepending -1 so that the first slice starts at index 0, it uses another list comprehension to construct the sublists by slicing s between consecutive values of idxes.
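To make the intermediate values concrete, here is a hedged sketch of the same index-based idea that spells out the slice boundaries and pairs them with zip. The names delimiters, starts and ends are assumptions introduced for illustration only.

```python
s = ['Hello', 'world', '!', 'How', 'are', 'you', '?', 'Have', 'a', 'good', 'day', '.']
delimiters = {'!', '?', '.'}  # hypothetical name; same tokens as in the answer

# Positions of the punctuation tokens in s.
idxes = [idx for idx, val in enumerate(s) if val in delimiters]

# Each sublist starts right after the previous delimiter (or at 0)
# and ends just past its own delimiter.
starts = [0] + [i + 1 for i in idxes[:-1]]
ends = [i + 1 for i in idxes]

answer = [s[a:b] for a, b in zip(starts, ends)]
print(idxes)
print(answer)
```

If s contains no delimiter at all, ends is empty and zip produces no pairs, so answer is an empty list.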

1
s = ['Hello', 'world', '!', 'How', 'are', 'you', '?', 'Have', 'a', 'good', 'day', '.']
final = []
b = []
for x in s:
    b.append(x)
    if x in ('.', '?', '!', '\n'):
        final.append(b)
        b = []

0

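1 Let final be an empty list.

2 The while loop runs as long as index < len(s).

3 When s[i] is a delimiter, append the words from position 0 through i (inclusive) to final.

4 Shrink the main list s by dropping those words, and reset the index to 0.

5 Otherwise, increment the index.

final = []
i = 0
while i < len(s):
    if s[i] in ('!', '?', '.', '\n'):
        final.append(s[:i+1])  # words up to and including the delimiter
        s = s[i+1:]            # drop the words that were just consumed
        i = 0                  # restart the scan on the shortened list
    else:
        i += 1
print(final)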

0

I don't use Python very often, but in your case I think you could also create a generator from your initial list, so you don't have to store a list of lists:

>>> from itertools import chain
>>> def func(s):
...     g = iter(s)
...     def inner_func(g):
...         for x in g:
...             yield x
...             if x in ('.', '?', '!', '\n'):
...                 break
...     while True:
...         try:
...             f = next(g)
...         except StopIteration:
...             break
...         else:
...             yield inner_func(chain([f], g))
>>> [[y for y in x] for x in func(s)]
[['Hello', 'world', '!'], ['How', 'are', 'you', '?'], ['Have', 'a', 'good', 'day', '.']]
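As a simpler take on the same generator idea, the sketch below yields one finished sublist at a time instead of nested generators. The name iter_sentences and the choice to also yield trailing words that have no closing delimiter are assumptions, not part of the answer above.

```python
def iter_sentences(words, delimiters=frozenset({'.', '?', '!', '\n'})):
    """Lazily yield sublists of words, each ending at a delimiter token."""
    current = []
    for word in words:
        current.append(word)
        if word in delimiters:
            yield current
            current = []  # start collecting the next sublist
    if current:
        # Assumption: leftover words without a closing delimiter are kept.
        yield current


s = ['Hello', 'world', '!', 'How', 'are', 'you', '?', 'Have', 'a', 'good', 'day', '.']
for sentence in iter_sentences(s):
    print(sentence)
```

Only the sublist currently being built is held in memory, which keeps the spirit of the generator approach while being easier to read.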
