Below is my code, which builds a DataFrame with the pandas library in Python:

import pandas as pd
df = pd.DataFrame({'Col1':['r0','X Y Z','A D','B','r1','r0','Y Z X','D','r1','r0','X','G','H','Z','r1']})

I want to build a list from the elements of the DataFrame. The list should be split into sublists, one for each group of elements between r0 and r1, like this:

[['r0','X','Y','Z','A','D','B','r1'],
 ['r0','Y','Z','X','D','r1'],
 ['r0','X','G','H','Z','r1']]

I can do this with multiple loops, but that approach is not suitable for my code. What is the simplest way to solve this? Thank you for reading.

3 Answers

2

If you're okay with a list of arrays, you could use str.split + stack + np.split:

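import numpy as np

# Split each cell on whitespace and flatten to a 1-D array of tokens
x = df.Col1.str.split(expand=True).stack().values
# Cut the array at every position where a token equals 'r0'
y = np.split(x, np.flatnonzero(x == 'r0'))[1:]
y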

[array(['r0', 'X', 'Y', 'Z', 'A', 'D', 'B', 'r1'], dtype=object),
 array(['r0', 'Y', 'Z', 'X', 'D', 'r1'], dtype=object),
 array(['r0', 'X', 'G', 'H', 'Z', 'r1'], dtype=object)]

I slice with [1:] because your column starts with r0, so np.split returns an empty array as the first piece, which I drop. If your column does not start with r0, you can remove the [1:].
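To see why that leading empty piece appears, here is a minimal sketch on a small made-up array (the names starts, parts and groups are only illustrative, not part of the answer above). It cuts at every 'r0' and drops the first piece only when the data really starts with 'r0':

```python
import numpy as np

# Small illustrative token array, shaped like the flattened column
x = np.array(['r0', 'X', 'r1', 'r0', 'Y', 'r1'], dtype=object)

# Positions where a new group begins
starts = np.flatnonzero(x == 'r0')

# np.split cuts *before* each index, so a cut at index 0
# produces an empty first piece
parts = np.split(x, starts)

# Drop the first piece only if it is the empty one caused by a leading 'r0'
groups = parts[1:] if starts.size and starts[0] == 0 else parts
print([g.tolist() for g in groups])
```

If the data had tokens before the first 'r0', the first piece would hold those tokens instead of being empty, so whether to keep it depends on what you want to do with them.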


As an aside, converting your result to a list of lists is extremely simple using map:

y = list(map(np.ndarray.tolist, y))

2 Comments

Thank you so much today you saved my whole week. Thanks again :-)
@Agyol No problem, if you used this answer (it sounds like you did, but I don't know), you should tick this one.
1
import operator
import functools

df1 = df.Col1.str.split(' ').groupby(df.Col1.eq('r0').cumsum()).apply(list).apply(lambda x: functools.reduce(operator.concat, x))
df1
Out[636]: 
Col1
1    [r0, X, Y, Z, A, D, B, r1]
2          [r0, Y, Z, X, D, r1]
3          [r0, X, G, H, Z, r1]
Name: Col1, dtype: object

df1.values
Out[639]: 
array([['r0', 'X', 'Y', 'Z', 'A', 'D', 'B', 'r1'],
       ['r0', 'Y', 'Z', 'X', 'D', 'r1'], ['r0', 'X', 'G', 'H', 'Z', 'r1']], dtype=object)
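
The one-liner above is easier to follow when the grouping key is looked at on its own. The following is a sketch of the same idea in separate steps, using the DataFrame from the question; the names key, tokens and result are only illustrative, and itertools.chain is swapped in for functools.reduce to flatten each group:

```python
from itertools import chain

import pandas as pd

df = pd.DataFrame({'Col1': ['r0', 'X Y Z', 'A D', 'B', 'r1',
                            'r0', 'Y Z X', 'D', 'r1',
                            'r0', 'X', 'G', 'H', 'Z', 'r1']})

# The running count of 'r0' rows labels every row with its group number
key = df.Col1.eq('r0').cumsum()

# Each cell becomes a list of whitespace-separated tokens
tokens = df.Col1.str.split()

# Flatten the token lists of each group into one plain Python list
result = [list(chain.from_iterable(chunk)) for _, chunk in tokens.groupby(key)]
print(result)
```

This yields a plain list of lists rather than a Series, which may be closer to what the question asks for.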


0

You can reshape the underlying values array into chunks of length 5:

In [11]: df.Col1.values.reshape(-1, 5)
Out[11]:
array([['r0', 'X Y Z', 'A D', 'B', 'r1'],
       ['r0', 'Y Z X', 'D', 'r1', 'r0'],
       ['X', 'G', 'H', 'Z', 'r1']], dtype=object)

Then you can use a join/split list comprehension to split:

In [12]: [" ".join(row).split() for row in df.Col1.values.reshape(-1, 5)]
Out[12]:
[['r0', 'X', 'Y', 'Z', 'A', 'D', 'B', 'r1'],
 ['r0', 'Y', 'Z', 'X', 'D', 'r1', 'r0'],
 ['X', 'G', 'H', 'Z', 'r1']]
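
The reshape assumes every group spans exactly five rows, which does not hold for the sample data, so the cuts land in the wrong places. As a rough sketch of the same join/split idea without the fixed chunk size (the loop and the names tokens and groups are an illustrative variation, not the approach shown above), you could join the whole column once and start a new sublist at each 'r0':

```python
import pandas as pd

df = pd.DataFrame({'Col1': ['r0', 'X Y Z', 'A D', 'B', 'r1',
                            'r0', 'Y Z X', 'D', 'r1',
                            'r0', 'X', 'G', 'H', 'Z', 'r1']})

# Join every cell into one string, then split back into single tokens
tokens = " ".join(df.Col1).split()

groups = []
for tok in tokens:
    # Start a new sublist at each 'r0' (or at the very first token)
    if tok == 'r0' or not groups:
        groups.append([])
    groups[-1].append(tok)

print(groups)
```

This uses a single pass over the tokens and does not depend on the groups having equal length.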

3 Comments

Last array missing an r0? ;-o
@Andy Thanks for the response, but the last group does not include r0.
@cᴏʟᴅsᴘᴇᴇᴅ Hmm, yes, I misread the pattern :/
