Python pandas generate list from dataframe

Question

Below is my code, which I created with the help of the pandas library in Python:

import pandas as pd
df = pd.DataFrame({'Col1':['r0','X Y Z','A D','B','r1','r0','Y Z X','D','r1','r0','X','G','H','Z','r1']})

I want to create a list from the elements of the data frame. This list must be split into inner lists, one for each group of elements running from r0 to r1, as below:

[['r0','X','Y','Z','A','D','B','r1'],
 ['r0','Y','Z','X','D','r1'],
 ['r0','X','G','H','Z','r1']]

My problem is that I can do this with multiple loops, but that approach is not suitable for my code. I would like to know the easiest way to solve this problem. Thank you for reading.

cs95 · Accepted Answer · 2017-10-26 21:07:11Z

2

If you're okay with a list of arrays, you could use str.split + stack + np.split:

import numpy as np

x = df.Col1.str.split(expand=True).stack().values
y = np.split(x, np.flatnonzero(x == 'r0'))[1:]
y

[array(['r0', 'X', 'Y', 'Z', 'A', 'D', 'B', 'r1'], dtype=object),
 array(['r0', 'Y', 'Z', 'X', 'D', 'r1'], dtype=object),
 array(['r0', 'X', 'G', 'H', 'Z', 'r1'], dtype=object)]

I call [1:] because your column starts with r0, so np.split returns an empty array as the first piece, which I drop. If your data doesn't start with r0, you can remove it.
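To see where that empty first piece comes from, here is a minimal sketch on a small, made-up array (the names `arr`, `starts`, `pieces` and `groups` are illustrative only). Because the first `'r0'` sits at index 0, `np.split` makes a cut before the very first element:

```python
import numpy as np

# Hypothetical toy input: two groups, each delimited by 'r0' ... 'r1'.
arr = np.array(['r0', 'A', 'r1', 'r0', 'B', 'C', 'r1'], dtype=object)

# Positions of every group-start marker: 0 and 3.
starts = np.flatnonzero(arr == 'r0')

# Cutting at index 0 yields an empty first piece, hence the [1:].
pieces = np.split(arr, starts)
groups = [p.tolist() for p in pieces[1:]]

print(len(pieces[0]))  # 0
print(groups)          # [['r0', 'A', 'r1'], ['r0', 'B', 'C', 'r1']]
```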

As an aside, converting your result to a list of lists is extremely simple using map:

y = list(map(np.ndarray.tolist, y))
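For reference, a self-contained sketch of the whole pipeline that rebuilds the question's `df` and ends with a plain list of lists. Two small deviations from the answer, both my own assumptions: `.dropna()` after `stack()` guards against padding values surviving in newer pandas versions, where `stack()` may no longer drop missing values, and `.to_numpy(dtype=object)` replaces `.values` so that `np.split` always receives a plain NumPy array:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': ['r0', 'X Y Z', 'A D', 'B', 'r1',
                            'r0', 'Y Z X', 'D', 'r1',
                            'r0', 'X', 'G', 'H', 'Z', 'r1']})

# One token per cell: split on whitespace into padded columns, stack them
# row by row into one Series, and drop any padding (defensive assumption).
x = (df.Col1.str.split(expand=True)
            .stack()
            .dropna()
            .to_numpy(dtype=object))

# Cut before every 'r0', drop the empty leading piece, convert to lists.
y = np.split(x, np.flatnonzero(x == 'r0'))[1:]
result = list(map(np.ndarray.tolist, y))
print(result)
```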

2 Comments

Dancer PhD Over a year ago

Thank you so much, you saved my whole week today. Thanks again :-)

cs95 Over a year ago

@Agyol No problem, if you used this answer (it sounds like you did, but I don't know), you should tick this one.

BENY · Accepted Answer · 2017-10-26 21:08:34Z

1

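import operator
import functools

# Label rows by the running count of 'r0' so each r0 ... r1 block shares a key,
# split every cell on spaces, collect each block's lists, then concatenate them.
df1 = (df.Col1.str.split(' ')
          .groupby(df.Col1.eq('r0').cumsum())
          .apply(list)
          .apply(lambda x: functools.reduce(operator.concat, x)))
df1
Out[636]: 
Col1
1    [r0, X, Y, Z, A, D, B, r1]
2          [r0, Y, Z, X, D, r1]
3          [r0, X, G, H, Z, r1]
Name: Col1, dtype: object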

df1.values
Out[639]: 
array([['r0', 'X', 'Y', 'Z', 'A', 'D', 'B', 'r1'],
       ['r0', 'Y', 'Z', 'X', 'D', 'r1'], ['r0', 'X', 'G', 'H', 'Z', 'r1']], dtype=object)
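`functools.reduce(operator.concat, x)` builds a new list at every step, so for blocks with many cells the copying cost can grow quadratically. A minimal sketch of the same groupby idea that swaps in `itertools.chain.from_iterable` (my substitution, not part of this answer) and calls `.tolist()` to get a plain list of lists instead of the object array that `.values` returns:

```python
import itertools

import pandas as pd

df = pd.DataFrame({'Col1': ['r0', 'X Y Z', 'A D', 'B', 'r1',
                            'r0', 'Y Z X', 'D', 'r1',
                            'r0', 'X', 'G', 'H', 'Z', 'r1']})

# Running count of 'r0' markers: every r0 ... r1 block gets one label (1, 2, 3).
group_id = df.Col1.eq('r0').cumsum()

# Split cells into token lists, then flatten each block's lists in one pass.
df1 = (df.Col1.str.split(' ')
          .groupby(group_id)
          .apply(lambda cells: list(itertools.chain.from_iterable(cells))))

result = df1.tolist()  # plain Python list of lists
print(result)
```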

Andy Hayden · Accepted Answer · 2017-10-26 21:03:01Z

0

You can reshape the underlying values array into chunks of length 5:

In [11]: df.Col1.values.reshape(-1, 5)
Out[11]:
array([['r0', 'X Y Z', 'A D', 'B', 'r1'],
       ['r0', 'Y Z X', 'D', 'r1', 'r0'],
       ['X', 'G', 'H', 'Z', 'r1']], dtype=object)

Then you can use a join/split list comprehension to split:

In [12]: [" ".join(row).split() for row in df.Col1.values.reshape(-1, 5)]
Out[12]:
[['r0', 'X', 'Y', 'Z', 'A', 'D', 'B', 'r1'],
 ['r0', 'Y', 'Z', 'X', 'D', 'r1', 'r0'],
 ['X', 'G', 'H', 'Z', 'r1']]
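Because the groups in the question span 5, 4 and 6 rows, a fixed `reshape(-1, 5)` cannot line them up, which is what the comments point out. A hypothetical variant that keeps the join/split idea but lets a regular expression find each `r0 ... r1` span instead; it assumes `r0` and `r1` appear only as standalone marker tokens and never nest:

```python
import re

import pandas as pd

df = pd.DataFrame({'Col1': ['r0', 'X Y Z', 'A D', 'B', 'r1',
                            'r0', 'Y Z X', 'D', 'r1',
                            'r0', 'X', 'G', 'H', 'Z', 'r1']})

# Join every cell into one space-separated string, as in the answer.
text = ' '.join(df.Col1)

# Non-greedy match from each 'r0' to the next 'r1'; \b stops the markers
# from matching inside longer tokens.
blocks = re.findall(r'\br0\b.*?\br1\b', text)

# Split each matched span back into tokens.
result = [block.split() for block in blocks]
print(result)
```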

3 Comments

cs95 Over a year ago

Last array missing an r0? ;-o

Dancer PhD Over a year ago

@Andy thanks for the response, but the last list doesn't get its r0.

Andy Hayden Over a year ago

@cᴏʟᴅsᴘᴇᴇᴅ hmmm, yes, I misread the pattern :/
