Combine Pandas columns into a nested list

Question

I am attempting to combine elements of a dataframe into a nested list. Say I have the following:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(100, 4), columns=list('abcd'))
df.head(4)

          a         b         c         d
0  0.455258  1.135895  0.573383 -0.637943
1  0.262079 -0.397168 -0.980062 -1.600837
2  0.921582  0.767232 -0.298590 -0.159964
3 -0.645110 -0.709058  1.223899  0.382212

Then, I would like to create a fifth column e that looks like:

          a         b         c         d         e
0  0.455258  1.135895  0.573383 -0.637943 [[0.455258  1.135895  0.573383 -0.637943]]
1  0.262079 -0.397168 -0.980062 -1.600837 [[0.262079 -0.397168 -0.980062 -1.600837]]
2  0.921582  0.767232 -0.298590 -0.159964 [[0.921582  0.767232 -0.298590 -0.159964]]
3 -0.645110 -0.709058  1.223899  0.382212 [[-0.645110 -0.709058  1.223899  0.382212]]

I would like to do this efficiently.

My most efficient, but incorrect, attempt so far is:

df['e'] = df.values.tolist()

But that just results in:

          a         b         c         d         e
0  0.455258  1.135895  0.573383 -0.637943 [0.455258  1.135895  0.573383 -0.637943]
1  0.262079 -0.397168 -0.980062 -1.600837 [0.262079 -0.397168 -0.980062 -1.600837]
2  0.921582  0.767232 -0.298590 -0.159964 [0.921582  0.767232 -0.298590 -0.159964]
3 -0.645110 -0.709058  1.223899  0.382212 [-0.645110 -0.709058  1.223899  0.382212]

My least efficient, but correct, attempt has been:

a = []
for index, row in df.iterrows():
    # wrap each row's values in two levels of list
    a.append([[row['a'], row['b'], row['c'], row['d']]])
df['e'] = a

Is there a better way?

PaulS · Accepted Answer · 2022-11-24 16:50:07Z

1

Another possible solution:

df['e'] = df.values.tolist()
df['e'] = df['e'].map(lambda x: [x])

Output:

          a         b         c         d  \
0 -1.594129  1.692562  0.602186 -1.620295   
1 -0.561567 -0.033658 -1.259215  1.054229   
2  0.450852 -0.483194  0.126173  0.354781   
3  2.060968 -0.428400 -0.973516 -0.201786   
4 -0.977307 -0.123215 -1.494138 -0.175432   

                                                   e  
0  [[-1.5941291794267378, 1.6925620764107292, 0.6...  
1  [[-0.5615669341251519, -0.03365818317800309, -...  
2  [[0.45085184068754164, -0.48319360005444034, 0...  
3  [[2.0609676606685086, -0.42839969840552594, -0...  
4  [[-0.9773067339895964, -0.12321466907036417, -...
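The same nesting can also be written as a single list comprehension. The following is only a minimal sketch, not part of the original answer: the tiny fixed-value frame and the `cols` variable are assumptions made for illustration. Selecting the source columns explicitly avoids picking up an `e` column if the code is run a second time, since `df.values` covers every column present at that moment.

```python
import pandas as pd

# Small frame with fixed values so the result is easy to inspect (illustrative data).
df = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]],
    columns=list("abcd"),
)

# Hypothetical helper variable: the columns to combine, fixed up front.
cols = list("abcd")

# to_numpy().tolist() yields one flat list per row; wrapping each of those
# in another list gives the [[...]] nesting asked for in the question.
df["e"] = [[row] for row in df[cols].to_numpy().tolist()]

print(df["e"].iloc[0])
```

Each cell of `e` is then a list of length one whose single element is the list of row values.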


1 Comment

DS_Testing Over a year ago

Ah I was not far off, I should have guessed that a lambda would do it. I'll accept this answer as testing on my machine shows it to be ~3x faster than Scott's, though both produce the correct output.

Scott Boston · Answer · 2022-11-24 16:56:17Z

1

Let's use np.array_split:

df['e'] = np.array_split(df.to_numpy(), df.shape[0], axis=0)

Output:

           a         b         c         d                                                  e
0  -0.164745 -0.498313 -0.247778 -1.531003  [[-0.16474534230721335, -0.49831346259483156, ...
1   0.079485  0.125790  0.002755 -0.182361  [[0.0794845071834397, 0.12579014367640728, 0.0...
2   0.790263  0.488152 -0.752555  0.432949  [[0.790263001866772, 0.48815219760288764, -0.7...
3  -0.139499 -1.493593 -1.708668 -2.495497  [[-0.13949904491921675, -1.493593498340277, -1...
4   2.662431  0.247559 -0.949407  2.746299  [[2.662430989009563, 0.2475588133223812, -0.94...
..       ...       ...       ...       ...                                                ...
95  0.252663  1.018614 -0.491736 -0.290786  [[0.252663350866794, 1.018613617727022, -0.491...
96  1.023089 -0.367463  0.437327 -0.017441  [[1.0230888404185123, -0.3674628009130751, 0.4...
97  0.571278  0.450803  0.441102  1.176884  [[0.5712775025212533, 0.4508029251387083, 0.44...
98  1.336477  0.166516  0.408941  0.972896  [[1.3364769455886123, 0.16651649771088423, 0.4...
99 -1.298205  1.868477 -0.174665  0.065565  [[-1.2982050517578514, 1.8684774453090633, -0....
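One detail worth noting about this approach: `np.array_split` returns NumPy arrays rather than Python lists, so each piece is a two-dimensional array with a single row. The sketch below, using a small illustrative frame that is not from the original answer, shows the shape of one piece and how `ndarray.tolist()` could convert it to a nested list if plain lists are required.

```python
import numpy as np
import pandas as pd

# Illustrative two-row frame (assumed data, chosen for readability).
df = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]], columns=list("ab"))

# Split the underlying array into one piece per row along axis 0.
pieces = np.array_split(df.to_numpy(), df.shape[0], axis=0)

# Each piece is a 2-D ndarray holding exactly one row.
print(type(pieces[0]), pieces[0].shape)

# Optional conversion to plain nested Python lists.
as_lists = [piece.tolist() for piece in pieces]
print(as_lists[0])
```

Whether the extra conversion is worthwhile depends on whether downstream code needs lists or can work with arrays directly.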


1 Comment

DS_Testing Over a year ago

This is a nice one-liner that produces the desired result; however, Paul's suggestion was ~3x faster when I timed them both, so I've accepted his reply. Thank you for the insight into array_split, though.
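The ~3x figure comes from the commenter's own machine and was not accompanied by a benchmark. A rough way to compare the two approaches locally might look like the sketch below. The function names `wrap_with_map` and `split_with_numpy` and the choice of 200 repetitions are assumptions made for illustration, and the results will vary with hardware, library versions, and frame size.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(100, 4), columns=list("abcd"))


def wrap_with_map(frame):
    # tolist() gives one flat list per row; map wraps each in another list.
    rows = pd.Series(frame.to_numpy().tolist(), index=frame.index)
    return rows.map(lambda x: [x])


def split_with_numpy(frame):
    # One (1, n_columns) array per row.
    return np.array_split(frame.to_numpy(), frame.shape[0], axis=0)


for fn in (wrap_with_map, split_with_numpy):
    seconds = timeit.timeit(lambda: fn(df), number=200)
    print(f"{fn.__name__}: {seconds:.4f} s for 200 runs")
```

Neither function assigns back to the frame, so the measurement covers only the construction of the nested values, not the column assignment.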

Mouad Slimane · Answer · 2022-11-25 11:22:03Z

0

Try:

df["e"] = df.apply(lambda x: [[x[column] for column in df.columns]], axis=1)

edited Nov 25, 2022 at 11:22

answered Nov 24, 2022 at 16:40


2 Comments

Azhar Khan Over a year ago

This does not produce the same results as requested by the OP. Please verify and fix it.

Thomas BDX Over a year ago

Kindly improve the answer with output or reasoning on the method and functions.
