Python appending a list to dataframe column

Question

I have a dataframe loaded from a Stata file, and I would like to add a new column to it that holds a numeric list as the entry for each row. How can one accomplish this? I have been trying plain assignment, but it's complaining about index size.

I tried initializing a new column of strings (I also tried integers) and then something like this, but it didn't work.

testdf['new_col'] = '0'
testdf['new_col'] = testdf['new_col'].map(lambda x : list(range(100)))

Here is a toy example resembling what I have:

data = {'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd'], 'start_val': [1,7,9,10], 'end_val' : [3,11, 12,15]}
testdf = pd.DataFrame.from_dict(data)

This is what I would like to have:

data2 = {'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd'], 'start_val': [1,7,9,10], 'end_val' : [3,11, 12,15], 'list' : [[1,2,3],[7,8,9,10,11],[9,10,11,12],[10,11,12,13,14,15]]}
testdf2 = pd.DataFrame.from_dict(data2)

My final goal is to use explode on that "list" column to duplicate the rows appropriately.

Scott Boston · Accepted Answer · 2020-11-08 01:57:27Z

2

Try this bit of code:

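import numpy as np
import pandas as pd
# Build one integer array per row, covering start_val..end_val inclusive.
testdf['list'] = pd.Series(np.arange(i, j) for i, j in zip(testdf['start_val'],
                                                            testdf['end_val'] + 1))
testdf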

Output:

   col_1 col_2  start_val  end_val                      list
0      3     a          1        3                 [1, 2, 3]
1      2     b          7       11         [7, 8, 9, 10, 11]
2      1     c          9       12           [9, 10, 11, 12]
3      0     d         10       15  [10, 11, 12, 13, 14, 15]

This uses a generator expression over zip, passed to the pd.Series constructor, with np.arange building one integer array per row. Adding 1 to end_val makes the end point inclusive.
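One caveat worth sketching: assigning a Series to a column aligns on index labels, so the approach above relies on testdf having the default 0..n-1 index. The example below is a minimal sketch, assuming a DataFrame whose index is not the default (the labels 10, 20, 30, 40 and the column name misaligned are made up for illustration). It also builds plain Python lists with range instead of NumPy arrays, which is a swapped-in variant rather than the code from the answer.

```python
import pandas as pd

data = {'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd'],
        'start_val': [1, 7, 9, 10], 'end_val': [3, 11, 12, 15]}
# Hypothetical non-default index, e.g. what is left after filtering rows.
testdf = pd.DataFrame(data, index=[10, 20, 30, 40])

# One Python list per row, end point inclusive.
lists = [list(range(i, j + 1))
         for i, j in zip(testdf['start_val'], testdf['end_val'])]

# Without an explicit index the Series is labelled 0..3, which matches none
# of the labels 10..40, so every row of the new column ends up as NaN.
testdf['misaligned'] = pd.Series(lists)

# Reusing the DataFrame's own index keeps each list on its intended row.
testdf['list'] = pd.Series(lists, index=testdf.index)

print(testdf[['start_val', 'end_val', 'list']])
```

If the DataFrame already has the default index, as in the toy example from the question, both assignments give the same result.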

edited Nov 8, 2020 at 1:57

answered Nov 8, 2020 at 1:51

Scott Boston


2 Comments

Amatya Over a year ago

Amazing!! Thank you!

Scott Boston Over a year ago

@Amatya You're welcome. Happy coding. Be safe and stay healthy.

Serial Lazer · Accepted Answer · 2020-11-08 01:56:58Z

1

If you'd rather stick with the apply function:

import pandas as pd
import numpy as np

data = {'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd'], 'start_val': [1,7,9,10], 'end_val' : [3,11, 12,15]}

df = pd.DataFrame.from_dict(data)
df['range'] = df.apply(lambda row: np.arange(row['start_val'], row['end_val']+1), axis=1)

print(df)

Output:

   col_1 col_2  start_val  end_val                     range
0      3     a          1        3                 [1, 2, 3]
1      2     b          7       11         [7, 8, 9, 10, 11]
2      1     c          9       12           [9, 10, 11, 12]
3      0     d         10       15  [10, 11, 12, 13, 14, 15]
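Since the stated goal in the question is to call explode on the new column, here is a minimal sketch of that follow-up step. It assumes pandas 0.25 or later, where DataFrame.explode is available; the name exploded is chosen here for illustration only.

```python
import numpy as np
import pandas as pd

data = {'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd'],
        'start_val': [1, 7, 9, 10], 'end_val': [3, 11, 12, 15]}
df = pd.DataFrame(data)

# Same row-wise construction as above: one inclusive range per row.
df['range'] = df.apply(
    lambda row: np.arange(row['start_val'], row['end_val'] + 1), axis=1)

# explode emits one output row per element and repeats the other columns.
# The original index labels are repeated as well.
exploded = df.explode('range')

# The exploded column has object dtype; cast it back to integers.
exploded['range'] = exploded['range'].astype(int)

print(exploded.head())
```

Calling exploded.reset_index(drop=True) afterwards gives a fresh 0..n-1 index if the repeated labels are not wanted.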

answered Nov 8, 2020 at 1:56

Serial Lazer


1 Comment

Amatya Over a year ago

Thank you for showing me how to do it using Apply! Really appreciate being taught many ways of doing stuff.
