
I have a DataFrame loaded from a Stata file, and I would like to add a new column to it that holds a numeric list as the entry for each row. How can I accomplish this? I have been trying plain assignment, but it complains about the index size.

I tried initializing a new column of strings (I also tried integers) and then mapping over it like this, but it didn't work.

testdf['new_col'] = '0'
testdf['new_col'] = testdf['new_col'].map(lambda x : list(range(100)))


Here is a toy example resembling what I have:

data = {'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd'], 'start_val': [1,7,9,10], 'end_val' : [3,11, 12,15]}
testdf = pd.DataFrame.from_dict(data)


This is what I would like to have:

data2 = {'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd'], 'start_val': [1,7,9,10], 'end_val' : [3,11, 12,15], 'list' : [[1,2,3],[7,8,9,10,11],[9,10,11,12],[10,11,12,13,14,15]]}
testdf2 = pd.DataFrame.from_dict(data2)


My final goal is to call explode on that "list" column so that each row is duplicated once per list element.

2 Answers

Try this bit of code:

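import numpy as np
import pandas as pd

# Build one integer range per row; end_val + 1 makes the end value inclusive.
testdf['list'] = pd.Series(np.arange(i, j) for i, j in zip(testdf['start_val'], 
                                                            testdf['end_val']+1))
testdf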

Output:

   col_1 col_2  start_val  end_val                      list
0      3     a          1        3                 [1, 2, 3]
1      2     b          7       11         [7, 8, 9, 10, 11]
2      1     c          9       12           [9, 10, 11, 12]
3      0     d         10       15  [10, 11, 12, 13, 14, 15]

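This uses a generator expression over zip inside the pd.Series constructor, with np.arange building the sequence for each row. Adding 1 to end_val makes the end value inclusive. Note that each entry is a NumPy array rather than a Python list; explode accepts both.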
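Since the stated goal is to call explode afterwards, here is a minimal sketch of the full round trip on the toy data from the question. Two details are my additions rather than part of the answer above: passing index=testdf.index to the Series, which I assume is the safer choice when the DataFrame does not have a default 0..n-1 index, and ignore_index=True, which assumes pandas 1.1 or later. The name exploded is just for illustration.

```python
import numpy as np
import pandas as pd

data = {'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd'],
        'start_val': [1, 7, 9, 10], 'end_val': [3, 11, 12, 15]}
testdf = pd.DataFrame.from_dict(data)

# Reusing testdf.index keeps the new values aligned with their rows
# even when the DataFrame index is not the default 0..n-1.
testdf['list'] = pd.Series(
    [np.arange(i, j + 1) for i, j in zip(testdf['start_val'], testdf['end_val'])],
    index=testdf.index,
)

# explode emits one row per array element and repeats the other columns.
# ignore_index=True (assumes pandas >= 1.1) renumbers the result from 0.
exploded = testdf.explode('list', ignore_index=True)
print(exploded.head())
```

The exploded column comes back with object dtype, so something like exploded['list'].astype(int) may be needed before doing arithmetic on it.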


2 Comments

Amazing!! Thank you!
@Amatya You're welcome. Happy coding. Be safe and stay healthy.

If you would rather stick with the apply function, you can build the range row by row with axis=1:

import pandas as pd
import numpy as np

data = {'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd'], 'start_val': [1,7,9,10], 'end_val' : [3,11, 12,15]}

df = pd.DataFrame.from_dict(data)
df['range'] = df.apply(lambda row: np.arange(row['start_val'], row['end_val']+1), axis=1)

print(df)

Output:

   col_1 col_2  start_val  end_val                     range
0      3     a          1        3                 [1, 2, 3]
1      2     b          7       11         [7, 8, 9, 10, 11]
2      1     c          9       12           [9, 10, 11, 12]
3      0     d         10       15  [10, 11, 12, 13, 14, 15]
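Both approaches above store NumPy arrays in the column. If actual Python lists are wanted, as in the desired output from the question, one possible variant is a plain list comprehension. This is a sketch under that assumption, not a replacement for the apply version; it works because the outer list has exactly one entry per row.

```python
import pandas as pd

data = {'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd'],
        'start_val': [1, 7, 9, 10], 'end_val': [3, 11, 12, 15]}
df = pd.DataFrame.from_dict(data)

# One inner list per row; end + 1 makes the end value inclusive.
# The outer list must have the same length as the DataFrame.
df['range'] = [list(range(start, end + 1))
               for start, end in zip(df['start_val'], df['end_val'])]

print(df)
```

Assigning a single list such as list(range(100)) directly to a column is different: pandas treats it as one value per row, so its length has to equal the number of rows, which is a common cause of length-mismatch errors.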

1 Comment

Thank you for showing me how to do it using Apply! Really appreciate being taught many ways of doing stuff.
