I am new to Python, and therefore to pandas DataFrames as well. Let's say I have the following data set:

import pandas as pd

d = {'a': [1, 1, 1, 2, 2, 2, 3, 3, 3], 'b': [4, 4, 4, 5, 5, 5, 6, 6, 6]}
df = pd.DataFrame(data=d)
df

Out[20]: 
   a  b
0  1  4
1  1  4
2  1  4
3  2  5
4  2  5
5  2  5
6  3  6
7  3  6
8  3  6

What I want to do is create new columns, say b_1, b_2, b_3, based on the information in columns a and b: each b_i should hold the value of b where a equals i, and 0 otherwise. The final data should look like this:

Out[21]: 
   a  b  b_1  b_2  b_3
0  1  4    4    0    0
1  1  4    4    0    0
2  1  4    4    0    0
3  2  5    0    5    0
4  2  5    0    5    0
5  2  5    0    5    0
6  3  6    0    0    6
7  3  6    0    0    6
8  3  6    0    0    6

In Stata this is achieved through the following command:

forvalues i=1(1)3{
gen b_`i'=b if a==`i'
replace b_`i'=0 if b_`i'==.
}

Is there a similar way of doing this in Python? Thanks in advance.

  • df.join(pd.DataFrame({f'b_{i}':x['b'] for i, x in df.groupby('a')}).fillna(0)) ..? Commented Mar 3, 2021 at 9:31
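The one-liner in the comment above can be unpacked roughly as follows. This is only a sketch: the names `pieces` and `wide` are illustrative, and the final `astype(int)` is an addition that assumes integer output is wanted, since `fillna(0)` on its own leaves the new columns as floats.

```python
import pandas as pd

d = {'a': [1, 1, 1, 2, 2, 2, 3, 3, 3], 'b': [4, 4, 4, 5, 5, 5, 6, 6, 6]}
df = pd.DataFrame(data=d)

# One Series of b values per distinct value of a; each keeps its original row labels.
pieces = {f'b_{key}': group['b'] for key, group in df.groupby('a')}

# Building a DataFrame aligns the pieces on the row labels, leaving NaN where a
# row does not belong to that group. Replace the NaN with 0 and cast back to int
# (the cast is an assumption, not part of the original comment).
wide = pd.DataFrame(pieces).fillna(0).astype(int)

# Attach the new columns to the original frame by index.
result = df.join(wide)
print(result)
```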

1 Answer

Use DataFrame.join with Series.unstack and DataFrame.add_prefix:

df = df.join(df.set_index('a', append=True)['b'].unstack(fill_value=0).add_prefix('b_'))
print(df)
   a  b  b_1  b_2  b_3
0  1  4    4    0    0
1  1  4    4    0    0
2  1  4    4    0    0
3  2  5    0    5    0
4  2  5    0    5    0
5  2  5    0    5    0
6  3  6    0    0    6
7  3  6    0    0    6
8  3  6    0    0    6
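To see what the chained call is doing, it may help to split it into steps. The sketch below starts from the original two-column frame; the intermediate names (`indexed`, `b_series`, `wide`, `result`, `looped`) are illustrative only. It also includes a loop that mirrors the Stata `forvalues` block more literally, using `Series.where`, as one possible alternative.

```python
import pandas as pd

d = {'a': [1, 1, 1, 2, 2, 2, 3, 3, 3], 'b': [4, 4, 4, 5, 5, 5, 6, 6, 6]}
df = pd.DataFrame(data=d)

# Step 1: add a as a second index level, giving a MultiIndex of (row label, a).
indexed = df.set_index('a', append=True)

# Step 2: keep only the b values, still indexed by (row label, a).
b_series = indexed['b']

# Step 3: pivot the innermost level (a) into columns; combinations of
# row label and a that do not occur are filled with 0.
wide = b_series.unstack(fill_value=0)

# Step 4: rename the columns 1, 2, 3 to b_1, b_2, b_3.
wide = wide.add_prefix('b_')

# Step 5: attach the new columns to the original frame by row label.
result = df.join(wide)
print(result)

# Alternative closer to the Stata loop: for each value i of a,
# keep b where a == i and use 0 everywhere else.
looped = df.copy()
for i in sorted(looped['a'].unique()):
    looped[f'b_{i}'] = looped['b'].where(looped['a'] == i, 0)
print(looped)
```

The loop version is easier to read if you are coming from Stata, while the `unstack` version avoids an explicit Python loop; on a frame this small the difference is a matter of taste.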