How can I simplify the following code? With my real data I would have to add 540 columns this way, and I assume there is a better way to generate them. Would separate dataframes make more sense?
Below is my test df, with one of the needed columns already built. I need to generate 'bin_X_0' through 'bin_X_9', and then the same set for the other variables, i.e. 'bin_Y_0' through 'bin_Y_9', 'bin_Z_0' through 'bin_Z_9', and so on.
import numpy as np
import pandas as pd

N = 10000
J = [2012,2013,2014]
K = ['A','B','C','D','E','F','G','H']
L = ['h', 'd', 'a']
S = ['AR1','PO1','RU1']
np.random.seed(0)
df = pd.DataFrame({
    'Y': np.random.uniform(1, 10, N),
    'X': np.random.uniform(1, 10, N),
    'Z': np.random.uniform(1, 10, N),
    'J': np.random.choice(J, N),
    'S': np.random.choice(S, N),
    'R': np.random.choice(L, N)
})
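# Decile (0-9) of X within each S group. group_keys=False keeps the original
# row index (newer pandas versions otherwise prepend the group key to it).
df['bins_X'] = df.groupby('S', group_keys=False).X.apply(pd.qcut, q=10, labels=np.arange(10))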
df['bin_X_0'] = np.where((df['bins_X'] == 0) & (df['R'] == 'a'), (df['X'] * 2) - 2,
                np.where((df['bins_X'] == 0) & (df['R'] != 'a'), -2, 0))
df.head()
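
To make the target concrete, here is a rough sketch of the kind of generalization I am after, not a settled solution. The helper name `add_bin_columns` and its parameters are hypothetical, and it assumes every variable follows the same rule as 'bin_X_0' above. It also uses `transform` with `labels=False` (plain integer bin codes) instead of the `apply` call above.

```python
import numpy as np
import pandas as pd

def add_bin_columns(df, variables=("X", "Y", "Z"), n_bins=10,
                    group="S", flag="R", flag_value="a"):
    """Return a copy of df with bins_<var> and bin_<var>_<k> columns added.

    Hypothetical helper: assumes each variable uses the same rule as bin_X_0.
    """
    new_cols = {}
    is_flag = df[flag] == flag_value
    for var in variables:
        # Quantile bin index (0..n_bins-1) of each value within its group.
        bins = df.groupby(group)[var].transform(
            lambda s: pd.qcut(s, q=n_bins, labels=False)
        )
        new_cols[f"bins_{var}"] = bins
        # Value a row takes in the column of its own bin: var*2-2 if flagged, else -2.
        value = np.where(is_flag, df[var] * 2 - 2, -2.0)
        for k in range(n_bins):
            # Rows outside bin k get 0 in column bin_<var>_<k>.
            new_cols[f"bin_{var}_{k}"] = np.where(bins == k, value, 0.0)
    # Build all new columns first and attach them in a single concat.
    return pd.concat([df, pd.DataFrame(new_cols, index=df.index)], axis=1)
```

With the test df above this would be called as `df = add_bin_columns(df)`. Collecting the columns in a dict and concatenating once avoids inserting hundreds of columns one at a time.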
