create new columns or dataframe with a function?

Question

My question is how to simplify the following code. In my real data I would have to add 540 columns, but I guess there is a better way, especially for generating the columns. Maybe even separate dataframes?

Below you'll see the test-df I have with one column needed. 'bin_X_0' through 'bin_X_9' need to be generated, but then several, i.e. bin_Y_0, bin_Z_0 etc. through 9.

N = 10000
J = [2012,2013,2014]
K = ['A','B','C','D','E','F','G','H']
L = ['h', 'd', 'a'] 
S = ['AR1','PO1','RU1']

np.random.seed(0)

df = pd.DataFrame(
    {'Y':np.random.uniform(1,10,N),
     'X':np.random.uniform(1,10,N),
     'Z':np.random.uniform(1,10,N),
     'J':np.random.choice(J,N),
     'S':np.random.choice(S,N),
     'R':np.random.choice(L,N)
    })

df['bins_X'] = df.groupby('S').X.apply(pd.qcut, q=10, labels=np.arange(10))

df['bin_X_0'] = np.where((df['bins_X'] ==0) & (df['R'] =='a'), (df['X']*2)-2, 
                        np.where((df['bins_X'] ==0) & (df['R'] !='a'), -2, 0))

df.head()

Marjan Moderc · Accepted Answer · 2017-01-03 16:51:35Z

1

You can achieve this with a double for loop, by using current iterables and format function for the new column generation. The following code should do it:

for a in ["X", "Y", "Z"]:
    for num in list(range(0, 10)):
        df['bins_{}'.format(a)] = df.groupby('S')["{}".format(a)].apply(pd.qcut, q=10, labels=np.arange(10))

        df['bin_{}_{}'.format(a, num)] = np.where((df['bins_{}'.format(a)] == num) & (df['R'] == 'a'),
                                                  (df['{}'.format(a)] * 2) - 2,
                                                  np.where((df['bins_{}'.format(a)] == num) & (df['R'] != 'a'), -2, 0))

I hope it is clear, how current numbers and letters are dynamically picking up the appropriate data and generating two new columns inside every nested loop. The above code produces this dataframe, which I believe is what you are looking for:

 J  R    S         X         Y         Z bins_X   bin_X_0  bin_X_1  \
0  2014  d  PO1  7.734412  5.939322  4.529557      7  0.000000      0.0   
1  2014  h  AR1  2.621824  7.436704  1.370409      1  0.000000     -2.0   
2  2013  h  PO1  4.501208  6.424870  9.309705      4  0.000000      0.0   
3  2012  a  RU1  1.338402  5.903949  4.656115      0  0.676803      0.0   
4  2012  d  RU1  1.106090  4.812893  9.498540      0 -2.000000      0.0   

   bin_X_2   ...     bin_Z_0  bin_Z_1  bin_Z_2  bin_Z_3  bin_Z_4  bin_Z_5  \
0      0.0   ...         0.0      0.0      0.0     -2.0  0.00000      0.0   
1      0.0   ...        -2.0      0.0      0.0      0.0  0.00000      0.0   
2      0.0   ...         0.0      0.0      0.0      0.0  0.00000      0.0   
3      0.0   ...         0.0      0.0      0.0      0.0  7.31223      0.0   
4      0.0   ...         0.0      0.0      0.0      0.0  0.00000      0.0   

   bin_Z_6 bin_Z_7  bin_Z_8  bin_Z_9  
0      0.0     0.0      0.0      0.0  
1      0.0     0.0      0.0      0.0  
2      0.0     0.0      0.0     -2.0  
3      0.0     0.0      0.0      0.0  
4      0.0     0.0      0.0     -2.0

I hope you get the idea.

edited Jan 3, 2017 at 16:51

answered Jan 3, 2017 at 15:50

Marjan Moderc

2,87928 silver badges45 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

Zanshin Over a year ago

Thanks Marjan, looks good. I'll check. One thing though I forgot to mention. After.... format(a)] == it should be the corresponding bin. In the example it is 0, should be i.e bin_X_5.... 5

Marjan Moderc Over a year ago

I replaced two zeros with num as you suggested. Otherwise, I haven't dug into the computations alone, so there might me still some number unchanged into variable. Feel free to edit my answer once you see the pattern.

epattaro · Accepted Answer · 2017-01-02 17:27:11Z

0

the best way to add new columns if they are related to the data already in the DataFrame is to use pd.concat or pd.merge.

You can read more about it on http://pandas.pydata.org/pandas-docs/stable/merging.html

answered Jan 2, 2017 at 17:27

epattaro

2,4981 gold badge21 silver badges32 bronze badges

Collectives™ on Stack Overflow

create new columns or dataframe with a function?

2 Answers 2

2 Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

2 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related