How to add one column to pandas dataframe based on values in different columns?

Question

I have a pandas data frame like below:

a     b    c    d   
0.7   0.1  0.2  0.3
0.5   0.2  0.2  0.2

I am writing some nested loops like below to add a column result based on these 4 columns.

def class_decider(df):
    for i in df['a']:
        if i > 0.6:
            a = "class A"
        else:
            for j in df['b']:
                if j > 0.2:
                    a = "class B"
                else:
                    for k in df['c']:
                        if k > 0.15:
                            a = "class C"
                        else:
                            for l in df['d']:
                                if l > 0.10:
                                    a = "class D"
                                else:
                                    a = "null"
    return a

Could anyone please help in optimising the code?

Expected Output:

a     b    c    d     result
0.7   0.1  0.2  0.3   class A
0.5   0.2  0.2  0.2   class C

Please also add your expected output based on your sample dataframe. – ashkangh, Commented Mar 22, 2021 at 17:29

Shubham Sharma · Accepted Answer · 2021-03-22 17:59:17Z

7

IIUC, you can compare the columns a, b, c and d with 0.6, 0.2, 0.15 and 0.10 to create a boolean mask, then use idxmax along axis=1 on this mask to get the name of the column where the first True value occurs. Rows where no column exceeds its threshold are masked with 'Null'.

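c = ['a', 'b', 'c', 'd']
# True wherever a column exceeds the threshold in the same position
m = df[c].gt([0.6, 0.2, 0.15, 0.10])
# axis is passed by keyword because recent pandas versions reject any(1)
df['Result'] = m.idxmax(axis=1).radd('Class ').mask(~m.any(axis=1), 'Null')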

     a    b    c    d   Result
0  0.7  0.1  0.2  0.3  Class a
1  0.5  0.2  0.2  0.2  Class c
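To see why idxmax needs the extra any() check, here is a minimal sketch that prints the intermediate mask. The third row is not in the question; it is added here as an assumed example of a row where no threshold is exceeded.

```python
import pandas as pd

# First two rows come from the question; the third is an assumed
# all-below-threshold row added for illustration.
df = pd.DataFrame({'a': [0.7, 0.5, 0.1], 'b': [0.1, 0.2, 0.1],
                   'c': [0.2, 0.2, 0.1], 'd': [0.3, 0.2, 0.05]})

cols = ['a', 'b', 'c', 'd']
thresholds = [0.6, 0.2, 0.15, 0.10]

# Each column is compared against the threshold in the same position.
mask = df[cols].gt(thresholds)
print(mask)

# idxmax returns the label of the first True in each row. For an all-False
# row it still returns the first column, so any() is used to detect that case.
first_hit = mask.idxmax(axis=1)
has_hit = mask.any(axis=1)
result = ('Class ' + first_hit).where(has_hit, 'Null')
print(result.tolist())
```

Without the has_hit check, the last row would be labelled 'Class a' even though a is below 0.6.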

edited Mar 22, 2021 at 17:59

answered Mar 22, 2021 at 17:47

3 Comments

Thivagar Moorthy Over a year ago

Thanks for the answer, but these are not the only columns. There are several other columns, but the "result" column is based on these four. Also, let's assume that instead of "class A", "class B", etc., the classes are different, like "red", "blue", etc.

Shubham Sharma Over a year ago

@ThivagarMoorthy In that case we can select only the subset of columns. I've edited the answer.
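For the case raised in the comments (extra columns, arbitrary labels such as "red" and "blue"), one possible sketch uses numpy.select rather than the idxmax approach in the answer. The 'other' column, the third row, and the labels 'green' and 'yellow' are assumptions made up for illustration.

```python
import numpy as np
import pandas as pd

# 'other' stands in for the unrelated columns mentioned in the comment;
# the third row is an assumed no-match case.
df = pd.DataFrame({'a': [0.7, 0.5, 0.1], 'b': [0.1, 0.2, 0.1],
                   'c': [0.2, 0.2, 0.1], 'd': [0.3, 0.2, 0.05],
                   'other': ['x', 'y', 'z']})

# Conditions are evaluated in order; the first one that holds wins,
# which matches the if/elif priority in the question.
conditions = [df['a'] > 0.6, df['b'] > 0.2, df['c'] > 0.15, df['d'] > 0.10]
labels = ['red', 'blue', 'green', 'yellow']  # hypothetical labels

# Rows matching no condition get the default value.
df['result'] = np.select(conditions, labels, default='null')
print(df)
```

Because each label is paired with its own condition, the labels do not need to be derived from the column names, and columns that play no part in the rule are left untouched.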

user2357448 · Accepted Answer · 2021-03-22 18:17:48Z

If you're looking for interpretable, flexible, but not necessarily the best performing solutions, here are two approaches:

Approach 1: Using .loc and column comparisons

import pandas as pd

df = pd.DataFrame({'a':[0.7, 0.5], 'b':[0.1, 0.2], 'c':[0.2, 0.2], 'd':[0.3, 0.2]})
df['result'] = None
# Assign from lowest to highest priority so that later rules overwrite earlier ones
df.loc[df['d'] > 0.1, 'result'] = 'class_d'
df.loc[df['c'] > 0.15, 'result'] = 'class_c'
df.loc[df['b'] > 0.2, 'result'] = 'class_b'
df.loc[df['a'] > 0.6, 'result'] = 'class_a'

Approach 2: Using df.iterrows()

import pandas as pd

df = pd.DataFrame({'a':[0.7, 0.5], 'b':[0.1, 0.2], 'c':[0.2, 0.2], 'd':[0.3, 0.2]})
df['result'] = None

# Check each row in priority order; rows matching nothing keep None
for idx, row in df.iterrows():
    if row['a'] > 0.6:
        df.loc[idx, 'result'] = 'class_a'
    elif row['b'] > 0.2:
        df.loc[idx, 'result'] = 'class_b'
    elif row['c'] > 0.15:
        df.loc[idx, 'result'] = 'class_c'
    elif row['d'] > 0.1:
        df.loc[idx, 'result'] = 'class_d'
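The same row-wise logic can also be written as a plain function passed to df.apply, which avoids assigning through df.loc inside a loop. This is only a sketch: the function name classify, the third row, and the 'null' fallback (borrowed from the question's code) are assumptions, and it is still row-by-row rather than vectorized.

```python
import pandas as pd

def classify(row):
    """Return the label for the first threshold the row exceeds."""
    if row['a'] > 0.6:
        return 'class_a'
    if row['b'] > 0.2:
        return 'class_b'
    if row['c'] > 0.15:
        return 'class_c'
    if row['d'] > 0.1:
        return 'class_d'
    return 'null'  # fallback used in the question's code

# First two rows are from the question; the third is an assumed no-match row.
df = pd.DataFrame({'a': [0.7, 0.5, 0.1], 'b': [0.1, 0.2, 0.1],
                   'c': [0.2, 0.2, 0.1], 'd': [0.3, 0.2, 0.05]})

# axis=1 passes each row to classify as a Series.
df['result'] = df.apply(classify, axis=1)
print(df)
```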
