I have a pandas DataFrame like the one below:

a     b    c    d   
0.7   0.1  0.2  0.3
0.5   0.2  0.2  0.2

I am writing nested loops like the ones below to add a "result" column based on these four columns.

def class_decider(df):
    for i in df['a']:
        if i > 0.6:
            a = "class A"
        else:
            for j in df['b']:
                if j > 0.2:
                    a = "class B"
                else:
                    for k in df['c']:
                        if k > 0.15:
                            a = "class C"
                        else:
                            for l in df['d']:
                                if l > 0.10:
                                    a = "class D"
                                else:
                                    a = "null"
    return a

Could anyone please help me optimise this code?

Expected Output:

a     b    c    d     result
0.7   0.1  0.2  0.3   class A
0.5   0.2  0.2  0.2   class C
    Please also add your expected output based on your sample dataframe. Commented Mar 22, 2021 at 17:29

2 Answers

7

IIUC, you can compare the columns a, b, c and d with the thresholds 0.6, 0.2, 0.15 and 0.10 to create a boolean mask, then use idxmax along axis=1 on this mask to get the name of the column where the first True value occurs. Rows with no True value are set to 'Null'.

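c = ['a', 'b', 'c', 'd']
# True where each column exceeds its own threshold
m = df[c].gt([0.6, 0.2, 0.15, 0.10])
# First True column per row becomes the label; rows with no True become 'Null'
df['Result'] = m.idxmax(axis=1).radd('Class ').mask(~m.any(axis=1), 'Null')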

     a    b    c    d   Result
0  0.7  0.1  0.2  0.3  Class a
1  0.5  0.2  0.2  0.2  Class c
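
To make the one-liner above easier to follow, here is a minimal sketch that splits it into its intermediate steps on the sample data from the question. The variable names (cols, thresholds, mask, first_hit, no_hit) are only illustrative and do not appear in the original answer.

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({'a': [0.7, 0.5], 'b': [0.1, 0.2],
                   'c': [0.2, 0.2], 'd': [0.3, 0.2]})

cols = ['a', 'b', 'c', 'd']          # illustrative name
thresholds = [0.6, 0.2, 0.15, 0.10]  # one threshold per column, same order

# Boolean mask: True where a value exceeds the threshold of its column
mask = df[cols].gt(thresholds)

# Name of the first column (left to right) that holds True in each row
first_hit = mask.idxmax(axis=1)

# Rows with no True at all; idxmax alone would report the first column
# for such rows, which is why the answer masks them separately
no_hit = ~mask.any(axis=1)

print(mask)
print(first_hit)
print(no_hit)
```

Because idxmax returns the first maximum, the column order in cols acts as the priority order of the rules.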

3 Comments

Thanks for the answer, but these are not the only columns. There are several other columns, but the "result" column is based on these four. Also, let's assume that instead of "class A", "class B", etc., the classes are different, like "red", "blue", etc.
@ThivagarMoorthy In that case we can select only the subset of columns. I've edited the answer.
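
For the case raised in the comment (extra columns and arbitrary class names), one possible sketch is to keep the thresholds and labels in a dictionary and map the column name returned by idxmax to its label. The rules dictionary, the 'other' column, the colour names beyond "red" and "blue", and the third sample row are all hypothetical and exist only to illustrate the idea.

```python
import pandas as pd

# Sample data from the question plus a hypothetical third row that
# matches no rule, and a hypothetical unrelated column 'other'
df = pd.DataFrame({'a': [0.7, 0.5, 0.1], 'b': [0.1, 0.2, 0.1],
                   'c': [0.2, 0.2, 0.1], 'd': [0.3, 0.2, 0.05],
                   'other': ['x', 'y', 'z']})

# Hypothetical mapping: column name -> (threshold, label), in priority order
rules = {'a': (0.6, 'red'), 'b': (0.2, 'blue'),
         'c': (0.15, 'green'), 'd': (0.10, 'yellow')}

cols = list(rules)
thresholds = [threshold for threshold, _ in rules.values()]
labels = {col: label for col, (_, label) in rules.items()}

# Only the rule columns take part in the comparison; 'other' is ignored
m = df[cols].gt(thresholds)
df['result'] = m.idxmax(axis=1).map(labels).mask(~m.any(axis=1), 'null')

print(df)
```

Selecting df[cols] before comparing is what keeps unrelated columns out of the mask.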
0

If you're looking for interpretable, flexible, but not necessarily the best performing solutions, here are two approaches:

Approach 1: Using .loc and column comparisons

import pandas as pd

df = pd.DataFrame({'a':[0.7, 0.5], 'b':[0.1, 0.2], 'c':[0.2, 0.2], 'd':[0.3, 0.2]})
df['result'] = None
# Assign from lowest to highest priority so that later rules overwrite earlier ones
df.loc[df['d'] > 0.1, 'result'] = 'class_d'
df.loc[df['c'] > 0.15, 'result'] = 'class_c'
df.loc[df['b'] > 0.2, 'result'] = 'class_b'
df.loc[df['a'] > 0.6, 'result'] = 'class_a'

Approach 2: Using df.iterrows()

import pandas as pd

df = pd.DataFrame({'a':[0.7, 0.5], 'b':[0.1, 0.2], 'c':[0.2, 0.2], 'd':[0.3, 0.2]})
df['result'] = None

# The first matching rule wins; rows that match no rule keep result = None
for idx, row in df.iterrows():
    if row['a'] > 0.6:
        df.loc[idx, 'result'] = 'class_a'
    elif row['b'] > 0.2:
        df.loc[idx, 'result'] = 'class_b'
    elif row['c'] > 0.15:
        df.loc[idx, 'result'] = 'class_c'
    elif row['d'] > 0.1:
        df.loc[idx, 'result'] = 'class_d'
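
As a possible vectorized alternative to the if/elif chain, the same first-match-wins logic can be sketched with numpy.select, which is not used in the answer above and is swapped in here only for comparison. The 'null' default follows the fallback value in the question; the two approaches above leave unmatched rows as None instead.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [0.7, 0.5], 'b': [0.1, 0.2],
                   'c': [0.2, 0.2], 'd': [0.3, 0.2]})

# Conditions are checked in order; the first one that is True decides the label
conditions = [df['a'] > 0.6, df['b'] > 0.2, df['c'] > 0.15, df['d'] > 0.1]
choices = ['class_a', 'class_b', 'class_c', 'class_d']

# Rows where no condition holds get the default value
df['result'] = np.select(conditions, choices, default='null')

print(df)
```

The order of the conditions list plays the same role as the order of the if/elif branches.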
