Python & Pandas: Structuring lambda argument

Question

I'm writing a script in Python and Pandas that uses lambda statements to write pre-formatted comments in a csv column based on the numerical grade assigned to each row. While I'm able to do this for a number of series, I'm having difficulty with one case.

Here is the structure of the csv:

Here is the working code to write a new column composition_comment. (I'm sure there's a way to express this more concisely, but I'm still learning Python and Pandas.)

import pandas as pd

df = pd.read_csv('stack.csv')
composition_score_value = 40  #calculated by another process

composition_comment_a_level = "Good work." # For scores falling between 100 and 90 percent of composition_score_value.
composition_comment_b_level = "Satisfactory work." # For scores between 89 and 80.
composition_comment_c_level = "Improvement needed." # For scores between 79 and 70.
composition_comment_d_level = "Unsatisfactory work." # For scores below 69.

df['composition_comment'] = df['composition_score'].apply(lambda element: composition_comment_a_level if element <= (composition_score_value * 1) else element >= (composition_score_value *.90))
df['composition_comment'] = df['composition_score'].apply(lambda element: composition_comment_b_level if element <= (composition_score_value *.899) else element >= (composition_score_value *.80))
df['composition_comment'] = df['composition_score'].apply(lambda element: composition_comment_c_level if element <= (composition_score_value *.799) else element >= (composition_score_value *.70))
df['composition_comment'] = df['composition_score'].apply(lambda element: composition_comment_d_level if element <= (composition_score_value *.699) else element >= (composition_score_value *.001))

df 

df.to_csv('stack.csv', index=False)

The expected output is:

But the actual output is:

Any ideas on why True values are being written, and why only the final row processes properly? Any assistance appreciated.

Henry Ecker · Accepted Answer · 2021-07-28 15:10:56Z

While many of the other options show how to improve the apply operation, I would suggest using pd.cut:

df['composition_comment'] = pd.cut(
    df['composition_score'] / composition_score_value,  # Divide to get percent
    bins=[0, 0.7, 0.8, 0.9, np.inf],                    # Set Bounds
    labels=[composition_comment_d_level,                # Set Labels
            composition_comment_c_level,
            composition_comment_b_level,
            composition_comment_a_level],
    right=False                                         # Set Lower bound inclusive
)

df:

   composition_score   composition_comment
0                 40            Good work.
1                 35    Satisfactory work.
2                 31   Improvement needed.
3                 27  Unsatisfactory work.

*Setting right=False makes the lower bound inclusive, meaning the bins are:

[0.0, 0.7)  # 0.0 (inclusive) up to 0.7 (not inclusive)
[0.7, 0.8)  # 0.7 (inclusive) up to 0.8 (not inclusive)
[0.8, 0.9)  # 0.8 (inclusive) up to 0.9 (not inclusive)
[0.9, inf)  # 0.9 (inclusive) up to infinity

Notes:

inf could be replaced with a finite value if there is a fixed upper bound. 1 will not work as the upper bound with right=False, since 1 is not strictly less than 1.
-np.inf could be used as the lower bound instead of 0 if values less than 0 are expected (the older np.NINF alias was removed in NumPy 2.0).
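
To illustrate the first note, here is a small sketch comparing a finite upper bound of 1 with np.inf when right=False. The labels "D" through "A" and the sample values are placeholders of mine, not from the question:

```python
import numpy as np
import pandas as pd

percent = pd.Series([0.675, 0.9, 1.0])  # sample values; 1.0 is a perfect score
labels = ["D", "C", "B", "A"]           # placeholder labels for brevity

# Upper bound of 1 with right=False: the last bin is [0.9, 1), so 1.0 falls outside
bounded = pd.cut(percent, bins=[0, 0.7, 0.8, 0.9, 1], labels=labels, right=False)

# Upper bound of np.inf: the last bin is [0.9, inf), so 1.0 is included
unbounded = pd.cut(percent, bins=[0, 0.7, 0.8, 0.9, np.inf], labels=labels, right=False)

print(bounded.isna().tolist())  # the perfect score is the only missing value
print(unbounded.tolist())
```

With the finite bound, the perfect score ends up as NaN instead of getting the top comment, which is why the answer uses np.inf.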

The primary benefit is that pd.cut returns an ordered categorical, meaning that operations like sort_values will sort by category order rather than alphabetically:

['Unsatisfactory work.' < 'Improvement needed.' < 'Satisfactory work.' < 'Good work.']

df = df.sort_values('composition_comment')

df:

   composition_score   composition_comment
3                 27  Unsatisfactory work.
2                 31   Improvement needed.
1                 35    Satisfactory work.
0                 40            Good work.
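
Because the result is an ordered categorical, it should also support order-aware comparisons and min/max. A minimal sketch, rebuilding the same column from the Program Setup values (the names comments and at_least_satisfactory are my own):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'composition_score': [40, 35, 31, 27]})
composition_score_value = 40

df['composition_comment'] = pd.cut(
    df['composition_score'] / composition_score_value,
    bins=[0, 0.7, 0.8, 0.9, np.inf],
    labels=["Unsatisfactory work.", "Improvement needed.",
            "Satisfactory work.", "Good work."],
    right=False,
)

comments = df['composition_comment']

# Comparisons follow the category order, not alphabetical order
at_least_satisfactory = df.loc[comments >= "Satisfactory work.", 'composition_score']

print(at_least_satisfactory.tolist())
print(comments.min(), "|", comments.max())
```

A plain string column could not be filtered this way, since string comparison would be alphabetical.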

Program Setup:

import numpy as np
import pandas as pd

df = pd.DataFrame({'composition_score': [40, 35, 31, 27]})
composition_score_value = 40  # calculated by another process

# For scores falling between 100 and 90 percent of composition_score_value.
composition_comment_a_level = "Good work."
# For scores between 89 and 80.
composition_comment_b_level = "Satisfactory work."
# For scores between 79 and 70.
composition_comment_c_level = "Improvement needed."
# For Scores below 70
composition_comment_d_level = "Unsatisfactory work."

RJ Adriaansen · Accepted Answer · 2021-07-28 14:51:34Z

The else branch of each lambda returns the result of a comparison rather than a comment, so rows that fail the first test get True. I suggest combining them in a single function, while also inverting the order:

composition_score_value = 40  #calculated by another process

def return_level(element):
    if element <= (composition_score_value *.699):
        return "Unsatisfactory work." # For scores below 69.
    elif element <= (composition_score_value *.799):
        return "Improvement needed." # For scores between 79 and 70.
    elif element <= (composition_score_value *.899):
        return "Satisfactory work." # For scores between 89 and 80.
    elif element <= (composition_score_value * 1):
        return "Good work." # For scores falling between 100 and 90 percent of composition_score_value.
    else:
        return None

df['composition_comment'] = df['composition_score'].apply(return_level)

Result:

   composition_score   composition_comment
0                 40            Good work.
1                 35    Satisfactory work.
2                 31   Improvement needed.
3                 27  Unsatisfactory work.
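
As an aside, the same if/elif ladder can be written as a table lookup with the standard-library bisect module, which keeps the thresholds and comments in one place. This is only a sketch: the names breakpoints, comments and score_to_comment are mine, the thresholds are expressed as percentages, and unlike the function above it does not return None for scores above composition_score_value.

```python
from bisect import bisect

import pandas as pd

composition_score_value = 40  # calculated by another process

breakpoints = [70, 80, 90]  # lower percentage bounds of the C, B and A levels
comments = ["Unsatisfactory work.", "Improvement needed.",
            "Satisfactory work.", "Good work."]

def score_to_comment(score):
    percent = score / composition_score_value * 100
    # bisect counts how many breakpoints are <= percent, which is the comment index
    return comments[bisect(breakpoints, percent)]

df = pd.DataFrame({'composition_score': [40, 35, 31, 27]})
df['composition_comment'] = df['composition_score'].apply(score_to_comment)
print(df)
```

Adding another level then only means adding one breakpoint and one comment.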

2 Comments

Daniel Hutchinson Over a year ago

Success! Assistance is greatly appreciated!

rahlf23 Over a year ago

Great answer to the OP question. @Daniel Hutchinson, you may give my answer a look if you're interested in a more scalable solution to the original problem.

Acccumulation · Accepted Answer · 2021-07-28 15:08:23Z

First, a style comment: you should be using more carriage returns, and your variable names are rather long (including both "score" and "value" in a name is a bit redundant). I believe the following will still work, and not require side scrolling:

df['composition_comment'] = df['composition_score'].apply(
    lambda element: comp_comment_a if element <= (comp_score * 1) 
        else element >= (comp_score *.90))

As for what's going on, the above code tells Python that you want comp_comment_a if element is less than or equal to comp_score * 1, otherwise you want the value of element >= (comp_score *.90). And element >= (comp_score *.90) is a boolean. I'm not entirely clear on what your intended result is, but based on my guess as to what you want, you should have and rather than else. Your code can be made much cleaner, e.g.:

import pandas as pd

df = pd.read_csv('stack.csv')
comp_score = 40  #calculated by another process

comp_comments = ["Good work.", "Satisfactory work.", "Improvement needed.", 
    "Unsatisfactory work."]

def score_to_comment(score):
    if score > comp_score:
        #it's not clear what you want to do in this case
        #but this follows your original code
        return None
    if score >= comp_score * .9:
        return comp_comments[0]
    if score >= comp_score * .8:
        return comp_comments[1]
    if score >= comp_score * .7:
        return comp_comments[2]
    return comp_comments[3]

df['composition_comment'] = df['composition_score'].apply(score_to_comment)

df.to_csv('stack.csv', index=False)
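
If you would still rather keep a lambda, conditional expressions can be chained so that every else branch yields either a comment or another test, never a bare comparison. A sketch reusing the shortened names above, with a small in-memory DataFrame standing in for stack.csv (an assumption on my part); note that it does not special-case scores above comp_score:

```python
import pandas as pd

df = pd.DataFrame({'composition_score': [40, 35, 31, 27]})  # stand-in for stack.csv
comp_score = 40  # calculated by another process

# Each else leads to the next test; only the final else is a plain value
df['composition_comment'] = df['composition_score'].apply(
    lambda score: "Good work." if score >= comp_score * .9
    else "Satisfactory work." if score >= comp_score * .8
    else "Improvement needed." if score >= comp_score * .7
    else "Unsatisfactory work."
)

print(df)
```

This works, but past two or three levels a named function is usually easier to read.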

Sky Scraper · Accepted Answer · 2021-07-28 15:14:13Z

An elegant way using NumPy's np.select, instead of an if/elif chain:

import numpy as np

# Convert raw scores to percentages of the maximum (composition_score_value, 40 in the question)
pct = df['composition_score'] / composition_score_value * 100

# >= on the lower bounds so that exactly 70, 80 and 90 percent are covered
conditions = [
    pct >= 90,
    (pct >= 80) & (pct < 90),
    (pct >= 70) & (pct < 80),
    pct < 70,
]

choices = ['Great', 'Not Bad', 'Poor', 'very ugly']

df['composition_comment'] = np.select(conditions, choices, default='')

Note: default='' represents the default value when no conditions are satisfied.
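
Since np.select uses the first condition that matches for each row, the upper bounds can be dropped if the conditions are listed from the highest level down. A sketch under the assumption that scores are first converted to percentages of composition_score_value, using the comments from the question rather than the placeholder choices above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'composition_score': [40, 35, 31, 27]})
composition_score_value = 40

pct = df['composition_score'] / composition_score_value * 100

# Ordered from highest to lowest; the first True condition wins for each row
conditions = [pct >= 90, pct >= 80, pct >= 70]
choices = ["Good work.", "Satisfactory work.", "Improvement needed."]

df['composition_comment'] = np.select(conditions, choices,
                                      default="Unsatisfactory work.")
print(df)
```

Here the default doubles as the lowest level, so no row is left with an empty string.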

rahlf23 · Accepted Answer · 2021-07-28 15:19:21Z

You can use pd.cut() to perform the mapping, and this scales much better than having to explicitly write out each condition in an if-else statement:

import pandas as pd

df = pd.DataFrame({'composition_score': [40, 35, 31, 27]})
composition_score_value = 40

bins = pd.IntervalIndex.from_tuples([(0, 70), (70,80), (80,90), (90,100)])
labels = ['Unsatisfactory work.','Improvement needed.','Satisfactory work.','Good work.']
d = dict(zip(bins,labels))
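# Note: when bins is an IntervalIndex, pd.cut ignores `right`; the intervals
# above are closed on the right, which is the from_tuples default.
x = pd.cut(df['composition_score']/composition_score_value*100, bins, right=False).map(d)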

Yields:

0              Good work.
1      Satisfactory work.
2     Improvement needed.
3    Unsatisfactory work.
Name: composition_score, dtype: category
Categories (4, object): ['Unsatisfactory work.' < 'Improvement needed.' < 'Satisfactory work.' < 'Good work.']
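
One detail worth verifying against your pandas version: pd.cut ignores its right argument when bins is an IntervalIndex, and the closed side is taken from the intervals themselves (from_tuples defaults to closed='right'). A small sketch with made-up values sitting exactly on the bin edges:

```python
import pandas as pd

edges = [(0, 70), (70, 80), (80, 90), (90, 100)]
right_closed = pd.IntervalIndex.from_tuples(edges)                # (0, 70], (70, 80], ...
left_closed = pd.IntervalIndex.from_tuples(edges, closed='left')  # [0, 70), [70, 80), ...

pct = pd.Series([70.0, 100.0])  # made-up values exactly on bin edges

a = pd.cut(pct, right_closed)
b = pd.cut(pct, right_closed, right=False)  # right=False has no effect here
c = pd.cut(pct, left_closed)

# .cat.codes gives the bin position for each value; -1 means "no bin"
print(a.cat.codes.tolist())
print(b.cat.codes.tolist())
print(c.cat.codes.tolist())
```

With left-closed intervals a perfect score of 100 falls outside the last bin, so the top interval would need a larger upper edge in that case.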

1 Comment

Acccumulation Over a year ago

I think the bins should be [(0, 70), (70,80), (80,90), (90,100)], and then set right = False.

Muhteva · Accepted Answer · 2021-07-28 15:19:52Z

Each assignment overwrites what the lines of code above it wrote to the column, so only the last one survives. When the last lambda sees a value that satisfies element <= (composition_score_value *.699), it returns composition_comment_d_level. If the value doesn't satisfy that, it returns element >= (composition_score_value *.001), which is a boolean value, True or False.

This one should work:

def composition_comment(element):
    composition_score_value = 40
    if element <= (composition_score_value *.699):
        return "Unsatisfactory work."
    elif element <= (composition_score_value *.799):
        return "Improvement needed."
    elif element <= (composition_score_value *.899):
        return "Satisfactory work."
    elif element <= (composition_score_value * 1):
        return "Good work."
    else:
        return None

df['composition_comment'] = df['composition_score'].apply(composition_comment)
df
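
To see the overwriting described above in action, here is a sketch that runs only the last of the four original assignments on the sample scores (the in-memory DataFrame is my stand-in for stack.csv). Since each assignment replaces the whole column, this is also what remains after running all four:

```python
import pandas as pd

df = pd.DataFrame({'composition_score': [40, 35, 31, 27]})  # stand-in for stack.csv
composition_score_value = 40
composition_comment_d_level = "Unsatisfactory work."

# The last assignment from the question, with the same logic
df['composition_comment'] = df['composition_score'].apply(
    lambda element: composition_comment_d_level
    if element <= (composition_score_value * .699)
    else element >= (composition_score_value * .001)
)

# Rows failing the first test receive the boolean from the else branch
print(df['composition_comment'].tolist())
```

Only the score of 27 passes the first test, which is why the final row is the only one that gets a comment while the others show True.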

1 Comment

Acccumulation Over a year ago

You need to reverse either the order of the comparison, or their directions. As it stands, any score less than composition_score_value will result in "Good work.".
