I'm writing a script in Python and Pandas that uses lambda expressions to write pre-formatted comments in a CSV column based on the numerical grade assigned to each row. While I'm able to do this for a number of series, I'm having difficulty with one case.

Here is the structure of the CSV:

enter image description here

Here is the working code to write a new column composition_comment. (I'm sure there's a way to express this more concisely, but I'm still learning Python and Pandas.)

import pandas as pd

df = pd.read_csv('stack.csv')
composition_score_value = 40  #calculated by another process

composition_comment_a_level = "Good work." # For scores falling between 100 and 90 percent of composition_score_value.
composition_comment_b_level = "Satisfactory work." # For scores between 89 and 80.
composition_comment_c_level = "Improvement needed." # For scores between 79 and 70.
composition_comment_d_level = "Unsatisfactory work." # For scores below 69.

df['composition_comment'] = df['composition_score'].apply(lambda element: composition_comment_a_level if element <= (composition_score_value * 1) else element >= (composition_score_value *.90))
df['composition_comment'] = df['composition_score'].apply(lambda element: composition_comment_b_level if element <= (composition_score_value *.899) else element >= (composition_score_value *.80))
df['composition_comment'] = df['composition_score'].apply(lambda element: composition_comment_c_level if element <= (composition_score_value *.799) else element >= (composition_score_value *.70))
df['composition_comment'] = df['composition_score'].apply(lambda element: composition_comment_d_level if element <= (composition_score_value *.699) else element >= (composition_score_value *.001))

df 

df.to_csv('stack.csv', index=False)

The expected output is:

enter image description here

But the actual output is:

enter image description here

Any ideas on why True values are being written, and why only the final row is processed properly? Any assistance is appreciated.

6 Answers

1

While many of the other options show how to improve the apply operation, I would suggest using pd.cut:

df['composition_comment'] = pd.cut(
    df['composition_score'] / composition_score_value,  # Divide to get percent
    bins=[0, 0.7, 0.8, 0.9, np.inf],                    # Set Bounds
    labels=[composition_comment_d_level,                # Set Labels
            composition_comment_c_level,
            composition_comment_b_level,
            composition_comment_a_level],
    right=False                                         # Set Lower bound inclusive
)

df:

   composition_score   composition_comment
0                 40            Good work.
1                 35    Satisfactory work.
2                 31   Improvement needed.
3                 27  Unsatisfactory work.

*Setting right=False makes the lower bound of each bin inclusive, meaning the bins are:

[0.0, 0.7)  # 0.0 (inclusive) up to 0.7 (not inclusive)
[0.7, 0.8)  # 0.7 (inclusive) up to 0.8 (not inclusive)
[0.8, 0.9)  # 0.8 (inclusive) up to 0.9 (not inclusive)
[0.9, inf)  # 0.9 (inclusive) up to infinity

Notes:

  1. inf could be modified if there was a set upper bound. 1 will not work as the upper bound with right=False since 1 is not strictly less than 1.
  2. -np.inf (np.NINF in NumPy versions before 2.0, where that alias was removed) could be used as the lower bound instead of 0 if values less than 0 are expected
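
To make note 1 concrete, here is a minimal sketch comparing an upper bound of 1 with np.inf when right=False. The ratios and the single-letter labels are made up for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Illustrative score ratios: 40/40, 35/40, 31/40, 27/40
ratios = pd.Series([1.0, 0.875, 0.775, 0.675])
labels = ['D', 'C', 'B', 'A']  # hypothetical short labels, lowest bin first

# Upper bound of 1: the last bin is [0.9, 1), so a ratio of exactly 1.0 falls outside it
bounded = pd.cut(ratios, bins=[0, 0.7, 0.8, 0.9, 1], labels=labels, right=False)

# Upper bound of infinity: the last bin is [0.9, inf), so 1.0 is included
unbounded = pd.cut(ratios, bins=[0, 0.7, 0.8, 0.9, np.inf], labels=labels, right=False)

print(bounded.isna().tolist())
print(list(unbounded))
```

With the bounded version, the top score ends up as a missing value rather than receiving the highest label.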

The primary benefit is that pd.cut returns an ordered categorical, meaning that operations like sort_values will sort by category order rather than alphabetically.

['Unsatisfactory work.' < 'Improvement needed.' < 'Satisfactory work.' < 'Good work.']
df = df.sort_values('composition_comment')

df:

   composition_score   composition_comment
3                 27  Unsatisfactory work.
2                 31   Improvement needed.
1                 35    Satisfactory work.
0                 40            Good work.
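
For contrast, here is a small sketch of the same four comments sorted as plain strings versus as an ordered categorical. The variable names are illustrative, and the categorical is built by hand here rather than through pd.cut.

```python
import pandas as pd

# Category order from lowest to highest grade, as in the answer above
comments = ['Unsatisfactory work.', 'Improvement needed.',
            'Satisfactory work.', 'Good work.']

as_text = pd.Series(['Good work.', 'Satisfactory work.',
                     'Improvement needed.', 'Unsatisfactory work.'])
as_category = as_text.astype(pd.CategoricalDtype(categories=comments, ordered=True))

# Plain strings sort alphabetically; the categorical sorts by its declared order
alphabetical = as_text.sort_values().tolist()
by_grade = as_category.sort_values().tolist()

print(alphabetical)
print(by_grade)
```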

Program Setup:

import numpy as np
import pandas as pd

df = pd.DataFrame({'composition_score': [40, 35, 31, 27]})
composition_score_value = 40  # calculated by another process

# For scores falling between 100 and 90 percent of composition_score_value.
composition_comment_a_level = "Good work."
# For scores between 89 and 80.
composition_comment_b_level = "Satisfactory work."
# For scores between 79 and 70.
composition_comment_c_level = "Improvement needed."
# For Scores below 70
composition_comment_d_level = "Unsatisfactory work."

Comments

1

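The else branch of each lambda does not return a comment; it returns the result of the comparison, which is a boolean such as True. Each assignment also overwrites the column written by the previous line, so only the last lambda's output survives. I suggest combining them in a single function, while also inverting the order: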

composition_score_value = 40  #calculated by another process

def return_level(element):
    if element <= (composition_score_value *.699):
        return "Unsatisfactory work." # For scores below 69.
    elif element <= (composition_score_value *.799):
        return "Improvement needed." # For scores between 79 and 70.
    elif element <= (composition_score_value *.899):
        return "Satisfactory work." # For scores between 89 and 80.
    elif element <= (composition_score_value * 1):
        return "Good work." # For scores falling between 100 and 90 percent of composition_score_value.
    else:
        return None

df['composition_comment'] = df['composition_score'].apply(return_level)

Result:

   composition_score   composition_comment
0                 40            Good work.
1                 35    Satisfactory work.
2                 31   Improvement needed.
3                 27  Unsatisfactory work.

2 Comments

Success! Assistance is greatly appreciated!
Great answer to the OP question. @Daniel Hutchinson, you may give my answer a look if you're interested in a more scalable solution to the original problem.
1

First, a style comment: you should be using more line breaks, and your variable names are rather long (including both "score" and "value" in a name is a bit redundant). I believe the following behaves the same as your first line, and does not require side scrolling:

df['composition_comment'] = df['composition_score'].apply(
    lambda element: comp_comment_a if element <= (comp_score * 1) 
        else element >= (comp_score *.90))

As for what's going on, the above code tells Python that you want comp_comment_a if element is less than or equal to comp_score * 1, otherwise you want the value of element >= (comp_score *.90). And element >= (comp_score *.90) is a boolean. I'm not entirely clear on what your intended result is, but based on my guess as to what you want, you should have and rather than else. Your code can be made much cleaner, e.g.:

import pandas as pd

df = pd.read_csv('stack.csv')
comp_score = 40  #calculated by another process

comp_comments = ["Good work.", "Satisfactory work.", "Improvement needed.", 
    "Unsatisfactory work."]

def score_to_comment(score):
    if score > comp_score:
        #it's not clear what you want to do in this case
        #but this follows your original code
        return None
    if score >= comp_score * .9:
        return comp_comments[0]
    if score >= comp_score * .8:
        return comp_comments[1]
    if score >= comp_score * .7:
        return comp_comments[2]
    return comp_comments[3]

df['composition_comment'] = df['composition_score'].apply(score_to_comment)

df.to_csv('stack.csv', index=False)

Comments

1

An elegant alternative to if/elif, using NumPy:

import numpy as np

# Conditions are checked in order; >= keeps boundary scores (90, 80, 70) from falling through
conditions = [df['composition_score'] >= 90,
              (df['composition_score'] >= 80) & (df['composition_score'] < 90),
              (df['composition_score'] >= 70) & (df['composition_score'] < 80),
              df['composition_score'] < 70]

choices = ['Great', 'Not Bad', 'Poor', 'very ugly']

df['composition_comment'] = np.select(conditions, choices, default='')

Note: default='' represents the default value when no conditions are satisfied.
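
The thresholds above compare raw scores. Assuming the intent is to grade against a percentage of composition_score_value, as in the question, a sketch along these lines might work. The percent variable and the sample data are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Sample data matching the scores used elsewhere in this thread
df = pd.DataFrame({'composition_score': [40, 35, 31, 27]})
composition_score_value = 40  # calculated by another process

# Convert raw scores to a percentage of the maximum score
percent = df['composition_score'] / composition_score_value * 100

# np.select takes the first condition that is True for each row,
# so each threshold only needs a lower bound
conditions = [percent >= 90, percent >= 80, percent >= 70]
choices = ['Good work.', 'Satisfactory work.', 'Improvement needed.']

df['composition_comment'] = np.select(conditions, choices,
                                      default='Unsatisfactory work.')
print(df)
```

Because the first matching condition wins, the upper bounds can be dropped, and the default covers everything below 70 percent.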

Comments

1

You can use pd.cut() to perform the mapping, and this scales much better than having to explicitly write out each condition in an if-else statement:

import pandas as pd

df = pd.DataFrame({'composition_score': [40, 35, 31, 27]})
composition_score_value = 40

bins = pd.IntervalIndex.from_tuples([(0, 70), (70,80), (80,90), (90,100)])
labels = ['Unsatisfactory work.','Improvement needed.','Satisfactory work.','Good work.']
d = dict(zip(bins,labels))
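# Note: pd.cut ignores `right` when bins is an IntervalIndex; the intervals' own closed side applies
x = pd.cut(df['composition_score']/composition_score_value*100, bins, right=False).map(d)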

Yields:

0              Good work.
1      Satisfactory work.
2     Improvement needed.
3    Unsatisfactory work.
Name: composition_score, dtype: category
Categories (4, object): ['Unsatisfactory work.' < 'Improvement needed.' < 'Satisfactory work.' < 'Good work.']
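
One detail worth checking with this approach is which side of each interval is closed. The sketch below is not part of the original answer; it uses a single boundary value of 90 percent, chosen for illustration, to show that the closed side comes from the IntervalIndex itself (via its closed argument) and not from the right argument of pd.cut.

```python
import pandas as pd

edges = [(0, 70), (70, 80), (80, 90), (90, 100)]

# from_tuples builds right-closed intervals by default: (80, 90], (90, 100], ...
bins_right = pd.IntervalIndex.from_tuples(edges)
# closed='left' builds [80, 90), [90, 100), ...
bins_left = pd.IntervalIndex.from_tuples(edges, closed='left')

boundary = pd.Series([90.0])  # a score of exactly 90 percent

# right=False has no effect here because bins is an IntervalIndex
in_right_closed = pd.cut(boundary, bins_right, right=False).iloc[0]
in_left_closed = pd.cut(boundary, bins_left).iloc[0]

print(in_right_closed)
print(in_left_closed)
```

With left-closed intervals, a score of exactly 100 percent would fall outside [90, 100), so the last upper edge would need to be raised if that case matters.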

1 Comment

I think the bins should be [(0, 70), (70,80), (80,90), (90,100)], and then set right = False.
1

Each line overwrites what the previous lines wrote to the column, so only the last line's result remains. When it sees a value that satisfies element <= (composition_score_value *.699), it returns composition_comment_d_level. If the value doesn't satisfy that, it returns element >= (composition_score_value *.001), which is a boolean value, True or False.
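
A minimal sketch of that behaviour, using made-up sample scores and the same shape as the last lambda in the question:

```python
import pandas as pd

# Sample scores assumed for illustration
df = pd.DataFrame({'composition_score': [40, 35, 31, 27]})
composition_score_value = 40

# "comment if condition else comparison": the else branch yields a boolean
last_pass = df['composition_score'].apply(
    lambda element: "Unsatisfactory work."
    if element <= composition_score_value * .699
    else element >= composition_score_value * .001
)

result = last_pass.tolist()
print(result)
```

Only the score of 27 satisfies the first condition, so it is the only row that receives a comment; every other row receives the boolean from the else branch.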

This one should work:

def composition_comment(element):
    composition_score_value = 40
    if element <= (composition_score_value *.699) :
        return "Unsatisfactory work."  
    elif element <= (composition_score_value *.799):
        return "Improvement needed."
    elif element <= (composition_score_value *.899):
        return "Satisfactory work."
    elif element <= (composition_score_value * 1):
        return "Good work."
    else:
        return None

df['composition_comment'] = df['composition_score'].apply(composition_comment)
df

1 Comment

You need to reverse either the order of the comparison, or their directions. As it stands, any score less than composition_score_value will result in "Good work.".
