
I have a DataFrame with two columns whose values are the classes diamond, gold and silver.

import pandas as pd

class_pd = pd.DataFrame({'old_class':['gold', 'gold' , 'silver'],
    'new_class':['diamond', 'silver', 'silver']})

I want to create a new column that shows whether the class was upgraded or downgraded.

What I have tried

I wrote the function below to encode the rules:

def status_desc(class_pd, old_class, new_class):
    if ((class_pd['old_class'] == 'gold') & (class_pd['new_class'] == 'diamond') or \
       (class_pd['old_class'] == 'silver') & (class_pd['new_class'] == 'diamond') or \
       (class_pd['old_class'] == 'silver') & (class_pd['new_class'] == 'gold')):
        val = 'Upgrade'
    elif ((class_pd['old_class'] == 'diamond') & (class_pd['new_class'] == 'gold') or \
       (class_pd['old_class'] == 'diamond') & (class_pd['new_class'] == 'silver') or \
       (class_pd['old_class'] == 'gold') & (class_pd['new_class'] == 'silver')):
        val = 'Downgrade'
    else:
         val = 'NA'

Then I tried to apply the function to my DataFrame like this:

class_pd['class_desc'] = class_pd.apply(lambda x: status_desc(class_pd['old_class'], class_pd['new_class']), axis=1)

Error

I get this error:

TypeError: status_desc() missing 1 required positional argument: 'new_class'

Desired Output

class_pd = pd.DataFrame({'old_class': ['gold', 'gold', 'silver'],
                         'new_class': ['diamond', 'silver', 'silver'],
                         'class_desc': ['Upgrade', 'Downgrade', 'NA']})

4 Answers


Another solution uses pd.Categorical, which seems more elegant and more scalable to me:

categories = ['silver', 'gold', 'diamond']
class_pd = class_pd.apply(pd.Categorical, categories=categories, ordered=True)

class_pd['class_desc'] = 'NA'

class_pd.loc[class_pd.old_class > class_pd.new_class, 'class_desc'] = 'Downgrade'
class_pd.loc[class_pd.old_class < class_pd.new_class, 'class_desc'] = 'Upgrade'

This tells pandas the inherent order of the classes (silver < gold < diamond), so the comparison operators compare by rank instead of alphabetically.
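To see what the ordering buys you in isolation: with an ordered CategoricalDtype, < and > compare values by their position in the category list. A minimal sketch (old, new and dtype are illustrative names, not from the answer):

```python
import pandas as pd

# Ordered categorical: list order defines the ranking silver < gold < diamond
categories = ['silver', 'gold', 'diamond']
dtype = pd.CategoricalDtype(categories=categories, ordered=True)

old = pd.Series(['gold', 'gold', 'silver'], dtype=dtype)
new = pd.Series(['diamond', 'silver', 'silver'], dtype=dtype)

print((old < new).tolist())  # True where the class went up
print((old > new).tolist())  # True where the class went down
```

As plain strings, 'gold' > 'diamond' alphabetically, which is why the conversion to an ordered categorical matters.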

Another way to do the last step (after converting to categories), suggested by @jezrael, uses numpy.select:

import numpy as np

conditions = [
    class_pd.old_class < class_pd.new_class,
    class_pd.old_class > class_pd.new_class,
    class_pd.old_class == class_pd.new_class,
]
labels = ["Upgrade", "Downgrade", "NA"]
class_pd["class_desc"] = np.select(conditions, labels)
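np.select also accepts a default argument, so the equality condition can be dropped and "NA" supplied as the fallback. A sketch, assuming the columns are converted with astype and a shared CategoricalDtype (an alternative to the apply(pd.Categorical, ...) call above that should give the same ordered categories):

```python
import numpy as np
import pandas as pd

class_pd = pd.DataFrame({'old_class': ['gold', 'gold', 'silver'],
                         'new_class': ['diamond', 'silver', 'silver']})

# Convert both columns to the same ordered categorical dtype
dtype = pd.CategoricalDtype(categories=['silver', 'gold', 'diamond'], ordered=True)
class_pd = class_pd.astype(dtype)

# Only the two real cases are listed; everything else falls through to default
conditions = [
    class_pd.old_class < class_pd.new_class,
    class_pd.old_class > class_pd.new_class,
]
class_pd["class_desc"] = np.select(conditions, ["Upgrade", "Downgrade"], default="NA")
print(class_pd)
```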

6 Comments

I was working on the same solution, super!
thanks, it means a lot to hear that from you! :)
Maybe np.select could be an alternative ;)
I was working on the same answer, but this turned out a bit nicer :) (I was struggling with getting less than, greater than or equal, but setting the default as "NA" is how to do it nicely)
@JoshFriedlander - No, feel free to add it to the answer.

Your function status_desc takes 3 arguments (class_pd, old_class, new_class), but you are only passing 2: class_pd['old_class'] and class_pd['new_class']. You need to pass the first argument, class_pd, as well. You're also missing a few other things:

  • You need to return the values, not just assign them to val. So return "Upgrade", "Downgrade" and "NA".
  • In your .apply you need to pass the lambda's x; if you pass class_pd, you pass the whole DataFrame. x contains a single row of the df, so apply loops through the rows and the function checks the old_class and new_class values of each row.
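Putting those three fixes together while keeping the original three-parameter signature might look like this. It is a sketch: row is never used inside the function, which is exactly why the single-argument version below is simpler.

```python
import pandas as pd

class_pd = pd.DataFrame({'old_class': ['gold', 'gold', 'silver'],
                         'new_class': ['diamond', 'silver', 'silver']})

def status_desc(row, old_class, new_class):
    # old_class and new_class are plain strings taken from the current row
    if ((old_class == 'gold' and new_class == 'diamond') or
            (old_class == 'silver' and new_class == 'diamond') or
            (old_class == 'silver' and new_class == 'gold')):
        return 'Upgrade'  # return the label instead of assigning it to val
    elif ((old_class == 'diamond' and new_class == 'gold') or
          (old_class == 'diamond' and new_class == 'silver') or
          (old_class == 'gold' and new_class == 'silver')):
        return 'Downgrade'
    else:
        return 'NA'

# Pass the row x (not the whole DataFrame) plus the two values from that row
class_pd['class_desc'] = class_pd.apply(
    lambda x: status_desc(x, x['old_class'], x['new_class']), axis=1)
print(class_pd)
```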

However, a simpler approach is to give the function only 1 argument (the row), since you're not even using old_class and new_class inside it:

def status_desc(row):
    # row is one row of the DataFrame (a Series indexed by column name)
    if ((row['old_class'] == 'gold' and row['new_class'] == 'diamond') or
            (row['old_class'] == 'silver' and row['new_class'] == 'diamond') or
            (row['old_class'] == 'silver' and row['new_class'] == 'gold')):
        return 'Upgrade'
    elif ((row['old_class'] == 'diamond' and row['new_class'] == 'gold') or
          (row['old_class'] == 'diamond' and row['new_class'] == 'silver') or
          (row['old_class'] == 'gold' and row['new_class'] == 'silver')):
        return 'Downgrade'
    else:
        return 'NA'

Then call it using:

class_pd['class_desc'] = class_pd.apply(lambda x: status_desc(x), axis=1)

Output using this code:

    old_class   new_class   class_desc
0   gold        diamond     Upgrade
1   gold        silver      Downgrade
2   silver      silver      NA
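Since the lambda only forwards its argument, apply can also take the function itself. This sketch additionally swaps the chained conditions for two sets of (old, new) pairs; UPGRADES and DOWNGRADES are illustrative names, not from the answer, and cover the same six transitions:

```python
import pandas as pd

class_pd = pd.DataFrame({'old_class': ['gold', 'gold', 'silver'],
                         'new_class': ['diamond', 'silver', 'silver']})

UPGRADES = {('gold', 'diamond'), ('silver', 'diamond'), ('silver', 'gold')}
DOWNGRADES = {('diamond', 'gold'), ('diamond', 'silver'), ('gold', 'silver')}

def status_desc(row):
    # row is a Series holding one row; look up its (old, new) pair
    pair = (row['old_class'], row['new_class'])
    if pair in UPGRADES:
        return 'Upgrade'
    if pair in DOWNGRADES:
        return 'Downgrade'
    return 'NA'

# apply hands each row to status_desc directly; no lambda wrapper needed
class_pd['class_desc'] = class_pd.apply(status_desc, axis=1)
print(class_pd)
```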

1 Comment

No worries, glad I could help! You can read more about this way of using apply and lambda to achieve this + other techniques to achieve the same result here: datascienceparichay.com/article/… @Leena

Here, the main idea is to define a rank list in which each class's position reflects its importance, then compare the positions of the old and new classes with if/else. Code:

rank = ['silver', 'gold', 'diamond']  # positions: silver=0, gold=1, diamond=2
class_pd['class_desc'] = class_pd.apply(
    lambda x: ('Upgrade' if rank.index(x.old_class) < rank.index(x.new_class) else 'Downgrade')
    if x.old_class != x.new_class else 'NA',
    axis=1)
class_pd

Output:

    old_class   new_class   class_desc
0   gold       diamond      Upgrade
1   gold       silver       Downgrade
2   silver     silver       NA
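The same rank idea can be vectorized: map each class to its position once, then compare the position columns instead of calling rank.index inside a per-row lambda. A sketch, assuming every value in the columns appears in rank (anything else maps to NaN, fails both comparisons and ends up as "NA"):

```python
import numpy as np
import pandas as pd

class_pd = pd.DataFrame({'old_class': ['gold', 'gold', 'silver'],
                         'new_class': ['diamond', 'silver', 'silver']})

rank = ['silver', 'gold', 'diamond']
# Map each class name to its position in rank: silver=0, gold=1, diamond=2
position = {name: i for i, name in enumerate(rank)}
old_pos = class_pd['old_class'].map(position)
new_pos = class_pd['new_class'].map(position)

# Compare whole columns at once; equal positions fall through to the default
class_pd['class_desc'] = np.select(
    [old_pos < new_pos, old_pos > new_pos], ['Upgrade', 'Downgrade'], default='NA')
print(class_pd)
```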


First, you need to pass one more argument, class_pd, to your function. You also need to index into the columns: for instance, instead of class_pd['old_class'] == 'gold' you need to write class_pd['old_class'][0] == 'gold'.
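The indexing matters because `if` and `or` need a single True/False, while comparing a whole column yields a boolean Series, which pandas refuses to collapse to one value. A minimal sketch of that behaviour; note that [0] only ever inspects the first row, so covering every row still needs a per-row approach such as apply(axis=1):

```python
import pandas as pd

class_pd = pd.DataFrame({'old_class': ['gold', 'gold', 'silver'],
                         'new_class': ['diamond', 'silver', 'silver']})

mask = class_pd['old_class'] == 'gold'  # boolean Series, one value per row

error_message = None
try:
    if mask or True:  # `or` calls bool() on the Series
        pass
except ValueError as exc:
    error_message = str(exc)
print(error_message)  # pandas complains that the truth value is ambiguous

# Indexing returns a scalar, which `if` can evaluate, but only for row 0
print(class_pd['old_class'][0] == 'gold')
```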
