
Yesterday I asked a question, "Multiple assignment in pandas based on the values in the same row", about how to rank the values in a row and assign the ranks to different columns of that same row. I figured out how to do it by following Ed Chum's advice in "how to apply a function to multiple columns in a pandas dataframe at one time".

It actually worked, but then I noticed that I was creating incorrect columns along the way. Once I fixed that bug, it no longer worked...

So I tried to recreate the issue on a toy example, and it fails there as well. Can someone point out the error, please? Here is the code (Python 3):

```python
import pandas as pd
import numpy as np
import scipy.stats


df = pd.DataFrame(data={'a':[1,2,3],'b':[2,1,3],'c':[3,1,2],
                        'rank_a':[np.nan]*3,'rank_b':[np.nan]*3,'rank_c':[np.nan]*3})

def apply_rank(row):
    vals = [row['a'],row['b'],row['c']]
    ranked = scipy.stats.rankdata(vals)
    d = len(vals)+1
    ranked = [rank/d for rank in ranked]
    rank_cols = [col for col in row.index if col.startswith("rank_")]
    print("ranked: "+str(ranked))

    for idx,rank_col in enumerate(rank_cols):
        print("Before: "+str(row[rank_col]))
        row[rank_col] = ranked[idx]
        print("After: "+str(row[rank_col]))
```

Then run `df.apply(lambda row: apply_rank(row), axis=1)`, and the printed output shows the assignments being made correctly.

But then print `df`, and you see that nothing was assigned... facepalm


2 Answers


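`apply` builds its result from the function's return values, and mutating the row passed in is not supported, so your changes never reach `df`. Instead, return a Series whose index holds the names of the new columns: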

```python
import pandas as pd
import scipy.stats

# df as defined in the question

def apply_rank(row):
    vals = [row['a'],row['b'],row['c']]
    ranked = scipy.stats.rankdata(vals)
    d = len(vals)+1
    ranked = [rank/d for rank in ranked]
    rank_cols = [col for col in row.index if col.startswith("rank_")]

    # the Series index becomes the column names of the DataFrame apply returns
    return pd.Series(ranked, index=rank_cols)

df = df.apply(apply_rank, axis=1)
print (df)
```

```
   rank_a  rank_b  rank_c
0   0.250   0.500   0.750
1   0.750   0.375   0.375
2   0.625   0.625   0.250
```
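The call above replaces `df` with only the rank columns. To keep `a`, `b` and `c` as well, one option (a sketch, not part of the original answer) is to write the frame returned by `apply` into the existing rank columns instead of overwriting `df`:

```python
import numpy as np
import pandas as pd
import scipy.stats

df = pd.DataFrame(data={'a': [1, 2, 3], 'b': [2, 1, 3], 'c': [3, 1, 2],
                        'rank_a': [np.nan] * 3, 'rank_b': [np.nan] * 3,
                        'rank_c': [np.nan] * 3})

def apply_rank(row):
    vals = [row['a'], row['b'], row['c']]
    # average ranks for ties, scaled by dividing by (number of values + 1)
    ranked = scipy.stats.rankdata(vals) / (len(vals) + 1)
    rank_cols = [col for col in row.index if col.startswith("rank_")]
    return pd.Series(ranked, index=rank_cols)

rank_cols = [col for col in df.columns if col.startswith("rank_")]
# apply returns a DataFrame with columns rank_a, rank_b, rank_c;
# assigning it to those columns leaves a, b and c untouched
df[rank_cols] = df.apply(apply_rank, axis=1)
print(df)
```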

EDIT: If the new columns already exist, you can assign the ranks to them and return the whole row:

```python
import scipy.stats

# df as defined in the question

def apply_rank(row):
    vals = [row['a'],row['b'],row['c']]
    ranked = scipy.stats.rankdata(vals)
    d = len(vals)+1
    ranked = [rank/d for rank in ranked]
    rank_cols = [col for col in row.index if col.startswith("rank_")]

    row.loc[rank_cols] = ranked
    # each row arrives as a float64 Series (int and float columns are upcast),
    # which is why a, b and c come back as floats below
    return row

df = df.apply(apply_rank,axis=1)
print (df)
```

```
     a    b    c  rank_a  rank_b  rank_c
0  1.0  2.0  3.0   0.250   0.500   0.750
1  2.0  1.0  1.0   0.750   0.375   0.375
2  3.0  3.0  2.0   0.625   0.625   0.250
```
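For comparison, the same ranks can be computed without `apply` at all. This is a swapped-in technique, not what either answer uses: a sketch assuming the value columns are exactly `a`, `b`, `c`, using pandas' own `DataFrame.rank` across each row. Its default `method='average'` handles ties the same way as `scipy.stats.rankdata`'s default.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data={'a': [1, 2, 3], 'b': [2, 1, 3], 'c': [3, 1, 2],
                        'rank_a': [np.nan] * 3, 'rank_b': [np.nan] * 3,
                        'rank_c': [np.nan] * 3})

value_cols = ['a', 'b', 'c']
rank_cols = ['rank_' + col for col in value_cols]

# rank across the columns of each row; ties get the average rank
ranks = df[value_cols].rank(axis=1)

# same scaling as apply_rank: divide by (number of values + 1)
df[rank_cols] = ranks.to_numpy() / (len(value_cols) + 1)
print(df)
```

Because the ranking runs column-wise over the whole frame instead of calling a Python function per row, this also keeps the integer dtype of `a`, `b` and `c`.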

4 Comments

Is it possible to preserve the original columns in there as well?
PERFECT! LEGEND!
You are welcome! I feel the same joy when something finally works ;)
I have spent two hours on this... :)

```python
df.iloc[[2, 3, 4], df.columns.get_loc(col)] = 2
```

In DataFrame `df`, this sets column `col` to 2 at row positions 2, 3 and 4.
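A small self-contained check of that line, using hypothetical toy data (the frame, its index labels `r0`..`r4` and columns `x`/`y` are made up for illustration). It also shows the label-based `.loc` form, which is the usual choice when you know the index labels rather than positions:

```python
import pandas as pd

# hypothetical toy frame, only for illustration
df = pd.DataFrame({'x': [10, 20, 30, 40, 50], 'y': [1, 1, 1, 1, 1]},
                  index=['r0', 'r1', 'r2', 'r3', 'r4'])
col = 'x'

# positional rows, column looked up by name; a single .iloc call writes into
# df itself, whereas chained df[col].iloc[...] = ... may only modify a copy
df.iloc[[2, 3, 4], df.columns.get_loc(col)] = 2

# label-based equivalent for column 'y'
df.loc[['r2', 'r3', 'r4'], 'y'] = 2
print(df)
```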
