Conditional selection of values from other column

Question

Some of the rows in my dataset are messy:

Serial     Val1      Val2      Val3      
1          21.10     NaN       13.51     
1          43.06     NaN       20.51     
1          32.12     NaN       NaN       
2          NaN       11.20     NaN       
2          NaN       NaN       NaN       
3          45.10     NaN       NaN       
3          14.16     NaN       NaN      
4          NaN       34.90     NaN       
4          NaN       12.12     11.10     
4          NaN       18.09     NaN

The rows are grouped by their Serial. For example, Serial 1 has values in both Val1 and Val3, but I would still prefer to take the values from Val1 for the ['All'] column. The priority for filling ['All'] is: use Val1 if it is available, otherwise Val2, otherwise Val3 (Val1 > Val2 > Val3). Expected output:

Serial     Val1      Val2      Val3      All       Source
1          21.10     NaN       13.51     21.10     Val1
1          43.06     NaN       20.51     43.06
1          32.12     NaN       NaN       32.12
2          NaN       11.20     NaN       11.20     Val2
2          NaN       NaN       NaN       NaN  
3          45.10     NaN       NaN       45.10     Val1
3          14.16     NaN       NaN       14.16
4          NaN       34.90     NaN       34.90     Val2
4          NaN       12.12     11.10     12.12    
4          NaN       18.09     NaN       18.09

Thank you

jezrael · Accepted Answer · 2020-08-31 10:56:04Z

2

You can first back-fill the missing values along each row and select the first column by position with DataFrame.iloc. For the second column, use the same solution as before:

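# Value columns in priority order: Val1 > Val2 > Val3
df1 = df[['Val1','Val2','Val3']]

# Rows where every value column is missing
mask = df1.isna().all(axis=1)
# Every row after the first one of each Serial
mask1 = df['Serial'].duplicated()

# All: first non-missing value per row; Source: its column name, kept only on the first row of each Serial
df = (df.assign(All = df1.bfill(axis=1).iloc[:, 0],
                Source = df1.notna().idxmax(axis=1).mask(mask1 | mask)))
print(df)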
   Serial   Val1   Val2   Val3    All Source
0       1  21.10    NaN  13.51  21.10   Val1
1       1  43.06    NaN  20.51  43.06    NaN
2       1  32.12    NaN    NaN  32.12    NaN
3       2    NaN  11.20    NaN  11.20   Val2
4       2    NaN    NaN    NaN    NaN    NaN
5       3  45.10    NaN    NaN  45.10   Val1
6       3  14.16    NaN    NaN  14.16    NaN
7       4    NaN  34.90    NaN  34.90   Val2
8       4    NaN  12.12  11.10  12.12    NaN
9       4    NaN  18.09    NaN  18.09    NaN
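To try this without the original file, here is a minimal self-contained sketch. It rebuilds the sample frame from the question by hand and, as a cross-check, computes All a second way with Series.combine_first. The names vals, result and alt are only illustrative, and the combine_first variant is an alternative formulation, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Serial": [1, 1, 1, 2, 2, 3, 3, 4, 4, 4],
    "Val1": [21.10, 43.06, 32.12, np.nan, np.nan, 45.10, 14.16, np.nan, np.nan, np.nan],
    "Val2": [np.nan, np.nan, np.nan, 11.20, np.nan, np.nan, np.nan, 34.90, 12.12, 18.09],
    "Val3": [13.51, 20.51, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 11.10, np.nan],
})

vals = df[["Val1", "Val2", "Val3"]]        # priority order: Val1 > Val2 > Val3
all_missing = vals.isna().all(axis=1)      # rows with no value at all
repeated = df["Serial"].duplicated()       # every row after the first of each Serial

result = df.assign(
    All=vals.bfill(axis=1).iloc[:, 0],
    Source=vals.notna().idxmax(axis=1).mask(repeated | all_missing),
)

# Alternative for All: fall back column by column in priority order
alt = df["Val1"].combine_first(df["Val2"]).combine_first(df["Val3"])

print(result)
```

Both approaches should give the same All column on this data. The bfill version scales more easily when there are many value columns, since the priority is just the column order.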

edited Aug 31, 2020 at 10:56

answered Aug 31, 2020 at 10:50


4 Comments

kiyas Over a year ago

Thank you. I checked, and yes, it selects the Val1 values. But for rows such as those in Serial 2, it does not select the Val2 values and returns NaN instead?

jezrael Over a year ago

@kiyaserin - I don't understand, can you explain more?

kiyas Over a year ago

Sorry, I gave the wrong Serial earlier. For example, in the rows for Serial 2, it should return the Val2 values since Val1 is empty, but when I tried it, it returned NaN?

jezrael Over a year ago

@kiyaserin - I don't understand. Is it possible to change the sample data so I can see the problem?
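The thread never establishes why the commenter saw NaN. One possible cause, offered purely as an assumption, is that the missing cells were stored as the literal text "NaN" rather than as real missing values. In that case bfill treats them as valid data and returns the string, which prints as NaN. The sketch below uses a hypothetical two-row frame (raw) to show the effect and one way to convert the columns with pd.to_numeric before back-filling.

```python
import pandas as pd

# Hypothetical data: missing cells are the string "NaN", not a real NaN
raw = pd.DataFrame({
    "Serial": [2, 2],
    "Val1": ["NaN", "NaN"],
    "Val2": ["11.20", "NaN"],
    "Val3": ["NaN", "NaN"],
})
cols = ["Val1", "Val2", "Val3"]

# No cell counts as missing here, so bfill changes nothing and Val1 wins
naive = raw[cols].bfill(axis=1).iloc[:, 0]

# Convert text to numbers; unparseable text becomes a real NaN
clean = raw[cols].apply(pd.to_numeric, errors="coerce")
fixed = clean.bfill(axis=1).iloc[:, 0]

print(raw[cols].dtypes)  # text columns instead of float64 hint at this problem
print(fixed)
```

Checking df.dtypes on the real data is a quick way to confirm or rule out this assumption.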
