0

Continuation to my previous question

Some of the rows in my dataset are messy

Serial     Val1      Val2      Val3      
1          21.10     NaN       13.51     
1          43.06     NaN       20.51     
1          32.12     NaN       NaN       
2          NaN       11.20     NaN       
2          NaN       NaN       NaN       
3          45.10     NaN       NaN       
3          14.16     NaN       NaN      
4          NaN       34.90     NaN       
4          NaN       12.12     11.10     
4          NaN       18.09     NaN       

These are grouped based on their unique Serial. For example, Serial 1 has values for Val1 and Val3 but I would still prefer to choose values from Val1 for the ['All'] column. In choosing which Val columns to select for ['All'], Val1 if available then Val2 if Val1 is not available... (Val1>Val2>Val3)

Serial     Val1      Val2      Val3      All       Source
1          21.10     NaN       13.51     21.10     Val1
1          43.06     NaN       20.51     43.06
1          32.12     NaN       NaN       32.12
2          NaN       11.20     NaN       11.20     Val2
2          NaN       NaN       NaN       NaN  
3          45.10     NaN       NaN       45.10     Val1
3          14.16     NaN       NaN       14.16
4          NaN       34.90     NaN       34.90     Val2
4          NaN       12.12     11.10     12.12    
4          NaN       18.09     NaN       18.09     

Thank you

1 Answer 1

2

You can first backfilling missing values and seelct first val by positions with DataFrame.iloc and for second use same solution like before:

df1 = df[['Val1','Val2','Val3']]

mask = df1.isna().all(axis=1)
mask1 = df['Serial'].duplicated()

df = (df.assign(All = df1.bfill(axis=1).iloc[:, 0],
                Source = df1.notna().idxmax(axis=1).mask(mask1 | mask)))
print (df)
   Serial   Val1   Val2   Val3    All Source
0       1  21.10    NaN  13.51  21.10   Val1
1       1  43.06    NaN  20.51  43.06    NaN
2       1  32.12    NaN    NaN  32.12    NaN
3       2    NaN  11.20    NaN  11.20   Val2
4       2    NaN    NaN    NaN    NaN    NaN
5       3  45.10    NaN    NaN  45.10   Val1
6       3  14.16    NaN    NaN  14.16    NaN
7       4    NaN  34.90    NaN  34.90   Val2
8       4    NaN  12.12  11.10  12.12    NaN
9       4    NaN  18.09    NaN  18.09    NaN
Sign up to request clarification or add additional context in comments.

4 Comments

thank you. i checked and yes it selects Val1 values but when rows such as in Serial 2, it does not select Val2 values but returns NaN value instead?
@kiyaserin - Not understand, can you explian more?
sorry, wrong Serial earlier. for example, in row Serial 2, it should return Val2 values since Val1 is empty. but when I tried it returns a NaN value?
@kiyaserin - Not understand, is possible change data for see it?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.