0

I have a dataset 'df' like so:

a     b     c    e  f  g
1     Nan   Nan  2  3  4
Nan   1     Nan  2  3  4
Nan   Nan   1    2  3  4

I want to create a new column 'd' in my dataframe 'df' like so:

   a     b     c    d  e  f  g 
   1     Nan   Nan  1  2  3  4
   Nan   1     Nan  1  2  3  4
   Nan   Nan   1    1  2  3  4

So basically, 'd' is the amalgamation of columns 'a', 'b', and 'c' for any non-NaN values those three columns have. Between 'a', 'b', and 'c', if one column is non-NaN, the other columns are NaN. I need columns 'e', 'f', and 'g' in my dataset, but they do not have anything to do with constructing column 'd'

I created this function below:

def func(df):
    if df['a'] is not None:
        return df['a']
    if df['b'] is not None:
        return df['b']
    if df['c'] is not None:
        return df['c']

and then used concatenated_df['d']=concatenated_df.apply(func, axis=1) to apply the conditional statements for each row of the dataframe. However, I am only getting the first if statement to successfully execute. The rest returns NaN values to the new column 'd'

I also tried modifying the function to

def func(df):
    if df['a'] is not None:
        return df['a']
    elif df['b'] is not None:
        return df['b']
    elif df['c'] is not None:
        return df['c']

but it still only executed the first conditional statement

5
  • So you just want the first value that isn't None from a b and c? Commented Jul 9, 2018 at 18:51
  • I will update the question with an example to make it more clear Commented Jul 9, 2018 at 18:51
  • 1
    Are you looking for not-None or not-NaN? (They are not the same.) Commented Jul 9, 2018 at 18:51
  • @DyZ maybe that is my issue let me try fixing that Commented Jul 9, 2018 at 18:53
  • Possible duplicate of stackoverflow.com/questions/31828240/… Commented Jul 9, 2018 at 19:02

2 Answers 2

2

You can use first_valid_index

df['d'] = df.apply(lambda x: x[x.first_valid_index()], axis=1)

     a    b    c  d
0    1  NaN  NaN  1
1  NaN    1  NaN  1
2  NaN  NaN    1  1
Sign up to request clarification or add additional context in comments.

2 Comments

thanks for this answer - I updated my question to be more specific - there are columns in my dataset that I don't want to use for constructing column 'd' (I only want to use columns 'a', 'b', and 'c'), so I think the other answer is better for my purposes. Thanks for your help though!
If you'd like to apply this to a subset, you can just index your dataframe first: df[['a', 'b', 'c']].apply(lambda x: x[x.first_valid_index()], axis=1)
1

One more option not mentioned in the duplicated question:

df['d'] = df['a'].combine_first(df['b']).combine_first(df['c'])
#0    1.0
#1    1.0
#2    1.0

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.