
I have a df with two columns and I want to combine both columns ignoring the NaN values. The catch is that sometimes both columns have NaN values in which case I want the new column to also have NaN. Here's the example:

import pandas as pd

df = pd.DataFrame({'foodstuff':['apple-martini', 'apple-pie', None, None, None], 'type':[None, None, 'strawberry-tart', 'dessert', None]})

df
Out[10]:
       foodstuff             type
0  apple-martini             None
1      apple-pie             None
2           None  strawberry-tart
3           None          dessert
4           None             None

I tried to solve this with fillna:

df['foodstuff'].fillna('') + df['type'].fillna('')

and got:

0      apple-martini
1          apple-pie
2    strawberry-tart
3            dessert
4                   
dtype: object

Row 4 has become an empty string. What I want in this situation is a NaN value, since both of the columns being combined are NaN:

0      apple-martini
1          apple-pie
2    strawberry-tart
3            dessert
4               None
dtype: object

9 Answers


Use fillna on one column, with the other column as the fill values:

df['foodstuff'].fillna(df['type'])

The resulting output:

0      apple-martini
1          apple-pie
2    strawberry-tart
3            dessert
4               None
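A minimal sketch of this approach wrapped in a small helper (coalesce_two is a hypothetical name, not a pandas function). Unlike concatenation, fillna only falls back to the second column where the first is missing, so when both columns are populated the first one wins:

```python
import pandas as pd

def coalesce_two(df, first, second):
    # Hypothetical helper: take values from `first` and, wherever it is
    # missing, fall back to `second`. Rows missing in both stay missing.
    return df[first].fillna(df[second])

df = pd.DataFrame({'foodstuff': ['apple-martini', 'apple-pie', None, None, None],
                   'type': [None, None, 'strawberry-tart', 'dessert', None]})
result = coalesce_two(df, 'foodstuff', 'type')
print(result)

# When both columns have a value, only the first column's value is kept.
both = pd.DataFrame({'foodstuff': ['apple'], 'type': ['pie']})
print(coalesce_two(both, 'foodstuff', 'type')[0])  # prints apple
```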

4 Comments

This only works because of the rather unrealistic example provided, in which there's always at least a None per row.
@kilgoretrout I find it works even when both columns contain null value
Is there any option to remove the 'type' column after fillna in the same line, i.e. without a separate drop statement?
TypeError: "value" parameter must be a scalar, dict or Series, but you passed a "Series"

You can use the combine method with a lambda:

df['foodstuff'].combine(df['type'], lambda a, b: ((a or "") + (b or "")) or None, None)

(a or "") returns "" if a is None; the same logic is then applied to the concatenation, so the result is None when the concatenation is an empty string.
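The lambda relies on None being falsy. A float NaN is truthy, so for NaN-filled columns `nan or ""` returns nan and the concatenation raises a TypeError. A sketch of a variant that treats both None and NaN as missing via pd.isna (join_or_none is a hypothetical helper name; string columns are assumed):

```python
import numpy as np
import pandas as pd

def join_or_none(a, b):
    # Hypothetical helper: keep only the non-missing inputs (None or NaN
    # both count as missing) and concatenate them.
    parts = [x for x in (a, b) if not pd.isna(x)]
    # No surviving parts means both inputs were missing.
    return ''.join(parts) if parts else None

df = pd.DataFrame({'foodstuff': ['apple-martini', 'apple-pie', np.nan, np.nan, np.nan],
                   'type': [np.nan, np.nan, 'strawberry-tart', 'dessert', np.nan]})
result = df['foodstuff'].combine(df['type'], join_or_none)
print(result)
```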

  • fillna('') both columns together
  • sum(1) to concatenate them row-wise
  • replace('', np.nan) to turn empty results back into NaN

df.fillna('').sum(1).replace('', np.nan)

0      apple-martini
1          apple-pie
2    strawberry-tart
3            dessert
4                NaN
dtype: object
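The same three steps work for any number of columns. A minimal sketch as a reusable function (concat_ignoring_nan and its cols parameter are hypothetical names), using the keyword form axis=1. Note that when several columns are populated in one row, their values are concatenated with no separator:

```python
import numpy as np
import pandas as pd

def concat_ignoring_nan(df, cols):
    # Hypothetical helper.
    # 1. Missing values become '' so they vanish when concatenated.
    # 2. sum(axis=1) concatenates the string columns row by row.
    # 3. Rows that end up empty (all inputs missing) become NaN again.
    return df[cols].fillna('').sum(axis=1).replace('', np.nan)

df = pd.DataFrame({'foodstuff': ['apple-martini', 'apple-pie', None, None, None],
                   'type': [None, None, 'strawberry-tart', 'dessert', None]})
result = concat_ignoring_nan(df, ['foodstuff', 'type'])
print(result)
```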


If you are dealing with columns where one contains a value only where the others don't (and vice versa), a one-liner that does the job well is:

>>> df.rename(columns={'type': 'foodstuff'}).stack().unstack()
         foodstuff
0    apple-martini
1        apple-pie
2  strawberry-tart
3          dessert

This solution also generalises well if you have more than two columns to merge, as long as you can define the .rename mapping. The point of the renaming is to create duplicate column labels, which .stack().unstack() then merges for you.

As explained, this solution only suits orthogonal columns, i.e. columns that are never populated in the same row.
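As the output above shows, row 4, where both columns are missing, does not survive the stack()/unstack() round trip. If every row should be kept, one alternative (a different technique, not the rename/stack trick) is to back-fill across the columns and keep the first one. A sketch, assuming the column order in cols expresses priority:

```python
import pandas as pd

df = pd.DataFrame({'foodstuff': ['apple-martini', 'apple-pie', None, None, None],
                   'type': [None, None, 'strawberry-tart', 'dessert', None]})

# Columns to coalesce, in order of priority (leftmost wins).
cols = ['foodstuff', 'type']

# bfill(axis=1) pulls the next non-missing value leftwards along each row,
# so the first column ends up holding the first non-missing value per row.
# Rows that are missing everywhere stay missing instead of being dropped.
result = df[cols].bfill(axis=1).iloc[:, 0]
print(result)
```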

1 Comment

In my case, pd.DataFrame.stack() does the column combination and unstack() undoes it.

You can always replace the empty strings in the new column with NaN:

import numpy as np

df['new_col'] = df['new_col'].replace(r'^\s*$', np.nan, regex=True)

Complete code:

import pandas as pd
import numpy as np

df = pd.DataFrame({'foodstuff':['apple-martini', 'apple-pie', None, None, None], 'type':[None, None, 'strawberry-tart', 'dessert', None]})

df['new_col'] = df['foodstuff'].fillna('') + df['type'].fillna('')

df['new_col'] = df['new_col'].replace(r'^\s*$', np.nan, regex=True)

df

output:

       foodstuff             type          new_col
0  apple-martini             None    apple-martini
1      apple-pie             None        apple-pie
2           None  strawberry-tart  strawberry-tart
3           None          dessert          dessert
4           None             None              NaN

1 Comment

A general solution should also handle numeric dtypes, using zero as the replacement value (.fillna(default_str_or_val)).
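Following that comment: for numeric columns the natural fill value is 0 rather than ''. A minimal sketch (the columns a and b are hypothetical) that also puts NaN back where both inputs were missing, so an all-missing row is not mistaken for a genuine 0:

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data.
nums = pd.DataFrame({'a': [1.0, np.nan, np.nan], 'b': [2.0, 5.0, np.nan]})

# Fill with the neutral element for addition, then add.
total = nums['a'].fillna(0) + nums['b'].fillna(0)

# Keep the sum only where at least one input was present; otherwise NaN.
total = total.where(nums['a'].notna() | nums['b'].notna())
print(total.tolist())  # [3.0, 5.0, nan]
```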

With combine_first you can fill null values in one column with non-null values from another column:

In [3]: df['foodstuff'].combine_first(df['type'])
Out[3]: 
0      apple-martini
1          apple-pie
2    strawberry-tart
3            dessert
4               None
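combine_first chains naturally when there are more than two columns. A sketch using functools.reduce (the third column extra is hypothetical, added only to show the chaining):

```python
from functools import reduce

import pandas as pd

df = pd.DataFrame({'foodstuff': ['apple-martini', None, None, None],
                   'type': [None, 'strawberry-tart', None, None],
                   'extra': [None, None, 'dessert', None]})  # hypothetical column

# Earlier columns take priority: each step fills the remaining gaps
# from the next column in the list.
cols = ['foodstuff', 'type', 'extra']
result = reduce(lambda acc, col: acc.combine_first(df[col]), cols[1:], df[cols[0]])
print(result)
```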


We can make this problem more general and write a universal solution for this type of problem.

The key point is that we want to join a group of columns together while simply ignoring NaNs.

Here is my answer:

import numpy as np
import pandas as pd

df = pd.DataFrame({'foodstuff': ['apple-martini', 'apple-pie', None, None, None],
                   'type': [None, None, 'strawberry-tart', 'dessert', None],
                   'type1': [98324, None, None, 'banan', None],
                   'type2': [3, None, 'strawberry-tart', np.nan, None]})


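# Mark missing values with a sentinel string, then make everything a string
df = df.fillna("NAN")
df = df.astype('str')
# Join the four columns on each row with ', '
df["output"] = df[['foodstuff', 'type', 'type1', 'type2']].agg(', '.join, axis=1)
# Strip the sentinel together with its separator (plain-text match)
df['output'] = df['output'].str.replace('NAN, ', '', regex=False)
df['output'] = df['output'].str.replace(', NAN', '', regex=False)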
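With a sentinel string, the all-missing row ends up as the literal string "NAN" rather than NaN, and a genuine value containing that text (for example one ending in "NAN") could also be altered. A sketch that skips missing values directly instead (join_non_missing and its sep parameter are hypothetical names):

```python
import numpy as np
import pandas as pd

def join_non_missing(row, sep=', '):
    # Hypothetical helper: drop None/NaN, stringify the rest
    # (e.g. 98324 -> '98324'), then join with `sep`.
    parts = row.dropna().astype(str)
    # If nothing survived, return NaN instead of an empty string.
    return sep.join(parts) if len(parts) else np.nan

df = pd.DataFrame({'foodstuff': ['apple-martini', 'apple-pie', None, None, None],
                   'type': [None, None, 'strawberry-tart', 'dessert', None],
                   'type1': [98324, None, None, 'banan', None],
                   'type2': [3, None, 'strawberry-tart', np.nan, None]})

cols = ['foodstuff', 'type', 'type1', 'type2']
df['output'] = df[cols].apply(join_non_missing, axis=1)
print(df['output'])
```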


You can replace the non-zero values with the column names:

df1= df.replace(1, pd.Series(df.columns, df.columns))

Afterwards, replace the 0s with empty strings and then concatenate the columns:

f = f.replace(0, '')
f['new'] = f.First+f.Second+f.Three+f.Four

Full code:

import pandas as pd
df = pd.DataFrame({'Second':[0,1,0,0],'First':[1,0,0,0],'Three':[0,0,1,0],'Four':[0,0,0,1], 'cl': ['3D', 'Wireless','Accounting','cisco']})
df2=pd.DataFrame({'pi':['Accounting','cisco','3D','Wireless']})
df1= df.replace(1, pd.Series(df.columns, df.columns))
f = pd.merge(df1,df2,how='right',left_on=['cl'],right_on=['pi'])
f = f.replace(0, '')
f['new'] = f.First+f.Second+f.Three+f.Four

df1:

In [3]: df1
Out[3]: 
   Second  First  Three  Four          cl
0       0  First      0     0          3D
1  Second      0      0     0    Wireless
2       0      0  Three     0  Accounting
3       0      0      0  Four       cisco

df2:

In [4]: df2
Out[4]: 
           pi
0  Accounting
1       cisco
2          3D
3    Wireless

Final DataFrame f will be:

In [2]: f
Out[2]: 
   Second  First  Three  Four          cl          pi     new
0          First                       3D          3D   First
1  Second                        Wireless    Wireless  Second
2                 Three        Accounting  Accounting   Three
3                        Four       cisco       cisco    Four
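When every row has exactly one 1 among the indicator columns, the replace and concatenate steps can be collapsed into one call: idxmax(axis=1) returns, for each row, the label of the column holding the maximum, i.e. the 1. A sketch on the same df, assuming strictly one-hot rows (an all-zero row would silently map to the first column, since idxmax returns the first occurrence of the maximum):

```python
import pandas as pd

df = pd.DataFrame({'Second': [0, 1, 0, 0], 'First': [1, 0, 0, 0],
                   'Three': [0, 0, 1, 0], 'Four': [0, 0, 0, 1],
                   'cl': ['3D', 'Wireless', 'Accounting', 'cisco']})

indicator_cols = ['Second', 'First', 'Three', 'Four']
# For each row, the label of the column that holds the 1.
df['new'] = df[indicator_cols].idxmax(axis=1)
print(df[['cl', 'new']])
```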


If you initialize your DataFrame with NaNs for missing values rather than None, you can use Series.add() to fill NaN values on the fly when adding the columns together.

Example:

import numpy as np
import pandas as pd

df = pd.DataFrame({'foodstuff': ['apple-martini', 'apple-pie', np.nan, np.nan, np.nan],
                   'type': [np.nan, np.nan, 'strawberry-tart', 'dessert', np.nan]})

df['foodstuff'].add(df['type'], fill_value = '')

Result:

0      apple-martini
1          apple-pie
2    strawberry-tart
3            dessert
4                NaN

This also works nicely for adding numerical columns that have some NaN values, as it allows you to add a number to a NaN value and get the number. Example:

df_test_nums = pd.DataFrame({'left_numbers': [1, 1, np.nan, 3.7, 2.4],
                             'right_numbers': [4, np.nan, np.nan, 2.7, 9.4]})
print(df_test_nums)

Result:

   left_numbers  right_numbers
0           1.0            4.0
1           1.0            NaN
2           NaN            NaN
3           3.7            2.7
4           2.4            9.4

Adding these columns together so that the sum of a number and a NaN value will be the number:

df_test_nums['left_numbers'].add(
    df_test_nums['right_numbers'], fill_value = 0)

Result:

0     5.0
1     1.0
2     NaN
3     6.4
4    11.8

Compare this to the use of the + operator, which converts the sum of NaN and a number into NaN:

df_test_nums['left_numbers'] + df_test_nums['right_numbers']

Result:

0     5.0
1     NaN
2     NaN
3     6.4
4    11.8

For operations that involve multiple columns, a more elegant approach is available via df.sum().

print(df_test_nums[
          ['left_numbers', 'right_numbers']].sum(
              axis=1, min_count = 1))

output:

0     5.0
1     1.0
2     NaN
3     6.4
4    11.8

Note that if min_count is left at 0 (the default), the third row (index 2) will be 0, since that is the default result of adding together values that are all NaN. (See the df.sum() documentation for more information.)
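A quick check of that note, rebuilding df_test_nums so the snippet runs on its own:

```python
import numpy as np
import pandas as pd

df_test_nums = pd.DataFrame({'left_numbers': [1, 1, np.nan, 3.7, 2.4],
                             'right_numbers': [4, np.nan, np.nan, 2.7, 9.4]})
cols = ['left_numbers', 'right_numbers']

default_sum = df_test_nums[cols].sum(axis=1)              # min_count=0
strict_sum = df_test_nums[cols].sum(axis=1, min_count=1)  # needs >= 1 value

# Row 2 is all NaN: 0.0 by default, NaN with min_count=1.
print(default_sum[2], strict_sum[2])
```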
