pandas combine two columns with null values

Question

I have a DataFrame with two columns and I want to combine them, ignoring NaN values. The catch is that sometimes both columns are NaN, in which case I want the new column to be NaN as well. Here's an example:

import pandas as pd

df = pd.DataFrame({'foodstuff':['apple-martini', 'apple-pie', None, None, None], 'type':[None, None, 'strawberry-tart', 'dessert', None]})

df
Out[10]:
       foodstuff             type
0  apple-martini             None
1      apple-pie             None
2           None  strawberry-tart
3           None          dessert
4           None             None

I tried to solve this with fillna:

df['foodstuff'].fillna('') + df['type'].fillna('')

and got:

0      apple-martini
1          apple-pie
2    strawberry-tart
3            dessert
4                   
dtype: object

Row 4 has become an empty string. What I want in this case is NaN, since both of the columns being combined are NaN:

0      apple-martini
1          apple-pie
2    strawberry-tart
3            dessert
4            None       
dtype: object

root · Accepted Answer · 2017-01-03 18:05:36Z

82

Use fillna on one column with the fill values being the other column:

df['foodstuff'].fillna(df['type'])

The resulting output:

0      apple-martini
1          apple-pie
2    strawberry-tart
3            dessert
4               None
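To make the limitation raised in the comments concrete, here is a minimal sketch that adds a hypothetical sixth row where both columns are populated. fillna only fills the gaps in foodstuff, so when both columns hold a value the type value is silently discarded rather than concatenated:

```python
import pandas as pd

# The question's frame plus a hypothetical row 5 where both columns hold a value
df = pd.DataFrame({'foodstuff': ['apple-martini', 'apple-pie', None, None, None, 'scone'],
                   'type': [None, None, 'strawberry-tart', 'dessert', None, 'pastry']})

# Missing entries in 'foodstuff' are taken from 'type' at the same index;
# non-missing 'foodstuff' entries are left untouched
combined = df['foodstuff'].fillna(df['type'])
print(combined)
```

Row 4 stays missing because both sources are missing there, while row 5 keeps 'scone' and drops 'pastry'.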

answered Jan 3, 2017 at 18:05

root


4 Comments

kilgoretrout Over a year ago

This only works because of the rather unrealistic example provided, in which there's always at least a None per row.

jdeng Over a year ago

@kilgoretrout I find it works even when both columns contain null values

sjd Over a year ago

Is there any option to remove the 'type' column after fillna in the same line, i.e. avoiding another 'drop' statement?

Sudip Adhikari Over a year ago

TypeError: "value" parameter must be a scalar, dict or Series, but you passed a "Series"
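On the question of dropping type in the same statement: one option (not from the thread, just a sketch) is DataFrame.pop, which removes a column from the frame and returns it as a Series, so it can be passed straight to fillna:

```python
import pandas as pd

df = pd.DataFrame({'foodstuff': ['apple-martini', 'apple-pie', None, None, None],
                   'type': [None, None, 'strawberry-tart', 'dessert', None]})

# df.pop('type') deletes the 'type' column from df and hands it back,
# so no separate drop() call is needed
df['foodstuff'] = df['foodstuff'].fillna(df.pop('type'))
print(df)
```

Note that this mutates df: after the line runs, only the foodstuff column is left.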

sirfz · Accepted Answer · 2017-01-03 18:15:48Z

7

You can use the combine method with a lambda:

df['foodstuff'].combine(df['type'], lambda a, b: ((a or "") + (b or "")) or None, None)

(a or "") returns "" if a is None. The same logic is then applied to the concatenation: if it is an empty string, the result is None.
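The `a or ""` trick relies on the missing values being None. If they are np.nan instead, it breaks, because NaN is truthy: `np.nan or ""` evaluates to np.nan, and 'apple-martini' + np.nan raises a TypeError. A sketch of a variant that uses pd.isna so both None and NaN count as missing (the helper name concat_or_none is made up for this example):

```python
import numpy as np
import pandas as pd

def concat_or_none(a, b):
    # pd.isna treats both None and np.nan as missing, unlike `a or ""`
    a = '' if pd.isna(a) else a
    b = '' if pd.isna(b) else b
    # An empty concatenation means both inputs were missing
    return (a + b) or None

# np.nan instead of None: the original lambda would raise on this frame
df = pd.DataFrame({'foodstuff': ['apple-martini', 'apple-pie', np.nan, np.nan, np.nan],
                   'type': [np.nan, np.nan, 'strawberry-tart', 'dessert', np.nan]})

out = df['foodstuff'].combine(df['type'], concat_or_none)
print(out)
```

Unlike the fillna answer, this concatenates when both columns hold a value.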

answered Jan 3, 2017 at 18:15

sirfz


piRSquared · Accepted Answer · 2017-01-03 18:07:32Z

4

- fillna both columns together
- sum(axis=1) to concatenate them row by row
- replace('', np.nan)

import numpy as np

df.fillna('').sum(axis=1).replace('', np.nan)

0      apple-martini
1          apple-pie
2    strawberry-tart
3            dessert
4                NaN
dtype: object
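Note that df.fillna('').sum(axis=1) works across every column of df. If the frame has other columns (the price column below is hypothetical), a sketch that restricts the operation to the two text columns:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'foodstuff': ['apple-martini', 'apple-pie', None, None, None],
                   'type': [None, None, 'strawberry-tart', 'dessert', None],
                   'price': [9.5, 4.0, 3.5, 2.0, 1.0]})  # hypothetical extra column

cols = ['foodstuff', 'type']
# Concatenate only the selected columns row by row, then turn empty results into NaN
out = df[cols].fillna('').sum(axis=1).replace('', np.nan)
print(out)
```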

answered Jan 3, 2017 at 18:07

piRSquared


keepAlive · Accepted Answer · 2021-05-17 06:42:47Z

3

If you are dealing with columns where each row has a value in at most one of them, a one-liner that does the job well is:

>>> df.rename(columns={'type': 'foodstuff'}).stack().unstack()
         foodstuff
0    apple-martini
1        apple-pie
2  strawberry-tart
3          dessert

This solution also generalises well if you have multiple columns to merge, as long as you can define the .rename() mapping. The point of the renaming is to create duplicate column labels, which .stack().unstack() then merges for you.

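As explained, this solution only suits orthogonal columns, i.e. columns that are never populated at the same time. Note also that stack() (in its long-standing default behaviour) drops null entries, so row 4, where both columns are null, does not appear in the output above.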
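If you need to keep rows where every column is null, a different technique (swapped in here, not taken from the answer) is to back-fill across the columns and keep the first one. It also handles any number of columns, and when several are populated the leftmost one wins:

```python
import pandas as pd

df = pd.DataFrame({'foodstuff': ['apple-martini', 'apple-pie', None, None, None],
                   'type': [None, None, 'strawberry-tart', 'dessert', None]})

# List the columns in priority order; extra columns can simply be appended
cols = ['foodstuff', 'type']
# bfill(axis=1) pulls the next non-null value leftwards along each row,
# so column 0 ends up holding the first non-null value (or null if there is none)
out = df[cols].bfill(axis=1).iloc[:, 0]
print(out)
```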

edited May 17, 2021 at 6:42

answered May 15, 2021 at 13:21

keepAlive


1 Comment

Despe1990 Over a year ago

In my case pd.DataFrame.stack() does the column combination. unstack uncombines them.

Vikash Singh · Accepted Answer · 2017-01-03 18:02:24Z

2

You can always replace the empty strings in the new column with NaN afterwards:

import numpy as np

df['new_col'] = df['new_col'].replace(r'^\s*$', np.nan, regex=True)

Complete code:

import pandas as pd
import numpy as np

df = pd.DataFrame({'foodstuff':['apple-martini', 'apple-pie', None, None, None], 'type':[None, None, 'strawberry-tart', 'dessert', None]})

df['new_col'] = df['foodstuff'].fillna('') + df['type'].fillna('')

df['new_col'] = df['new_col'].replace(r'^\s*$', np.nan, regex=True)

df

Output:

       foodstuff             type          new_col
0  apple-martini             None    apple-martini
1      apple-pie             None        apple-pie
2           None  strawberry-tart  strawberry-tart
3           None          dessert          dessert
4           None             None              NaN
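The regex also turns values consisting only of whitespace into NaN, which may or may not be wanted. A sketch of an alternative that sets NaN exactly where both inputs are null, using Series.mask (whose default replacement value is NaN):

```python
import pandas as pd

df = pd.DataFrame({'foodstuff': ['apple-martini', 'apple-pie', None, None, None],
                   'type': [None, None, 'strawberry-tart', 'dessert', None]})

# True only on rows where neither column has a value
both_null = df['foodstuff'].isna() & df['type'].isna()

# Concatenate as in the answer, then replace just those rows with NaN
df['new_col'] = (df['foodstuff'].fillna('') + df['type'].fillna('')).mask(both_null)
print(df)
```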

answered Jan 3, 2017 at 18:02

Vikash Singh


1 Comment

mirekphd Over a year ago

A general solution should also provide zero as the replacement value for numeric data types (.fillna(default_str_or_val)).

rachwa · Accepted Answer · 2022-06-18 21:22:02Z

1

With combine_first you can fill null values in one column with non-null values from another column:

In [3]: df['foodstuff'].combine_first(df['type'])
Out[3]: 
0      apple-martini
1          apple-pie
2    strawberry-tart
3            dessert
4               None
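For two columns of the same DataFrame, combine_first and fillna give the same result. They differ when the two Series have different indexes: combine_first returns the union of both indexes, while fillna keeps only the caller's index. A small sketch with made-up Series:

```python
import pandas as pd

a = pd.Series(['x', None], index=[0, 1])
b = pd.Series(['y', 'z', 'w'], index=[0, 1, 2])

# Union of indexes: label 2 exists only in b, so it is taken from b
print(a.combine_first(b).tolist())  # ['x', 'z', 'w']
# Caller's index only: label 2 is ignored
print(a.fillna(b).tolist())         # ['x', 'z']
```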

answered Jun 18, 2022 at 21:22

rachwa


Sabito · Accepted Answer · 2021-01-15 02:37:13Z

0

We can generalise this problem and build a universal solution for it.

The key point is that we want to join a group of columns together while simply ignoring NaNs.

Here is my answer:

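import numpy as np
import pandas as pd

df = pd.DataFrame({'foodstuff':['apple-martini', 'apple-pie', None, None, None],
                   'type':[None, None, 'strawberry-tart', 'dessert', None],
                   'type1':[98324, None, None, 'banan', None],
                   'type2':[3, None, 'strawberry-tart', np.nan, None]})

# Mark missing values with a sentinel string, then make every cell a string
df = df.fillna("NAN")
df = df.astype('str')
# Join all columns row by row, then strip the sentinel entries
df["output"] = df[['foodstuff', 'type', 'type1', 'type2']].agg(', '.join, axis=1)
df['output'] = df['output'].str.replace('NAN, ', '')
df['output'] = df['output'].str.replace(', NAN', '')
# Rows where every column was missing are left as the bare sentinel; turn them into NaN
df['output'] = df['output'].replace('NAN', np.nan)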
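A sketch of a variant that avoids the "NAN" sentinel altogether, so a genuine "NAN" string cannot be mangled: drop the missing values row by row before joining, and fall back to NaN when nothing is left. It converts values with str(), so a purely numeric column containing NaN would be float and come out as e.g. 98324.0:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'foodstuff': ['apple-martini', 'apple-pie', None, None, None],
                   'type': [None, None, 'strawberry-tart', 'dessert', None],
                   'type1': [98324, None, None, 'banan', None],
                   'type2': [3, None, 'strawberry-tart', np.nan, None]})

cols = ['foodstuff', 'type', 'type1', 'type2']
# For each row: drop missing values, stringify the rest, join with ', '.
# An empty join ('') means every column was missing, so return NaN instead.
df['output'] = df[cols].apply(lambda row: ', '.join(row.dropna().astype(str)) or np.nan, axis=1)
print(df['output'])
```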

edited Jan 15, 2021 at 2:37

Sabito


answered Jan 15, 2021 at 2:21

Sway Wu


rachwa · Accepted Answer · 2022-06-20 09:52:57Z

You can replace the 1 values with the corresponding column names:

df1= df.replace(1, pd.Series(df.columns, df.columns))

Afterwards, replace the 0s with empty strings and concatenate the columns:

f = f.replace(0, '')
f['new'] = f.First+f.Second+f.Three+f.Four

The full code:

import pandas as pd
df = pd.DataFrame({'Second':[0,1,0,0],'First':[1,0,0,0],'Three':[0,0,1,0],'Four':[0,0,0,1], 'cl': ['3D', 'Wireless','Accounting','cisco']})
df2=pd.DataFrame({'pi':['Accounting','cisco','3D','Wireless']})
df1= df.replace(1, pd.Series(df.columns, df.columns))
f = pd.merge(df1,df2,how='right',left_on=['cl'],right_on=['pi'])
f = f.replace(0, '')
f['new'] = f.First+f.Second+f.Three+f.Four

df1:

In [3]: df1
Out[3]: 
   Second  First  Three  Four          cl
0       0  First      0     0          3D
1  Second      0      0     0    Wireless
2       0      0  Three     0  Accounting
3       0      0      0  Four       cisco

df2:

In [4]: df2
Out[4]: 
           pi
0  Accounting
1       cisco
2          3D
3    Wireless

Final DataFrame f will be:

In [2]: f
Out[2]: 
   Second  First  Three  Four          cl          pi     new
0          First                       3D          3D   First
1  Second                        Wireless    Wireless  Second
2                 Three        Accounting  Accounting   Three
3                        Four       cisco       cisco    Four
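For one-hot columns like these, a shorter route to the same new column (a different technique from the answer's, sketched here) is DataFrame.idxmax(axis=1), which returns the label of the column holding each row's maximum, i.e. the column containing the 1. It assumes exactly one 1 per row; an all-zero row would silently get the first listed column's label:

```python
import pandas as pd

df = pd.DataFrame({'Second': [0, 1, 0, 0], 'First': [1, 0, 0, 0],
                   'Three': [0, 0, 1, 0], 'Four': [0, 0, 0, 1],
                   'cl': ['3D', 'Wireless', 'Accounting', 'cisco']})

onehot = ['First', 'Second', 'Three', 'Four']
# Label of the (first) column holding the row maximum, i.e. the 1 in a one-hot row
df['new'] = df[onehot].idxmax(axis=1)
print(df[['cl', 'new']])
```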

KBurchfiel · Accepted Answer · 2024-03-26 16:37:01Z

If you initialize your DataFrame with NaNs for missing values rather than None, you can use Series.add() to fill NaN values on the fly when adding the columns together.

Example:

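import numpy as np
import pandas as pd

# np.nan works on all NumPy versions; the np.NaN alias was removed in NumPy 2.0
df = pd.DataFrame({'foodstuff':['apple-martini', 'apple-pie', np.nan, np.nan, np.nan],
                   'type':[np.nan, np.nan, 'strawberry-tart', 'dessert', np.nan]})

df['foodstuff'].add(df['type'], fill_value='')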

Result:

0      apple-martini
1          apple-pie
2    strawberry-tart
3            dessert
4                NaN

This also works well for adding numerical columns that contain some NaN values: with fill_value=0, the sum of a number and NaN is the number rather than NaN. Example:

df_test_nums = pd.DataFrame({'left_numbers':[1, 1, np.nan, 3.7, 2.4],
                             'right_numbers':[4, np.nan, np.nan, 2.7, 9.4]})
print(df_test_nums)

Result:

   left_numbers  right_numbers
0           1.0            4.0
1           1.0            NaN
2           NaN            NaN
3           3.7            2.7
4           2.4            9.4

Adding these columns together so that the sum of a number and a NaN value will be the number:

df_test_nums['left_numbers'].add(
    df_test_nums['right_numbers'], fill_value = 0)

Result:

0     5.0
1     1.0
2     NaN
3     6.4
4    11.8
dtype: float64

Compare this to the use of the + operator, which converts the sum of NaN and a number into NaN:

df_test_nums['left_numbers'] + df_test_nums['right_numbers']

Result:

0     5.0
1     NaN
2     NaN
3     6.4
4    11.8
dtype: float64

For operations that involve multiple columns, a more elegant approach is available via df.sum().

print(df_test_nums[
          ['left_numbers', 'right_numbers']].sum(
              axis=1, min_count = 1))

Output:

0     5.0
1     1.0
2     NaN
3     6.4
4    11.8
dtype: float64

Note that, if min_count is set to 0 (the default), the 3rd row will equal 0, since that's the default output when values consisting only of NaNs are added together. (See the df.sum() documentation for more information.)
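The effect of min_count on the all-NaN row can be checked directly. A minimal sketch that rebuilds df_test_nums and computes both variants side by side:

```python
import numpy as np
import pandas as pd

df_test_nums = pd.DataFrame({'left_numbers': [1, 1, np.nan, 3.7, 2.4],
                             'right_numbers': [4, np.nan, np.nan, 2.7, 9.4]})
cols = ['left_numbers', 'right_numbers']

# min_count=0 (default): a row with no valid values sums to 0
default_sum = df_test_nums[cols].sum(axis=1)
# min_count=1: at least one valid value is required, otherwise the result is NaN
strict_sum = df_test_nums[cols].sum(axis=1, min_count=1)

print(pd.DataFrame({'min_count=0': default_sum, 'min_count=1': strict_sum}))
```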
