How to replace None only with empty string using pandas?

Question

The code below generates a df:

import pandas as pd
from datetime import datetime as dt
import numpy as np

dates = [dt(2014, 1, 2, 2), dt(2014, 1, 2, 3), dt(2014, 1, 2, 4), None]
strings1 = ['A', 'B', None, 'C']
strings2 = [None, 'B', 'C', 'C']
strings3 = ['A', 'B', 'C', None]
vals = [1., 2., np.nan, 4.]
df = pd.DataFrame(dict(zip(['A', 'B', 'C', 'D', 'E'],
                           [strings1, dates, strings2, strings3, vals])))



+---+------+---------------------+------+------+-----+
|   |  A   |          B          |  C   |  D   |  E  |
+---+------+---------------------+------+------+-----+
| 0 | A    | 2014-01-02 02:00:00 | None | A    | 1   |
| 1 | B    | 2014-01-02 03:00:00 | B    | B    | 2   |
| 2 | None | 2014-01-02 04:00:00 | C    | C    | NaN |
| 3 | C    | NaT                 | C    | None | 4   |
+---+------+---------------------+------+------+-----+

I would like to replace every None (a real Python None, not the string 'None') with '' (an empty string).

The expected df is:

+---+---+---------------------+---+---+-----+
|   | A |          B          | C | D |  E  |
+---+---+---------------------+---+---+-----+
| 0 | A | 2014-01-02 02:00:00 |   | A | 1   |
| 1 | B | 2014-01-02 03:00:00 | B | B | 2   |
| 2 |   | 2014-01-02 04:00:00 | C | C | NaN |
| 3 | C | NaT                 | C |   | 4   |
+---+---+---------------------+---+---+-----+

What I did is:

df = df.replace([None], [''], regex=True)

But I got:

+---+---+---------------------+---+------+---+
|   | A |          B          | C |  D   | E |
+---+---+---------------------+---+------+---+
| 0 | A | 1388628000000000000 |   | A    | 1 |
| 1 | B | 1388631600000000000 | B | B    | 2 |
| 2 |   | 1388635200000000000 | C | C    |   |
| 3 | C |                     | C |      | 4 |
+---+---+---------------------+---+------+---+

All the dates become big integers, and even NaT and NaN are replaced, which I don't want.

How can I achieve this correctly and efficiently?

GeoStoneMarten · Accepted Answer · 2021-07-26 17:29:34Z

39

This is sufficient:

df.fillna("", inplace=True)

df
Out[142]: 
   A                    B  C  D  E
0  A  2014-01-02 02:00:00     A  1
1  B  2014-01-02 03:00:00  B  B  2
2     2014-01-02 04:00:00  C  C   
3  C                       C     4
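The reason `fillna("")` (and the `replace` call in the question) also touches NaN and NaT is, as far as I can tell, that pandas treats `None`, `np.nan` and `pd.NaT` all as missing values, while an identity check `x is None` only matches a real `None`. A minimal sketch of that difference:

```python
import numpy as np
import pandas as pd

markers = [None, np.nan, pd.NaT]

# pandas' missing-value check treats all three markers the same way
is_missing = [bool(pd.isna(m)) for m in markers]

# an identity check only matches the real Python None
is_none = [m is None for m in markers]

print(is_missing)  # [True, True, True]
print(is_none)     # [True, False, False]
```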

Edit (2021-07-26): completed the answer following @dWitty's comment.

If you really want to keep NaT and NaN values in the non-text columns, you only need to fill NA in your text columns. In your example these are A, C and D.

Pass fillna a dict of replacement values keyed by column name; the value can be different for each column. In your case you just need to construct the dict:

# default values to replace NA (None)
# values = {"A": "", "C": "", "D": ""}
values = {col: "" for col in ['A', 'C', 'D']}
df.fillna(value=values, inplace=True)

df
Out[142]: 
   A                   B  C  D    E
0  A 2014-01-02 02:00:00     A  1.0
1  B 2014-01-02 03:00:00  B  B  2.0
2    2014-01-02 04:00:00  C  C  NaN
3  C                 NaT  C     4.0
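Instead of listing the text columns by hand, the dict can also be built from the column dtypes with `select_dtypes`. This is a sketch assuming, as in the question, that the text columns are exactly the object (or pandas string) columns; any NaN stored in such a column would be filled as well:

```python
from datetime import datetime as dt

import numpy as np
import pandas as pd

# rebuild the question's DataFrame
df = pd.DataFrame({
    "A": ["A", "B", None, "C"],
    "B": [dt(2014, 1, 2, 2), dt(2014, 1, 2, 3), dt(2014, 1, 2, 4), None],
    "C": [None, "B", "C", "C"],
    "D": ["A", "B", "C", None],
    "E": [1.0, 2.0, np.nan, 4.0],
})

# map every text column to an empty-string fill value;
# the datetime column B and the float column E are not selected
text_cols = df.select_dtypes(include=["object", "string"]).columns
values = {col: "" for col in text_cols}
df = df.fillna(value=values)

print(values)  # {'A': '', 'C': '', 'D': ''}
print(df)
```

NaT in `B` and NaN in `E` are left as they are, because those columns never appear in `values`.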

edited Jul 26, 2021 at 17:29

answered Aug 7, 2020 at 8:18

GeoStoneMarten


1 Comment

dWitty Over a year ago

The question explicitly asks about replacing None only, not NaN as well.

EdChum · Accepted Answer · 2015-07-08 15:11:00Z

17

It looks like None is being promoted to NaN, so you cannot use replace as usual. The following works:

In [126]:
# DataFrame.applymap is deprecated since pandas 2.1; use df.map(...) there
mask = df.applymap(lambda x: x is None)
cols = df.columns[mask.any()]
for col in df[cols]:
    df.loc[mask[col], col] = ''
df

Out[126]:
   A                   B  C  D   E
0  A 2014-01-02 02:00:00     A   1
1  B 2014-01-02 03:00:00  B  B   2
2    2014-01-02 04:00:00  C  C NaN
3  C                 NaT  C      4

So we generate a mask of the None values using applymap, then, for each column that contains at least one None, use that boolean mask to set those values to ''.
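The same idea can be applied column by column with `Series.map` instead of `DataFrame.applymap`, checking only columns that can actually hold a Python object. This is a sketch, not the answer's exact code; the column `F` is hypothetical, added to show that a NaN stored in an object column survives because only `x is None` is matched. Object dtype is requested explicitly so that None and NaN stay distinct objects:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": pd.Series(["A", None, "C"], dtype=object),
    # hypothetical column mixing a real None and a NaN
    "F": pd.Series([None, np.nan, "x"], dtype=object),
    "E": [1.0, np.nan, 3.0],
})

for col in df.columns:
    if df[col].dtype != object:
        continue  # float/datetime columns cannot hold a real None
    # identity check per element: None -> True, NaN -> False
    is_none = df[col].map(lambda x: x is None)
    df.loc[is_none, col] = ""

print(df)
```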

edited Jul 8, 2015 at 15:11

answered Jul 8, 2015 at 14:34

EdChum


Ricky McMaster · Accepted Answer · 2017-02-16 10:10:37Z

6

Since the columns you wish to alter all have object dtype, you can select them by checking each column's dtype attribute (for completeness I added string and unicode) and then use fillna.

So:

for c in df:
    if str(df[c].dtype) in ('object', 'string_', 'unicode_'):
        # assign the result back; fillna(inplace=True) on df[c] may not
        # update df under pandas Copy-on-Write
        df[c] = df[c].fillna(value='')

This will leave numeric and date columns unaffected.

To see the data types for all columns:

df.dtypes

edited Feb 16, 2017 at 10:10

answered Feb 16, 2017 at 9:45

Ricky McMaster


Gulzar · Accepted Answer · 2020-11-17 13:21:39Z

3

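For those who are trying to replace None, and not just np.nan (which is covered here), here is a nice one-liner:

default_value = ""
# element-wise check; DataFrame.apply would pass whole columns, which are never None
# (on pandas < 2.1 use df.applymap instead of df.map)
df = df.map(lambda x: x if x is not None else default_value)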

answered Nov 17, 2020 at 13:21

Gulzar


drorhun · Accepted Answer · 2022-07-12 17:29:00Z

3

You can use the replace method. Note that, as written, this replaces empty strings with NaN, which is the reverse of what the question asks:

df.replace('', np.nan, inplace=True)

Source

answered Jul 12, 2022 at 17:29

drorhun
