Missing values in Pandas DataFrame are always empty when written to CSV

Question

For this example, I am using version 1.0.1 of pandas.

I have a DataFrame with mixed types and some missing values:

import pandas as pd

df = pd.DataFrame(
    [
        [1, 2.0, '2020-01-01', 'A String']
    ], columns=['int', 'float', 'datetime', 'str']
)
df.loc[1] = [pd.NA, pd.NA, pd.NA, pd.NA]
df.datetime = pd.to_datetime(df.datetime)
print(df)

    int  float   datetime       str
0     1    2.0 2020-01-01  A String
1  <NA>    NaN        NaT       NaN

Let's print the types of the DataFrame to make sure they are what I expect:

print(df.dtypes)

int                 object
float              float64
datetime    datetime64[ns]
str                 object
dtype: object

Now, I want to write this DataFrame to a CSV file:

df.to_csv('test.csv', index=False)

Looking at the output CSV, every missing value (<NA>, NaN and NaT alike) is written as an empty field. I guess that this is fine for string columns, but it's not ideal for the int, float or datetime columns.
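To reproduce this without writing a file, here is a minimal sketch that builds a DataFrame with the same four dtypes and calls `to_csv` without a path, which makes pandas return the CSV text as a string. Building the columns directly (rather than appending a row of `pd.NA` with `df.loc[1] = ...`) is an assumption made so the dtypes don't depend on the pandas version.

```python
import numpy as np
import pandas as pd

# Same dtypes as in the question: object (holding pd.NA), float64,
# datetime64, object. Built column by column so the result does not depend
# on how a given pandas version upcasts when a row is appended with .loc.
df = pd.DataFrame({
    "int": pd.Series([1, pd.NA], dtype=object),
    "float": [2.0, np.nan],
    "datetime": pd.to_datetime(["2020-01-01", None]),
    "str": ["A String", np.nan],
})

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv(index=False)
print(csv_text)
# The second data row consists only of commas: <NA>, NaN and NaT
# are all written as empty fields.
```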

How can I get column-specific representations of the missing values?

EDIT: It is indeed possible to automatically fill missing values using the na_rep argument: df.to_csv('test.csv', na_rep='NA'). However, it does not allow column-specific representations.
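A quick way to see that limitation, assuming the same kind of `df` as in the question (rebuilt here with explicit columns so the snippet runs on its own): `na_rep` is one string, and it is used for every column regardless of dtype.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "int": pd.Series([1, pd.NA], dtype=object),
    "float": [2.0, np.nan],
    "datetime": pd.to_datetime(["2020-01-01", None]),
    "str": ["A String", np.nan],
})

# na_rep replaces every missing value, whatever the column's dtype.
csv_text = df.to_csv(index=False, na_rep="NA")
print(csv_text.splitlines()[2])  # NA,NA,NA,NA
```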

SOLUTION: I guess the best solution so far is to call fillna with a dict before writing to CSV:

df.fillna(
    {'int': '<NA>', 'float': 'NaN', 'datetime': 'NaT'}
).to_csv('test.csv', index=False)
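A minimal sketch of this fillna-with-a-dict approach, checked in memory. It adds two assumptions beyond the snippet above: every column is cast to `object` first, so a string placeholder can go into the float column without relying on dtype upcasting, and the datetime column is formatted with `dt.strftime` beforehand, so that 'NaT' ends up as plain text rather than being handed to a `datetime64` column (which may interpret the string 'NaT' as a missing timestamp again). The placeholder strings are the ones from the SOLUTION.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "int": pd.Series([1, pd.NA], dtype=object),
    "float": [2.0, np.nan],
    "datetime": pd.to_datetime(["2020-01-01", None]),
    "str": ["A String", np.nan],
})

# Per-column placeholder for missing values; 'str' is deliberately left out,
# so it keeps the default empty field.
placeholders = {"int": "<NA>", "float": "NaN", "datetime": "NaT"}

out = df.copy()
# Format dates as text first; NaT becomes a missing value in the result.
out["datetime"] = out["datetime"].dt.strftime("%Y-%m-%d")
# Cast to object so any column can hold a string placeholder, then fill.
out = out.astype(object).fillna(placeholders)

csv_text = out.to_csv(index=False)
print(csv_text)
```

Working on a copy leaves the original `df` and its dtypes untouched, so the placeholders only exist in what gets written out.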

Gnudiff · Accepted Answer · 2020-03-05 12:09:45Z


The CSV format itself does not specify how missing values should be represented. There are a couple of conventions, but ultimately it comes down to the program that will read the CSV afterwards.

Therefore you should use the pandas fillna method to supply the value you want for each data type before exporting.
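To illustrate that the reader decides, here is a sketch of the consuming side, assuming the reader is pandas itself: `read_csv` accepts `na_values` as a dict of column name to tokens, and with `keep_default_na=False` only those tokens are treated as missing. The CSV text is a hypothetical file using the placeholders from the question's SOLUTION.

```python
import io

import pandas as pd

# Hypothetical CSV that uses per-column placeholders for missing values.
csv_text = (
    "int,float,datetime,str\n"
    "1,2.0,2020-01-01,A String\n"
    "<NA>,NaN,NaT,\n"
)

# Tell the reader which token means "missing" in which column.
df = pd.read_csv(
    io.StringIO(csv_text),
    na_values={"int": ["<NA>"], "float": ["NaN"], "datetime": ["NaT"]},
    keep_default_na=False,
)
print(df.isna())
```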


Antonio Gagliostro · Accepted Answer · 2020-03-05 12:08:35Z


Try this:

df.to_csv('test.csv', index=False, na_rep='NA')


dremok (comment):

Thanks! The problem with this is that it fills the missing values in every column with the same value.

Shahir Ansari · Accepted Answer · 2020-03-05 12:57:35Z


You can call fillna() on specific columns to get the value you want. For example:

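# fillna returns a new Series, so assign the result back to the column
df['int column'] = df['int column'].fillna(0)
df['string column'] = df['string column'].fillna("NA")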
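A small sketch of why the assignment matters: by default `fillna` returns a new Series and leaves the DataFrame untouched. It uses the question's `float` and `str` column names; the fill values `0.0` and `"NA"` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"float": [2.0, np.nan], "str": ["A String", np.nan]})

df["float"].fillna(0.0)          # result discarded: df is unchanged
print(df["float"].isna().sum())  # still one missing value

# Assign the filled Series back to actually change each column.
df["float"] = df["float"].fillna(0.0)
df["str"] = df["str"].fillna("NA")
print(df.to_csv(index=False))
```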
