For this example, I am using version 1.0.1 of pandas.
I have a DataFrame with mixed types and some missing values:
import pandas as pd

df = pd.DataFrame(
    [
        [1, 2.0, '2020-01-01', 'A String']
    ], columns=['int', 'float', 'datetime', 'str']
)
df.loc[1] = [pd.NA, pd.NA, pd.NA, pd.NA]
df.datetime = pd.to_datetime(df.datetime)
print(df)
    int  float   datetime       str
0     1    2.0 2020-01-01  A String
1  <NA>    NaN        NaT       NaN
Let's print the types of the DataFrame to make sure they are what I expect:
print(df.dtypes)
int                 object
float              float64
datetime    datetime64[ns]
str                 object
dtype: object
Now, I want to write this DataFrame to a CSV file:
df.to_csv('test.csv', index=False)
Looking at the output CSV, every missing value (<NA>, NaN and NaT alike) is written as an empty string. I guess that this is fine for string columns, but it's not ideal for int, float or datetime columns, where I would like to see an explicit marker.
How can I get column-specific representations of the missing values?
EDIT: It is indeed possible to automatically fill missing values using the na_rep argument: df.to_csv('test.csv', na_rep='NA'). However, it does not allow column-specific representations.
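To illustrate the limitation, here is a minimal sketch on a small two-column frame (made up for this illustration, not the DataFrame above). It writes to a string instead of a file so the result can be inspected directly; the single na_rep value ends up in every column:

```python
import pandas as pd

# Small illustrative frame with one missing value per column.
df = pd.DataFrame({"float": [2.0, float("nan")], "str": ["A String", None]})

# With no path argument, to_csv returns the CSV text as a string.
csv_text = df.to_csv(index=False, na_rep="NA")
print(csv_text)
# The same marker is used for both the float and the string column.
```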
SOLUTION: I guess the best solution so far is to call fillna with a dict before writing to CSV:
df.fillna(
    {'int': '<NA>', 'float': 'NaN', 'datetime': 'NaT'}
).to_csv('test.csv', index=False)
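A variation on the same idea, as a rough sketch only: cast everything to object first and substitute the markers with where, so the result does not depend on how fillna treats a string in a typed column. The helper name fill_missing_per_column and its na_reps argument are hypothetical, and the frame is built directly with a nullable Int64 column rather than through df.loc, which is an assumption made to keep the example self-contained:

```python
import pandas as pd


def fill_missing_per_column(frame, na_reps):
    """Return an object-dtype copy with missing values replaced per column.

    Hypothetical helper: `na_reps` maps column name -> marker string.
    Columns not listed in `na_reps` are left untouched.
    """
    out = frame.astype(object)  # object columns can hold arbitrary strings
    for col, rep in na_reps.items():
        # Keep present values; use the marker where the original is missing.
        out[col] = out[col].where(frame[col].notna(), rep)
    return out


df = pd.DataFrame(
    {
        "int": pd.array([1, pd.NA], dtype="Int64"),
        "float": [2.0, float("nan")],
        "datetime": pd.to_datetime(["2020-01-01", None]),
        "str": ["A String", None],
    }
)

filled = fill_missing_per_column(
    df, {"int": "<NA>", "float": "NaN", "datetime": "NaT"}
)
csv_text = filled.to_csv(index=False)  # no path given, so a string is returned
print(csv_text)
```

The str column is not in the mapping, so its missing value still falls back to the default empty string. One thing to watch: because the datetime column becomes object dtype, present dates may be written with a time component, so it is worth checking the output before relying on it.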