
I have a pandas DataFrame with a "year" column. However, some rows have an np.nan value due to an outer merge. The dtype of the column in pandas is therefore float64 instead of integer (integers cannot store NaNs?). Next, I want to store the DataFrame in a PostgreSQL database. For this I use:

df.to_sql()

Everything works fine, but my PostgreSQL column now has type "double precision" and the np.nan values are now [null]. This all makes sense, since the input column type was float64 and not an integer type.

I was wondering if there is a way to store the results in an integer type column with [nans].

Example Notebook

Result of Ami's answer: [screenshot not available]

  • Try df.astype(object).to_sql() and see if that helps? Commented May 18, 2018 at 14:44
  • @coldspeed that changes the table schema - not sure it's warranted. Commented May 18, 2018 at 14:51
  • @AmiTavory If the schema is already defined, then I don't think so. By the way, fillna will not downcast... :) Commented May 18, 2018 at 14:52
  • @coldspeed Ooh, excellent point - missed that. Will update. thanks! Commented May 18, 2018 at 14:53
  • @coldspeed, the result is still double precision in PostgreSQL. I've added a notebook to my question to check whether my implementation is wrong. Commented May 18, 2018 at 15:10

2 Answers


(integer cannot store NaNs?)

No, they cannot. If you look at the PostgreSQL numeric types documentation, you can see that the storage size and range of each integer type are completely specified, and none of them leaves room for a NaN value.

A common solution in this case is to decide, by convention, that some number logically means NaN. In your case, since the column is a year, you might choose a negative value (for example, -1) for that. Before writing, you could use

df.year = df.year.fillna(-1).astype(int)
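As a minimal sketch of that one-liner on a small made-up frame (the data and the `YEAR_MISSING` name are illustrative assumptions, not taken from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical data: "year" is float64 because one value is NaN.
df = pd.DataFrame({"name": ["a", "b", "c"], "year": [2001.0, np.nan, 2018.0]})

YEAR_MISSING = -1  # sentinel chosen by convention; any impossible year would do

# fillna alone leaves the column as float64, so the explicit cast
# is what actually produces an integer column.
df["year"] = df["year"].fillna(YEAR_MISSING).astype("int64")

print(df["year"].dtype)
print(df["year"].tolist())
```

Anything that reads the table later has to know that -1 means "missing", so the convention should be documented alongside the schema.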

Alternatively, you can define another column as year_is_none.
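A rough sketch of the flag-column idea, again on made-up data; the placeholder value 0 is an arbitrary assumption, since the flag carries the meaning:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing year.
df = pd.DataFrame({"year": [2001.0, np.nan, 2018.0]})

# Record which rows were missing before overwriting them.
df["year_is_none"] = df["year"].isna()

# Fill with an arbitrary placeholder and cast to integer.
df["year"] = df["year"].fillna(0).astype("int64")

print(df)
```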

Alternatively, you can store them as floats.

These solutions are listed from most to least efficient in terms of memory.
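To check the effect of the sentinel approach on the table schema without a PostgreSQL server, the sketch below uses an in-memory SQLite database as a stand-in. That substitution is an assumption made purely for illustration: PostgreSQL type names differ, and the table name `events` is hypothetical. It also shows one way to turn the sentinel back into NaN after reading.

```python
import sqlite3

import numpy as np
import pandas as pd

# Hypothetical data, converted with the -1 sentinel convention.
df = pd.DataFrame({"year": [2001.0, np.nan, 2018.0]})
df["year"] = df["year"].fillna(-1).astype("int64")

# SQLite stands in for PostgreSQL here (assumption for illustration only).
conn = sqlite3.connect(":memory:")
df.to_sql("events", conn, index=False)

# Look up the declared type of each column in the created table.
col_types = {row[1]: row[2] for row in conn.execute("PRAGMA table_info(events)")}
print(col_types)

# Read the data back and map the sentinel to NaN again
# (this turns the column into float64 on the pandas side).
restored = pd.read_sql("SELECT year FROM events", conn)
restored["year"] = restored["year"].where(restored["year"] != -1)
conn.close()

print(restored)
```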


1 Comment

Alternatively, if you use the NUMERIC data type, you can store Infinity, -Infinity, and NaN values in the database.

You can use this:

df.year = df.year.fillna(-1)  # or fillna(0)
