
I have the following line

df = pandas.read_sql_query(sql=sql_script, con=conn, coerce_float=False)

that pulls data from Postgres using a SQL script. Pandas keeps setting some of the columns to type float64; they should just be int. These columns contain some null values. Is there a way to pull the data without Pandas setting them to float64?
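For context, the same promotion can be reproduced without the database at all (a minimal sketch, not the actual query):

import numpy as np
import pandas as pd

# An integer column that contains a missing value: pandas stores the NaN as a
# float, so the whole column is promoted to float64.
df = pd.DataFrame({"key": [1, 2, np.nan]})
print(df["key"].dtype)  # float64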

Thanks!

3 Comments
  • No. This question has been asked a bunch of times before. In Python, the null representation comes from NumPy, and NumPy uses float to store null values (which makes sense if you read up on it). So there is nothing that represents null in int. By the way, how does it matter int or float? They will both compute the same (in fact, precision will be better maintained in float). Commented Sep 15, 2016 at 5:21
  • Thanks @Kartik for the info. Those are keys from a left join and I want to use them to create a comma-separated string in another query. I was baffled by this because, coming from R, I don't get this cast when I run d <- dbGetQuery(conPostgres, postgresQuery.sql) Commented Sep 15, 2016 at 15:54
  • "By the way, how does it matter int or float? They will both compute the same (in fact, precision will be better maintained in float)." float64 can only exactly represent integers up to 2^53 Commented Jan 4, 2023 at 7:56

1 Answer


As per the documentation, the lack of an NA representation in NumPy means integer NA values can't be represented, so pandas promotes integer columns that contain missing values to float.
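In newer pandas (0.24+) there is also the nullable integer extension dtype Int64, which can hold missing values. A minimal sketch of converting the float64 column you get back into integers (assuming the floats are all whole numbers):

import numpy as np
import pandas as pd

# A float64 column as read_sql_query would return it (ints plus a null).
s = pd.Series([1.0, 2.0, np.nan])

# Convert to pandas' nullable integer dtype "Int64" (capital I);
# NaN becomes <NA> and the remaining values stay integers.
s = s.astype("Int64")
print(s.dtype)  # Int64

# The keys can then be joined into a comma-separated string for another query.
print(",".join(str(k) for k in s.dropna().astype(int)))  # "1,2"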


1 Comment

Ahhh, sneaky. I was never aware of this. In light of this, I shall remove my suggestion to call df.astype(np.int32) in the above comment.
