Pandas Read SQL to CSV - Int conversion

Question

I have data from a local Postgres database that I need to upload to S3 and then copy into Redshift.

To accomplish this, I am using Python Pandas as follows:

engine = create_engine(self.engine)
connection = engine.raw_connection()
df = pd.read_sql(<sql string>, connection, coerce_float=False)
df.to_csv(<output fn>, header=True, index=False, encoding='utf-8')

The sql that it executes returns rows of varchar(255), varchar(255), int, int. However, since some of these values can be null I run into the Pandas Caveat about integers and NaN values.

This post solves their issue by setting na to an arbitrary int and the astype explicitly as int. They are able to do so because all their columns are of the same datatype. However, I have a mix of varchar(255) and int.

How can I force df.to_csv(...) to output my ints as ints (and not floats)?

Thanks,

Kartik · Accepted Answer · 2015-11-05 20:43:12Z

0

Why can't you insert these two lines above df.to_csv()?

df = df.fillna(<some_int>)
df[[int_col1, int_col2]] = df[[int_col1, int_col2]].astype('int64')

answered Nov 5, 2015 at 20:43

Kartik

8,73345 silver badges78 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

dlstadther Over a year ago

I want nulls to stay nulls. I do not want them to be an arbitrary int. If so, Redshift's copy will insert the arbitrary int as that record's field value.

Kartik Over a year ago

What is Redshift's null identifier? If it is like '\N' in SQL, you can convert the column to varchar and replace all null values with '\\N'...

dlstadther Over a year ago

The Redshift copy command allows for BLANKASNULL and EMPTYASNULL. Fields in redshift cannot have their datatype changed. Rather, you must create a new column of the datatype you want and then do update <table> set <column> = null where <column> = <arbitrary char>;. This could work, but I'd really like a less involved solution if one exists. Thank you though! If nothing else better shows up in the next few days, I'll implement this and mark as the solution.

Kartik Over a year ago

I am assuming that having int converted to float is a problem as well. I too don't see any other solution apart from what you have said. That doesn't mean some other user might not have a better idea. Poke around a bit, something might pop up.

Collectives™ on Stack Overflow

Pandas Read SQL to CSV - Int conversion

1 Answer 1

4 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

4 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related