I have data from a local Postgres database that I need to upload to S3 and then copy into Redshift.
To accomplish this, I am using Python Pandas as follows:
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine(self.engine)  # self.engine holds the Postgres connection URL
connection = engine.raw_connection()
df = pd.read_sql(<sql string>, connection, coerce_float=False)
df.to_csv(<output fn>, header=True, index=False, encoding='utf-8')
The SQL it executes returns rows of varchar(255), varchar(255), int, int. However, since some of these values can be NULL, I run into the pandas caveat about integers and NaN values: an integer column that contains any null is upcast to float.
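For illustration, a minimal reproduction of that caveat (the column name is made up):

```python
import pandas as pd

# A single null in an otherwise-integer column forces the whole
# column to float64, so 1 is written to CSV as 1.0.
df = pd.DataFrame({"a": [1, 2, None]})
print(df["a"].dtype)  # float64
```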
This post solves the issue by filling the NaNs with an arbitrary int and then casting the column explicitly with astype(int). They can do that because all of their columns share the same datatype; I have a mix of varchar(255) and int.
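One variant I have considered is casting only the integer columns, using pandas' nullable Int64 dtype (available in pandas >= 0.24), which keeps nulls as pd.NA instead of upcasting to float. This is only a sketch with made-up column names mirroring my schema, not my actual query:

```python
import pandas as pd

# Hypothetical frame matching the query's shape:
# two varchar columns and two nullable int columns.
df = pd.DataFrame({
    "name": ["a", "b", None],
    "city": ["x", None, "z"],
    "count": [1, None, 3],
    "score": [10, 20, None],
})

# Cast just the integer columns to the nullable Int64 dtype;
# NaN becomes pd.NA and the ints stay ints.
for col in ["count", "score"]:
    df[col] = df[col].astype("Int64")

# pd.NA is written as an empty field by default.
df.to_csv("out.csv", index=False)
```

After this, the CSV contains 1 and 10 rather than 1.0 and 10.0, and the string columns are untouched. I am not sure this is the idiomatic fix, though, hence the question below.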
How can I force df.to_csv(...) to output my ints as ints (and not floats)?
Thanks,