
I have a data.csv.gz file in S3 that contains special characters in some rows. The file also has no headers, but I've created column names in the Postgres table. When I check the metadata in S3, the type is Content-Type: binary/octet-stream.

This is the error I'm seeing:

psycopg2.errors.InternalError_: invalid byte sequence for encoding "UTF8": 0x8b

This is what I'm doing that's creating the error:

SELECT aws_s3.table_import_from_s3(
'btr.Ats_20210304',
'ID,NAME,WEBSITE,TYPE,CATEGORY,SUB_CATEGORY,PARENT_ACCOUNT',
'(FORMAT csv, HEADER true, DELIMITER ",")',
'vdw-dev',
'date/hourly/data_0_0_0.csv.gz',
'us-east-1');

I've checked the database's encoding using SELECT pg_encoding_to_char(encoding) FROM pg_database WHERE datname = 'my_db'; and it's set to UTF8.
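As an aside, the 0x8b in the error is telling: every gzip stream begins with the magic bytes 0x1f 0x8b, so Postgres is almost certainly reading the raw compressed bytes as UTF-8 text rather than decompressing them first. A quick Python sketch (with made-up row data) illustrates:

```python
import gzip

# Compress a hypothetical CSV row; every gzip stream starts with the
# magic bytes 0x1f 0x8b -- the 0x8b from the Postgres error message.
data = gzip.compress(b"1,Acme,acme.com\n")
assert data[:2] == b"\x1f\x8b"
```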

  • See here: Gzip – Mar 10, 2021 at 17:53

1 Answer

See the documentation from AWS: Importing an Amazon S3 compressed (gzip) file

You need to ensure that the S3 file has the following Amazon S3 metadata:

  • Key: Content-Encoding
  • Value: gzip
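S3 object metadata can't be edited in place, so one way to add that header is to copy the object over itself with MetadataDirective="REPLACE". A sketch, assuming boto3 and the bucket/key from the question:

```python
# Build the arguments for an in-place copy that replaces the metadata.
# Bucket and key are taken from the question; the actual call requires
# boto3 and AWS credentials, so it is left commented out here.
bucket = "vdw-dev"
key = "date/hourly/data_0_0_0.csv.gz"

copy_args = {
    "Bucket": bucket,
    "Key": key,
    "CopySource": {"Bucket": bucket, "Key": key},
    "MetadataDirective": "REPLACE",   # replace, don't copy, the metadata
    "ContentEncoding": "gzip",        # the header aws_s3 looks for
    "ContentType": "text/csv",
}

# import boto3
# boto3.client("s3").copy_object(**copy_args)
```

After the copy, the S3 console should show Content-Encoding: gzip on the object, and the same table_import_from_s3 call should decompress the file before parsing it.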
