
I have a data.csv.gz file in S3 that contains special characters in some rows. The file also has no headers, but I've created column names in the Postgres table. When I check the metadata in S3, the type is Content-Type: binary/octet-stream.

This is the error I'm seeing:

psycopg2.errors.InternalError_: invalid byte sequence for encoding "UTF8": 0x8b

This is what I'm doing that's creating the error:

SELECT aws_s3.table_import_from_s3(
'btr.Ats_20210304',
'ID,NAME,WEBSITE,TYPE,CATEGORY,SUB_CATEGORY,PARENT_ACCOUNT',
'(FORMAT csv, HEADER true, DELIMITER ",")',
'vdw-dev',
'date/hourly/data_0_0_0.csv.gz',
'us-east-1');

I've checked the database's encoding using SELECT pg_encoding_to_char(encoding) FROM pg_database WHERE datname = 'my_db'; and it's set to UTF8.
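As an aside, the 0x8b in the error is telling: every gzip stream begins with the magic bytes 0x1f 0x8b, so Postgres is almost certainly reading the raw compressed bytes as UTF-8 text rather than decompressing them first. A quick Python sketch (with made-up row data) illustrates:

```python
import gzip

# Compress a hypothetical CSV row; every gzip stream starts with the
# magic bytes 0x1f 0x8b -- the 0x8b from the Postgres error message.
data = gzip.compress(b"1,Acme,acme.com\n")
assert data[:2] == b"\x1f\x8b"
```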

  • See here: Gzip – Mar 10, 2021 at 17:53

1 Answer

See the documentation from AWS: Importing an Amazon S3 compressed (gzip) file

You need to ensure that the S3 file has the following Amazon S3 metadata:

  • Key: Content-Encoding
  • Value: gzip
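S3 object metadata can't be edited in place, so one way to add that header is to copy the object over itself with MetadataDirective="REPLACE". A sketch, assuming boto3 and the bucket/key from the question:

```python
# Build the arguments for an in-place copy that replaces the metadata.
# Bucket and key are taken from the question; the actual call requires
# boto3 and AWS credentials, so it is left commented out here.
bucket = "vdw-dev"
key = "date/hourly/data_0_0_0.csv.gz"

copy_args = {
    "Bucket": bucket,
    "Key": key,
    "CopySource": {"Bucket": bucket, "Key": key},
    "MetadataDirective": "REPLACE",   # replace, don't copy, the metadata
    "ContentEncoding": "gzip",        # the header aws_s3 looks for
    "ContentType": "text/csv",
}

# import boto3
# boto3.client("s3").copy_object(**copy_args)
```

After the copy, the S3 console should show Content-Encoding: gzip on the object, and the same table_import_from_s3 call should decompress the file before parsing it.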
