  1. Save the Spark DataFrame to S3 as CSV with gzip compression:
    (
        df.write
        .option("header", True)
        .option("encoding", "UTF-8")
        .mode(mode)  # e.g. "overwrite"
        .csv(s3_uri, compression="gzip"))
    
  2. Set Content-Encoding to gzip on the uploaded S3 object.
  3. Use the aws_s3 Postgres extension to COPY from S3 into a table:
    SELECT aws_s3.table_import_from_s3(
      'public.mytable1', '', '(format csv, header true)',
      aws_commons.create_s3_uri('my-bucket-1', 'my/object/key/part-00000-...-1-c000.csv', 'us-east-1')
    );
    

1 Answer


The AWS documentation gives incorrect instructions for setting the object metadata. If you set the metadata manually as a user-defined entry, S3 simply treats it as an arbitrary string instead of recognizing Content-Encoding as a reserved keyword.

The default metadata behavior causes the import to fail with an error.

Force system-defined metadata (rather than the default user-defined metadata) when adding the Content-Encoding key.
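A minimal sketch of doing this via the S3 API instead of the console, assuming boto3 is available (bucket and key below are placeholders, not the actual object from the question): an in-place copy_object with MetadataDirective="REPLACE" sets Content-Encoding as a system-defined header rather than a user-defined metadata entry or tag.

```python
def make_gzip_copy_args(bucket, key):
    """Build copy_object kwargs that rewrite an object's metadata in place,
    setting Content-Encoding as a *system-defined* header (not user metadata)."""
    return {
        "Bucket": bucket,
        "Key": key,
        # Copying an object onto itself is the standard way to change metadata.
        "CopySource": {"Bucket": bucket, "Key": key},
        # REPLACE is required; otherwise S3 keeps the original metadata.
        "MetadataDirective": "REPLACE",
        "ContentEncoding": "gzip",   # the reserved system-defined header
        "ContentType": "text/csv",
    }

# Usage (hypothetical names; requires AWS credentials):
# import boto3
# s3 = boto3.client("s3")
# s3.copy_object(**make_gzip_copy_args("my-bucket-1", "my/object/key/part-00000.csv"))
```

Because S3 metadata is immutable after upload, the self-copy with MetadataDirective="REPLACE" is the only way to fix the header without re-uploading the file.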

This wasted hours across four people. Feedback has been submitted to the AWS docs team.
