  1. Save the Spark DataFrame to S3 as CSV with gzip compression:
    (
        df.write
        .option("header", True)
        .option("encoding", "UTF-8")
        .mode(mode)  # e.g. "overwrite"
        .csv(s3_uri, compression="gzip"))
    
  2. Set Content-Encoding to gzip on the uploaded S3 object.
  3. Use the aws_s3 Postgres extension to COPY from S3 into a table:
    SELECT aws_s3.table_import_from_s3(
      'public.mytable1', '', '(format csv, header true)',
      aws_commons.create_s3_uri('my-bucket-1', 'my/object/key/part-00000-...-1-c000.csv', 'us-east-1')
    );
    

1 Answer


The AWS documentation gives incorrect instructions for setting the object metadata. If you set the metadata manually as a user-defined entry, S3 simply treats it as an arbitrary string instead of recognizing Content-Encoding as a reserved keyword.

The default metadata behavior causes the import to fail with an error.

Force system-defined metadata (rather than the default user-defined metadata) when adding the Content-Encoding key.
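A minimal sketch of doing this via the S3 API instead of the console, assuming boto3 is available (bucket and key below are placeholders, not the actual object from the question): an in-place copy_object with MetadataDirective="REPLACE" sets Content-Encoding as a system-defined header rather than a user-defined metadata entry or tag.

```python
def make_gzip_copy_args(bucket, key):
    """Build copy_object kwargs that rewrite an object's metadata in place,
    setting Content-Encoding as a *system-defined* header (not user metadata)."""
    return {
        "Bucket": bucket,
        "Key": key,
        # Copying an object onto itself is the standard way to change metadata.
        "CopySource": {"Bucket": bucket, "Key": key},
        # REPLACE is required; otherwise S3 keeps the original metadata.
        "MetadataDirective": "REPLACE",
        "ContentEncoding": "gzip",   # the reserved system-defined header
        "ContentType": "text/csv",
    }

# Usage (hypothetical names; requires AWS credentials):
# import boto3
# s3 = boto3.client("s3")
# s3.copy_object(**make_gzip_copy_args("my-bucket-1", "my/object/key/part-00000.csv"))
```

Because S3 metadata is immutable after upload, the self-copy with MetadataDirective="REPLACE" is the only way to fix the header without re-uploading the file.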

This wasted hours across four people. Feedback has been submitted to the AWS docs team.
