I'm unable to save a PySpark DataFrame to an S3 bucket.
I'm running the code inside a Docker dev container.
My AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are set up in the environment.
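A quick sanity check along these lines (not part of my actual job, just a way to confirm the variables are visible to the Python process inside the container):

```python
import os

# Sanity check that the standard AWS credential variables are visible
# to the Python process driving Spark inside the dev container.
for var in ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"):
    print(var, "is set" if os.environ.get(var) else "is MISSING")
```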
Env Setup
Base image: gcr.io/datamechanics/spark:platform-3.2.1-hadoop-3.3.1-java-11-scala-2.12-python-3.8-dm18
I have the following jars available in `/opt/spark/jars`: `aws-java-sdk-bundle-1.11.901.jar`, `aws-java-sdk-core-1.11.797.jar`, `aws-java-sdk-glue-1.11.797.jar`, `hadoop-aws-3.3.1.jar`.
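To double-check what is actually on the classpath, a small listing like this (using the `/opt/spark/jars` path from the image above) prints the relevant jar names and versions:

```python
import glob
import os

# List the AWS SDK and hadoop-aws jars shipped in the base image,
# to confirm which versions end up on the driver classpath.
jars_dir = "/opt/spark/jars"
for jar in sorted(glob.glob(os.path.join(jars_dir, "*aws*.jar"))):
    print(os.path.basename(jar))
```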
Sample code
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .config("spark.hadoop.fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") \
    .config("spark.dynamicAllocation.enabled", "true") \
    .config("spark.dynamicAllocation.maxExecutors", "4") \
    .config("spark.dynamicAllocation.minExecutors", "1") \
    .config("spark.dynamicAllocation.initialExecutors", "1") \
    .config("spark.sql.parquet.datetimeRebaseModeInRead", "CORRECTED") \
    .config("spark.sql.legacy.pathOptionBehavior.enabled", "true") \
    .config("spark.sql.parquet.datetimeRebaseModeInWrite", "CORRECTED") \
    .getOrCreate()

source_file = "/workspaces/sample/test/*"
df = spark.read.parquet(source_file)
df.write.format("parquet").mode("append").save("s3a://MY_BUCKET/MY_FOLDER/")
```
ERROR:

```
java.io.IOException: regular upload failed: java.lang.NoSuchMethodError: 'void com.amazonaws.util.IOUtils.release(java.io.Closeable, com.amazonaws.thirdparty.apache.logging.Log)'
```
I checked multiple blogs, and developers mostly attribute this error to a version mismatch between the AWS SDK and hadoop-aws jars. The versions look fine to me, though: the same code with the same setup works when I run it in the AWS environment, but when I run it locally I get the error above.
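To rule the mismatch in or out, I believe the classpath can be inspected at runtime through the existing `spark` session; a sketch (nothing assumed beyond py4j reflection and the class name from the stack trace):

```python
# Which jar actually provides the AWS SDK class named in the stack trace,
# and which Hadoop version Spark is linked against.
jvm = spark.sparkContext._jvm
io_utils = jvm.java.lang.Class.forName("com.amazonaws.util.IOUtils")
print("IOUtils loaded from:", io_utils.getProtectionDomain().getCodeSource().getLocation())
print("Hadoop version:", jvm.org.apache.hadoop.util.VersionInfo.getVersion())
```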