Spark Error reading csv file with spaces in the path/file name

Question

I want to read a CSV file using Spark. The file's path contains blank spaces, and Spark is replacing them with %20.

This is the code:

val tmpDF = spark.read
  .format("com.databricks.spark.csv")
  .option("multiLine", value = true)
  .option("quote", "\"")
  .option("escape", "\"")
  .option("header", "true")
  .option("inferSchema", "true")
  .option("delimiter", delimiter)
  .load(filename)

tmpDF.show(10)

When tmpDF.show(10) is executed, the following error is thrown:

java.io.FileNotFoundException: No such file or directory: s3://{bucket_name}/all/Proposal%20and%20pre-approval/filen_name_20190826-215950.csv

It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running REFRESH TABLE tableName command in SQL or by recreating the Dataset/DataFrame involved.

I checked in S3 and the file does exist, but its path has a regular space instead of %20.

Any idea how to handle this? I can't change the paths because they are produced by a component that I can't modify.

@SMaZ I got the following exception: No FileSystem for scheme: s3n. Exit Code is non-zero or 1, hence not updating the last modified date
– Annie, Commented Aug 30, 2019 at 1:46
Can you add a detailed log? Also, try accessing a file without any spaces. This seems like a different issue.
– SMaZ, Commented Aug 30, 2019 at 2:18
Instead of using .option("escape", "\""), try reading the file with .option("escape", " "). Hopefully that solves it. Let me know if you still face the same issue.
– Mahesh Gupta, Commented Aug 30, 2019 at 6:18

Avishek Bhattacharya · Accepted Answer · 2019-08-30 09:07:07Z

This is a typical URL-encoding problem. The URL coming from S3 is encoded with %20, but Spark decodes it incorrectly.
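The mismatch can be reproduced with java.net.URI alone, without Spark or S3. The sketch below only illustrates the encoding step and is not Spark's actual code path; the bucket name my-bucket and the file name report.csv are made up for the example.

```scala
import java.net.URI

object EncodingDemo {
  // Hypothetical object key containing spaces, as it is stored in S3.
  val rawPath = "/all/Proposal and pre-approval/report.csv"

  // The multi-argument URI constructor percent-encodes illegal characters,
  // so each space becomes %20 in the string form.
  val uri = new URI("s3", "my-bucket", rawPath, null)
  val encoded: String = uri.toString

  // getPath decodes the escapes again; getRawPath keeps them.
  val decodedPath: String = uri.getPath
  val rawEncodedPath: String = uri.getRawPath

  def main(args: Array[String]): Unit = {
    println(encoded)
    println(decodedPath)
    println(rawEncodedPath)
  }
}
```

If any code in between takes the encoded string (or the raw path) and uses it as a literal file name, it ends up looking for a key that contains the characters %20, which is the path shown in the exception.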

There have been two issues reported regarding this.

The issues were resolved in Spark 2.3. If you are using an older version, you need to escape the file names after decoding the URL.
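The answer does not show what that step looks like. One possible reading, offered as an assumption rather than as the author's method, is to decode the percent-escapes in the path part of a location string before using it as a file name. The helper name decodeLocation and the sample bucket are hypothetical, and whether this is sufficient on an affected Spark version has not been verified here.

```scala
import java.net.URI

object PathDecoding {
  // Returns the location with percent-escapes in the path decoded
  // (for example, %20 becomes a space). Scheme and authority are kept as-is.
  // Assumes the input is a syntactically valid, already-encoded URI.
  def decodeLocation(encoded: String): String = {
    val uri = new URI(encoded)
    s"${uri.getScheme}://${uri.getRawAuthority}${uri.getPath}"
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical encoded location, shaped like the one in the exception.
    println(decodeLocation("s3://my-bucket/all/Proposal%20and%20pre-approval/report.csv"))
  }
}
```

URI.getPath is used here instead of java.net.URLDecoder because URLDecoder follows form-encoding rules and would also turn a literal + in a file name into a space.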
