I want to read a CSV file using Spark. The file's path contains blank spaces, and Spark is replacing them with %20.

This is the code:

val tmpDF = spark.read
  .format("com.databricks.spark.csv")
  .option("multiLine", value = true)
  .option("quote", "\"")
  .option("escape", "\"")
  .option("header", "true")
  .option("inferSchema", "true")
  .option("delimiter", delimiter)
  .load(filename)

tmpDF.show(10)

When tmpDF.show(10) is executed, the following error is thrown:

java.io.FileNotFoundException: No such file or directory: s3://{bucket_name}/all/Proposal%20and%20pre-approval/filen_name_20190826-215950.csv 

It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running REFRESH TABLE tableName command in SQL or by recreating the Dataset/DataFrame involved.

I checked in S3 and the file does exist, but its path has a regular space instead of %20.

Any idea how to handle this? I can't change the paths because they are produced by a component that I can't modify.

  • Try using the s3n scheme. Commented Aug 30, 2019 at 0:59
  • @SMaZ I got the following exception: No FileSystem for scheme: s3n. Exit Code is non-zero or 1, hence not updating the last modified date Commented Aug 30, 2019 at 1:46
  • Can you add a detailed log? Also, try accessing a file without any spaces in its path. This seems like a different issue. Commented Aug 30, 2019 at 2:18
  • Instead of .option("escape", "\""), try reading the file with .option("escape", " "). Hopefully that solves your problem. Let me know if you still face the same issue. Commented Aug 30, 2019 at 6:18

1 Answer

This is a typical URL-encoding problem. The URL coming from S3 is encoded with %20, but Spark decodes it incorrectly.

There have been two issues regarding this:

  1. https://jira.apache.org/jira/browse/SPARK-23148
  2. https://jira.apache.org/jira/browse/SPARK-24320

Both issues were resolved in Spark 2.3. If you are using an older version, you need to escape the file names after decoding the URL.
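The decoding step could look roughly like the following. This is a minimal sketch, not a confirmed fix: the decodePath helper and the my-bucket name are hypothetical, and whether Spark then resolves the decoded path correctly depends on the Spark version and filesystem connector in use. It relies only on java.net.URI, whose getPath returns the path with percent-escapes decoded.

```scala
import java.net.URI

object PathDecoding {
  // Hypothetical helper: decode percent-escapes (such as %20) in the path
  // part of a URI string, leaving the scheme and bucket name as they are.
  // Note: new URI(...) throws URISyntaxException if the input already
  // contains a literal space, so only pass in encoded strings.
  def decodePath(encoded: String): String = {
    val uri = new URI(encoded)
    // getPath returns the decoded path component
    s"${uri.getScheme}://${uri.getAuthority}${uri.getPath}"
  }
}
```

For example, PathDecoding.decodePath("s3://my-bucket/all/Proposal%20and%20pre-approval/file.csv") returns the same location with regular spaces in place of %20. java.net.URI is used here rather than java.net.URLDecoder because URLDecoder also turns + into a space, which is wrong for file paths.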