I want to read a CSV file using Spark. The file's path contains spaces, and Spark replaces them with %20 when it resolves the path.
This is the code:
val tmpDF = spark.read
  .format("com.databricks.spark.csv")
  .option("multiLine", value = true)
  .option("quote", "\"")
  .option("escape", "\"")
  .option("header", "true")
  .option("inferSchema", "true")
  .option("delimiter", delimiter)
  .load(filename)
tmpDF.show(10)
When tmpDF.show(10) is executed, the following error is thrown:
java.io.FileNotFoundException: No such file or directory: s3://{bucket_name}/all/Proposal%20and%20pre-approval/filen_name_20190826-215950.csv
It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running REFRESH TABLE tableName command in SQL or by recreating the Dataset/DataFrame involved.
I checked in S3 and the file does exist, but its key contains a regular space instead of %20.
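To illustrate the mismatch, here is a minimal sketch that uses only java.net.URI from the JDK. It assumes the %20 comes from URI-style percent-encoding of the path; I have not confirmed that this is where Spark or the S3 connector introduces it. The bucket name "my-bucket" and the file name "file.csv" are placeholders, not my real values.

```scala
import java.net.URI

object SpaceEncodingDemo {
  // Hypothetical bucket and key, used only for illustration.
  val rawPath = "/all/Proposal and pre-approval/file.csv"

  // The multi-argument URI constructor percent-encodes characters
  // that are illegal in a URI path, such as spaces.
  val uri = new URI("s3", "my-bucket", rawPath, null)

  // String form of the URI: spaces appear as %20.
  val encoded: String = uri.toString

  // getPath returns the decoded path: spaces are restored.
  val decoded: String = uri.getPath

  def main(args: Array[String]): Unit = {
    println(encoded)
    println(decoded)
  }
}
```

If the encoded string form is what ends up being used as the object key, the lookup would fail in the way shown in the error above, because the key stored in S3 contains literal spaces.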
Any idea how to handle this? I can't change the paths because they are produced by a component that I can't modify.
s3nschema