
I'm new to PySpark. I'm running PySpark on Databricks, and my data is stored in Azure Data Lake Storage (ADLS). I'm trying to read a CSV file from ADLS into a PySpark DataFrame, so I wrote the following code:

import pyspark
from pyspark import SparkContext 
from pyspark import SparkFiles

df = sqlContext.read.csv(SparkFiles.get("dbfs:mycsv path in ADSL/Data.csv"), 
   header=True, inferSchema= True)

But I'm getting the following error message:

Py4JJavaError: An error occurred while calling o389.csv.

Can you suggest how to rectify this error?

1 Answer


The SparkFiles class is intended for accessing files shipped as part of a Spark job (e.g. files added with SparkContext.addFile). If you just need to read a CSV file that is available on ADLS, use spark.read.csv directly with the path, like:

df = spark.read.csv("dbfs:mycsv path in ADSL/Data.csv", 
  header=True, inferSchema=True)

It's better not to use sqlContext; it's kept only for backward compatibility.
