
I have Spark 1.6 and am trying to read a CSV (or TSV) file as a DataFrame. Here are the steps I take:

scala>  val sqlContext= new org.apache.spark.sql.SQLContext(sc)
scala> import sqlContext.implicits._
scala> val df = sqlContext.read
scala> .format("com.databricks.spark.csv")
scala> .option("header", "true")
scala> .option("inferSchema", "true")
scala> .load("data.csv")
scala> df.show()

Error:

<console>:35: error: value show is not a member of org.apache.spark.sql.DataFrameReader
       df.show()

The last command is supposed to show the first few lines of the dataframe, but instead I get the error above. Any help would be much appreciated.

  • You just copy/pasted the spark-csv example into the shell without trying to understand how it works. Commented Jul 26, 2016 at 18:30

3 Answers


It looks like your functions are not chained together properly, so you're attempting to run show() on the val df, which is a reference to a DataFrameReader, not a DataFrame. The shell evaluates each line as a separate statement, so the later .format/.option/.load lines never chain onto the read call. If I run the following, I can reproduce your error:

val df = sqlContext.read
df.show()

If you restructure the code so the whole chain is a single expression, it works:

val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("data.csv")
df.show()
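
If you prefer one call per line for readability, you can keep the chain together by pasting it as a single block (for example with :paste in spark-shell). A minimal sketch, assuming the same data.csv as above:

// Paste as one block (e.g. via :paste) so the leading dots continue
// the expression instead of starting new REPL statements.
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")      // use the first line as column names
  .option("inferSchema", "true") // infer column types from the data
  .load("data.csv")

df.show() // prints the first 20 rows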

5 Comments

Thanks! I tried it but now I get the error message: "java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.csv"
If you're trying this locally, you'll need to add the spark-csv jar to your classpath. You can follow the instructions at github.com/databricks/spark-csv to start the shell and pull the jars into your environment: $SPARK_HOME/bin/spark-shell --packages com.databricks:spark-csv_2.10:1.4.0
Thank you all! It works now, but the created dataframe uses generated column headings (C0, C1, C2, ...) and treats the actual column headings as the first row of data. How do I fix this? (See the sketch after these comments.)
It doesn't work in Cloudera. @user2145299 how did you solve it?
@SiddheshKalgaonkar It depends on the Spark release: if it's older than Spark 2.0, you'll need to download and use the spark-csv package as referenced above. If you're using a newer version (Spark 2.x+), CSV support is included by default.
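
Regarding the C0, C1, C2, ... headings mentioned above: spark-csv generates those names when the header option is absent or false. A minimal sketch of the fix, assuming the shell was started with the spark-csv package as in the earlier comment:

// Spark 1.6 with spark-csv: header=true makes the first line the
// column names instead of the first row of data.
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .load("data.csv")

// On Spark 2.x+, the built-in reader is equivalent:
// val df = spark.read.option("header", "true").csv("data.csv")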

In Java, first add the dependency to your pom.xml, then run the following code to read a CSV file. Note that this snippet uses the Spark 2.x SparkSession/Dataset API, so sparkSession must already have been created.

<dependency>
    <groupId>com.databricks</groupId>
    <artifactId>spark-csv_2.10</artifactId>
    <version>1.4.0</version>
</dependency>

Dataset<Row> df = sparkSession.read()
        .format("com.databricks.spark.csv")
        .option("header", true)
        .option("inferSchema", true)
        .load("hdfs://localhost:9000/usr/local/hadoop_data/loan_100.csv");



Use the following instead (note the import, so SQLContext resolves):

import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)

It should resolve your issue.

