
I have Spark 1.6 and am trying to read a CSV (or TSV) file as a DataFrame. Here are the steps I take:

scala>  val sqlContext= new org.apache.spark.sql.SQLContext(sc)
scala> import sqlContext.implicits._
scala> val df = sqlContext.read
scala> .format("com.databricks.spark.csv")
scala> .option("header", "true")
scala> .option("inferSchema", "true")
scala> .load("data.csv")
scala> df.show()

Error:

<console>:35: error: value show is not a member of org.apache.spark.sql.DataFrameReader
       df.show()

The last command is supposed to show the first few lines of the dataframe, but instead I get the error above. Any help would be much appreciated.

  • You just copy/pasted the spark-csv example into the shell without trying to understand how it works. Commented Jul 26, 2016 at 18:30

3 Answers


It looks like your functions are not chained together properly, so you're attempting to run show() on the val df, which is a reference to a DataFrameReader, not a DataFrame. The shell evaluates each line as a separate statement, so the later .format/.option/.load lines never chain onto the read call. If I run the following, I can reproduce your error:

val df = sqlContext.read
df.show()

If you restructure the code so the whole chain is a single expression, it works:

val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("data.csv")
df.show()
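
If you prefer one call per line for readability, you can keep the chain together by pasting it as a single block (for example with :paste in spark-shell). A minimal sketch, assuming the same data.csv as above:

// Paste as one block (e.g. via :paste) so the leading dots continue
// the expression instead of starting new REPL statements.
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")      // use the first line as column names
  .option("inferSchema", "true") // infer column types from the data
  .load("data.csv")

df.show() // prints the first 20 rows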

5 Comments

Thanks! I tried it but now I get the error message: "java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.csv"
If you're trying this locally, you'll need to add the spark-csv jar to your classpath. You can follow the instructions at github.com/databricks/spark-csv to start the shell and pull the jars into your environment: $SPARK_HOME/bin/spark-shell --packages com.databricks:spark-csv_2.10:1.4.0
Thank you all! It works now, but the created dataframe uses generated column headings (C0, C1, C2, ...) and treats the actual column headings as the first row of data. How do I fix this? (See the sketch after these comments.)
It doesn't work in Cloudera. @user2145299 how did you solve it?
@SiddheshKalgaonkar It depends on the Spark release: if it's older than Spark 2.0, you'll need to download and use the spark-csv package as referenced above. If you're using a newer version (Spark 2.x+), CSV support is included by default.
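
Regarding the C0, C1, C2, ... headings mentioned above: spark-csv generates those names when the header option is absent or false. A minimal sketch of the fix, assuming the shell was started with the spark-csv package as in the earlier comment:

// Spark 1.6 with spark-csv: header=true makes the first line the
// column names instead of the first row of data.
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .load("data.csv")

// On Spark 2.x+, the built-in reader is equivalent:
// val df = spark.read.option("header", "true").csv("data.csv")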

In Java, first add the dependency to your pom.xml, then run the following code to read a CSV file. Note that this snippet uses the Spark 2.x SparkSession/Dataset API, so sparkSession must already have been created.

<dependency>
    <groupId>com.databricks</groupId>
    <artifactId>spark-csv_2.10</artifactId>
    <version>1.4.0</version>
</dependency>

Dataset<Row> df = sparkSession.read()
        .format("com.databricks.spark.csv")
        .option("header", true)
        .option("inferSchema", true)
        .load("hdfs://localhost:9000/usr/local/hadoop_data/loan_100.csv");



Use the following instead (note the import, so SQLContext resolves):

import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)

It should resolve your issue.

