
Coming from the R world, I want to import a .csv into Spark (v1.6.1) using the Scala shell (./spark-shell).

My .csv has a header and looks like this:

"col1","col2","col3"
1.4,"abc",91
1.3,"def",105
1.35,"gh1",104

Thanks.

1 Answer

Spark 2.0+

Since databricks/spark-csv has been integrated into Spark, reading .csv files is pretty straightforward using the SparkSession:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
   .master("local")
   .appName("Word Count")
   .getOrCreate()
val df = spark.read.option("header", true).csv(path)
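
inferSchema works in 2.0+ as well, so the columns come back typed instead of all strings. A minimal sketch, assuming the sample file from the question is saved under the hypothetical path mydata.csv:

// Assumption: the sample CSV from the question, saved as "mydata.csv".
// With inferSchema, Spark samples the data and picks the column types
// (double/string/integer here) instead of reading everything as string.
val typed = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("mydata.csv")

typed.printSchema()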

Older versions

After restarting my spark-shell I figured it out myself; this may be of help to others:

After installing as described here and starting the spark-shell with ./spark-shell --packages com.databricks:spark-csv_2.11:1.4.0:

scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
scala> val df = sqlContext.read.format("com.databricks.spark.csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .load("/home/vb/opt/spark/data/mllib/mydata.csv")
scala> df.printSchema()
root
 |-- col1: double (nullable = true)
 |-- col2: string (nullable = true)
 |-- col3: integer (nullable = true)
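
To sanity-check the parsed contents as well as the schema, df.show() prints the rows; with the sample data from the question the output should look roughly like this:

scala> df.show()
+----+----+----+
|col1|col2|col3|
+----+----+----+
| 1.4| abc|  91|
| 1.3| def| 105|
|1.35| gh1| 104|
+----+----+----+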

2 Comments

What is spark here? Is it a SparkContext?
Nope, starting with Spark 2.0 spark refers to the new SparkSession, see spark.apache.org/docs/2.1.0/api/scala/… - I added that to the answer. Thanks!
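
For completeness, a small sketch: in the 2.0+ shells both spark and sc are predefined, and the SparkContext is still reachable from the session if older code needs it:

// In Spark 2.0+ the SparkContext hangs off the SparkSession:
val sc = spark.sparkContext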
