
Coming from the R world, I want to import a .csv into Spark (v1.6.1) using the Scala shell (./spark-shell).

My .csv has a header and looks like this:

"col1","col2","col3"
1.4,"abc",91
1.3,"def",105
1.35,"gh1",104

Thanks.

1 Answer

Spark 2.0+

Since databricks/spark-csv has been integrated into Spark, reading .csv files is pretty straightforward using the SparkSession:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
   .master("local")
   .appName("Word Count")
   .getOrCreate()
val df = spark.read.option("header", true).csv(path)
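
inferSchema works in 2.0+ as well, so the columns come back typed instead of all strings. A minimal sketch, assuming the sample file from the question is saved under the hypothetical path mydata.csv:

// Assumption: the sample CSV from the question, saved as "mydata.csv".
// With inferSchema, Spark samples the data and picks the column types
// (double/string/integer here) instead of reading everything as string.
val typed = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("mydata.csv")

typed.printSchema()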

Older versions

After restarting my spark-shell I figured it out myself; this may be of help to others:

After installing as described here and starting the spark-shell with ./spark-shell --packages com.databricks:spark-csv_2.11:1.4.0:

scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
scala> val df = sqlContext.read.format("com.databricks.spark.csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .load("/home/vb/opt/spark/data/mllib/mydata.csv")
scala> df.printSchema()
root
 |-- col1: double (nullable = true)
 |-- col2: string (nullable = true)
 |-- col3: integer (nullable = true)
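
To sanity-check the parsed contents as well as the schema, df.show() prints the rows; with the sample data from the question the output should look roughly like this:

scala> df.show()
+----+----+----+
|col1|col2|col3|
+----+----+----+
| 1.4| abc|  91|
| 1.3| def| 105|
|1.35| gh1| 104|
+----+----+----+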

2 Comments

What is spark here? Is it a SparkContext?
Nope, starting with Spark 2.0 spark refers to the new SparkSession, see spark.apache.org/docs/2.1.0/api/scala/… - I added that to the answer. Thanks!
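
For completeness, a small sketch: in the 2.0+ shells both spark and sc are predefined, and the SparkContext is still reachable from the session if older code needs it:

// In Spark 2.0+ the SparkContext hangs off the SparkSession:
val sc = spark.sparkContext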
