Import data using Spark Scala

Question

I have a large dataset that I want to import into Databricks to do some analytics using Scala. The dataset is available at this link: https://drive.google.com/open?id=1g4YYALk3nArN8bX2uFS70IpbdSf_Efqj

I want to import this dataset so that the document ID is in the first column and the remaining text data is in a second column.

But when I import the data using the following code, each whole line ends up in a single column:

val df = spark.read.text("FileStore/tables/plot_summaries.txt")

df.select("value").show()

Can anyone help me import this properly? Any help would be highly appreciated. Thank you.

Does this answer your question? Reading TSV into Spark Dataframe with Scala API
– Shaido, commented Mar 4, 2020 at 8:55

Vijay · Accepted Answer · 2020-03-04 06:34:42Z

4

This will solve your issue.

spark.read.option("sep", "\t").csv("FileStore/tables/plot_summaries.txt")
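
For context, the text source always returns one string column named value per line and does not apply a separator, so the split has to come from the CSV reader. The following is a minimal sketch of that approach, not a definitive implementation. The two-line sample file, the local SparkSession, and the column names DocumentID and Description are assumptions made for illustration; on Databricks the spark session already exists and the path would point at the uploaded file. Disabling quoting with an empty quote option is also an assumption, on the guess that free-text summaries may contain double quotes.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Local session for illustration only; in a Databricks notebook `spark` is predefined.
val spark = SparkSession.builder().master("local[1]").appName("tsv-read-sketch").getOrCreate()

// Hypothetical two-line sample standing in for plot_summaries.txt.
val samplePath = Files.createTempFile("plot_summaries_sample", ".txt")
Files.write(
  samplePath,
  "1\tA \"quoted\" word in the plot.\n2\tSecond plot summary.\n".getBytes("UTF-8")
)

val df = spark.read
  .option("sep", "\t")   // split each line on tabs
  .option("quote", "")   // assumption: treat double quotes as ordinary characters
  .csv(samplePath.toString)
  .toDF("DocumentID", "Description") // replace the default _c0, _c1 names

df.show(truncate = false)
```

Without a schema or inferSchema, both columns are read as strings; cast DocumentID afterwards if a numeric type is needed.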



NIKHIL SUTHAR · Accepted Answer · 2020-03-04 06:48:09Z

3

Your data is tab-separated, so you need to specify the delimiter explicitly.

scala> import org.apache.spark.sql.types._
scala> val schema = new StructType().add("DocumentID", LongType, true).add("Description", StringType, true)

scala> val df = spark.read.format("csv").option("delimiter", "\t").schema(schema).load("/plot_summaries.txt")

scala> df.show(10)
+----------+--------------------+
|DocumentID|         Description|
+----------+--------------------+
|  23890098|Shlykov, a hard-w...|
|  31186339|The nation of Pan...|
|  20663735|Poovalli Induchoo...|
|   2231378|The Lemon Drop Ki...|
|    595909|Seventh-day Adven...|
|   5272176|The president is ...|
|   1952976|{{plot}} The film...|
|  24225279|The story begins ...|
|   2462689|Infuriated at bei...|
|  20532852|A line of people ...|
+----------+--------------------+
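
If you would rather keep the spark.read.text call from the question, the single value column can also be split after loading. This is a rough sketch under stated assumptions: the sample file and local SparkSession are hypothetical stand-ins, and the two-argument split breaks on every tab, so it assumes the summary text itself contains no tab characters.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, split}

// Local session for illustration only; in a Databricks notebook `spark` is predefined.
val spark = SparkSession.builder().master("local[1]").appName("tsv-split-sketch").getOrCreate()

// Hypothetical two-line sample standing in for plot_summaries.txt.
val samplePath = Files.createTempFile("plot_summaries_sample", ".txt")
Files.write(samplePath, "1\tFirst plot summary.\n2\tSecond plot summary.\n".getBytes("UTF-8"))

// The text source yields one string column named "value" per input line.
val raw = spark.read.text(samplePath.toString)

// Split each line on tabs, then pick the pieces out by position.
val parts = split(col("value"), "\t")
val parsed = raw.select(
  parts.getItem(0).cast("long").as("DocumentID"),
  parts.getItem(1).as("Description")
)

parsed.show(truncate = false)
```

The CSV reader with an explicit schema, as in the answer above, is usually the simpler route; splitting by hand is mainly useful when the raw line is needed for other processing as well.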


1 Comment

BdEngineer Over a year ago

can you help and suggest how to handle this stackoverflow.com/questions/62036791/…
