1

I have textRDD: org.apache.spark.rdd.RDD[(String, String)]

I would like to convert it to a DataFrame. The columns correspond to the title and content of each page(row).

2 Answers 2

1

Use toDF(), provide the column names if you have them.

val textDF = textRDD.toDF("title": String, "content": String)
textDF: org.apache.spark.sql.DataFrame = [title: string, content: string]

or

val textDF = textRDD.toDF()
textDF: org.apache.spark.sql.DataFrame = [_1: string, _2: string]

The shell auto-imports (I am using version 1.5), but you may need import sqlContext.implicits._ in an application.

Sign up to request clarification or add additional context in comments.

3 Comments

Scala version of toDF supports only column names, not schema. If you want to provide schema you have to use SQLContext. createDataFrame.
Thank you, I changed it to columns. Schema means something different.
I would actually drop type annotations as well: toDF("title", "content"). There is really nothing to gain but for someone not familiar with Scala it may suggest that it is actually connected to the column types.
0

I usually do this like the following:

Create a case class like this:

case class DataFrameRecord(property1: String, property2: String)

Then you can use map to convert into the new structure using the case class:

rdd.map(p => DataFrameRecord(prop1, prop2)).toDF()

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.