I have textRDD: org.apache.spark.rdd.RDD[(String, String)]
I would like to convert it to a DataFrame. The columns correspond to the title and content of each page (row).
Use toDF() and provide the column names if you have them:
val textDF = textRDD.toDF("title", "content")
textDF: org.apache.spark.sql.DataFrame = [title: string, content: string]
or
val textDF = textRDD.toDF()
textDF: org.apache.spark.sql.DataFrame = [_1: string, _2: string]
The Spark shell imports the implicits automatically (I am using version 1.5), but in an application you may need import sqlContext.implicits._ before calling toDF().
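A minimal standalone sketch of the same conversion inside an application, assuming the Spark 1.5-era SQLContext API and a local master for trying it out. The object TextToDataFrame, the helper build and the sample pages are hypothetical stand-ins for the real textRDD:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SQLContext}

object TextToDataFrame {
  // Hypothetical helper: turns (title, content) pairs into a DataFrame with named columns
  def build(sc: SparkContext): DataFrame = {
    val sqlContext = new SQLContext(sc)
    // Outside the shell, this import is what makes toDF() available on RDDs of tuples
    import sqlContext.implicits._

    // Hypothetical sample pages standing in for the real textRDD
    val textRDD: RDD[(String, String)] = sc.parallelize(Seq(
      ("Page A", "Content of page A"),
      ("Page B", "Content of page B")))

    // Only names are passed; the string column types are inferred from the tuple
    textRDD.toDF("title", "content")
  }

  def main(args: Array[String]): Unit = {
    // local[*] is an assumption for running locally; drop setMaster under spark-submit
    val sc = new SparkContext(new SparkConf().setAppName("TextToDataFrame").setMaster("local[*]"))
    val textDF = build(sc)
    textDF.printSchema()
    textDF.show()
    sc.stop()
  }
}
```

Without the implicits import the call to toDF() does not compile, because toDF is added to the RDD by an implicit conversion rather than defined on RDD itself.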
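toDF supports only column names, not a schema. If you want to provide a schema, you have to use SQLContext.createDataFrame. For names alone, toDF("title", "content") is enough: writing toDF("title": String, "content": String) compiles, because Scala treats ": String" as a type ascription on each string literal, but there is nothing to gain from it, and to someone not familiar with Scala it may suggest that the names are actually connected to the column types.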
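A sketch of the createDataFrame route, again assuming the Spark 1.5 SQLContext API. The object TextWithSchema, the helper build and the chosen nullability flags are illustrative assumptions. Because createDataFrame with a StructType expects an RDD[Row], each tuple is mapped to a Row first:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object TextWithSchema {
  // Explicit schema: names, types and nullability are stated here, not inferred
  // (the nullability choices are assumptions for illustration)
  val schema: StructType = StructType(Seq(
    StructField("title", StringType, nullable = false),
    StructField("content", StringType, nullable = true)))

  // Hypothetical helper: applies the schema to an RDD of (title, content) pairs
  def build(sqlContext: SQLContext, textRDD: RDD[(String, String)]): DataFrame = {
    // createDataFrame(RDD[Row], StructType) needs Rows, so convert each tuple first
    val rowRDD: RDD[Row] = textRDD.map { case (title, content) => Row(title, content) }
    sqlContext.createDataFrame(rowRDD, schema)
  }

  def main(args: Array[String]): Unit = {
    // local[*] is an assumption for running locally
    val sc = new SparkContext(new SparkConf().setAppName("TextWithSchema").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)
    // Hypothetical sample pages standing in for the real textRDD
    val textRDD = sc.parallelize(Seq(
      ("Page A", "Content of page A"),
      ("Page B", "Content of page B")))
    val textDF = build(sqlContext, textRDD)
    textDF.printSchema()
    sc.stop()
  }
}
```

Unlike toDF, this route does not need the implicits import, and it lets you state types and nullability explicitly instead of relying on what can be inferred from a tuple or case class.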