Can someone please share how one can convert a DataFrame to an RDD?
3 Answers
Simply:
val rows: RDD[Row] = df.rdd
3 Comments
Boern
if you get "type not found" for either RDD or Row this might help:
val rows: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = df.rdd
Ravi
To extend Boern's answer, add the following two imports: import org.apache.spark.rdd.RDD and import org.apache.spark.sql.Row
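Putting the answer and these comments together, a minimal self-contained sketch (assuming a SparkSession is already running and df is an existing DataFrame) would look like this:
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row

// df is an existing DataFrame; .rdd exposes its contents as an RDD of Row objects
val rows: RDD[Row] = df.rdd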
matanox
Would this change anything about how Spark holds the data in memory, or does it just create a lightweight new object pointing at the same data? I hope it's the latter, but I can't tell from the source code comments.
I was looking for an answer to this myself and found this post.
Jean's answer is absolutely correct. Adding to it: "df.rdd" returns an RDD[Row]. In my case I needed to apply split() once I had the RDD, so I first had to convert the RDD[Row] to an RDD[String]:
// get the RDD[Row] from the query result, then map each Row to its String representation
val opt = spark.sql("select tags from cvs").rdd.map(_.toString)
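As a follow-up sketch of the split() step mentioned above (the comma delimiter here is an assumption for illustration):
// each element of opt is the String form of a Row, e.g. "[tag1,tag2]"
// strip the surrounding brackets, then split on the assumed comma delimiter
val tags = opt.map(s => s.stripPrefix("[").stripSuffix("]").split(","))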