
Can someone please share how to convert a DataFrame to an RDD?


3 Answers


Simply:

val rows: RDD[Row] = df.rdd

3 Comments

if you get "type not found" for either RDD or Row this might help: val rows: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = df.rdd
To extend Boern's answer, add the following two import statements: import org.apache.spark.rdd.RDD and import org.apache.spark.sql.Row
Would this change anything in Spark memory holding the data, or only more lightly create a new object pointing at the same data? I hope it's the lighter of the two but not sure from the source code comments.
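Putting the answer and its comments together, here is a minimal self-contained sketch (the app name and the small example DataFrame are illustrative, and a local SparkSession is assumed). In current Spark versions the conversion itself is lazy: .rdd defines a new RDD over the same underlying query plan rather than eagerly copying the data, though rows are deserialized from Spark's internal format when the RDD is actually computed.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SparkSession}

val spark = SparkSession.builder()
  .appName("df-to-rdd")   // hypothetical app name
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// A small example DataFrame (illustrative data)
val df = Seq(("a", 1), ("b", 2)).toDF("tag", "count")

// The conversion: .rdd returns the rows as an RDD[Row]
val rows: RDD[Row] = df.rdd

rows.collect().foreach(println)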

I was looking for exactly this answer and found this post.

Jean's answer is absolutely correct; adding on to it, df.rdd returns an RDD[Row]. I needed to apply split() once I had the RDD, so I first converted the RDD[Row] to an RDD[String]:

val opt: RDD[String] = spark.sql("select tags from cvs").rdd.map(_.toString)
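A hedged sketch of the split step itself, assuming the tags column holds comma-separated strings (the table name cvs and the delimiter are taken from context and illustrative):

```scala
import org.apache.spark.rdd.RDD

// Extract the column value, then split each string
val tags: RDD[String]  = spark.sql("select tags from cvs").rdd.map(_.getString(0))
val words: RDD[String] = tags.flatMap(_.split(","))
```

Using getString(0) instead of toString avoids the surrounding brackets that Row.toString adds (e.g. [a,b]).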



Use df.rdd.map(row => ...) to convert the DataFrame to an RDD if you want to map each row to a different RDD element (in Spark 2.x, df.map by itself returns a Dataset, not an RDD). For example

df.rdd.map(row => (row(0), row(1)))

gives you a paired RDD where the first column of the df is the key and the second column of the df is the value.
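Once you have a paired RDD, key-based operations such as reduceByKey become available. A short sketch (column positions are illustrative, and df is assumed to be an existing DataFrame):

```scala
import org.apache.spark.rdd.RDD

// Pair up the first two columns: column 0 as key, column 1 as value.
// row(i) returns Any, hence the element type below.
val pairs: RDD[(Any, Any)] = df.rdd.map(row => (row(0), row(1)))

// Paired-RDD operations now apply, e.g. counting occurrences per key
val counts = pairs.map { case (k, _) => (k, 1) }.reduceByKey(_ + _)
```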

