
Afternoon All,

I am attempting to run some Spark SQL on a SchemaRDD and then store the result in an RDD. The line below produces the expected values, so I know the SQL is generating the correct table. Now I just need to store it.

sqlContext.sql("select encounter.Member_ID AS patientID, encounter.Encounter_DateTime AS date, diag.code from encounter join diag on encounter.Encounter_ID = diag.Encounter_ID").show(1)

1 Answer

sqlContext.sql returns a DataFrame; you can then access its .rdd property to get an RDD[Row].

You can try this:

 val queryResult = sqlContext.sql("select encounter.Member_ID AS patientID, encounter.Encounter_DateTime AS date, diag.code from encounter join diag on encounter.Encounter_ID = diag.Encounter_ID")

 val rdd: RDD[Row] = queryResult.rdd

Remove the show call on the DataFrame, since it only prints the DataFrame's contents to stdout rather than returning anything useful.


4 Comments

Thanks for the response! Dropped the .show(1) at the end as well.
Remove the .show() at the end. The show method just displays the first few rows of the DataFrame.
Got it to work for RDD[Row]. Now I am trying to match this table with the following case class: case class Diagnostic(patientID: String, date: Date, code: String). Attempting to build the RDD with val res1: RDD[Diagnostic] = sqlContext.sql("select encounter.Member_ID AS patientID, encounter.Encounter_DateTime AS date, diag.code from encounter join diag on encounter.Encounter_ID = diag.Encounter_ID").rdd
I'm not sure that will compile as written, since queryResult.rdd returns an RDD[Row], not an RDD[Diagnostic].
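To address the open question in the comments: an RDD[Row] cannot be assigned to an RDD[Diagnostic] directly, but you can map each Row into the case class by extracting its fields by position. A minimal sketch, assuming Encounter_DateTime comes back from Spark SQL as a java.sql.Timestamp (the actual getter depends on your schema, so check queryResult.printSchema() first):

```scala
import java.sql.Date
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row

case class Diagnostic(patientID: String, date: Date, code: String)

val res1: RDD[Diagnostic] = sqlContext.sql(
    "select encounter.Member_ID AS patientID, " +
    "encounter.Encounter_DateTime AS date, diag.code " +
    "from encounter join diag " +
    "on encounter.Encounter_ID = diag.Encounter_ID")
  .rdd
  .map { row =>
    // Fields are read by position in the select list; swap
    // getTimestamp for getDate/getString to match the real schema.
    Diagnostic(
      patientID = row.getString(0),
      date = new Date(row.getTimestamp(1).getTime),
      code = row.getString(2))
  }
```

This is only a sketch against assumed column types, not code tested on the asker's data; if any column can be null, guard with row.isNullAt(i) before calling the typed getters.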
