I am having some difficulty mapping a function over the rows of a DataFrame and then converting the result back into a new DataFrame.

So far I have:

  val intrdd = df.rdd.map(row => processRow(row))

  val processeddf = intrdd.toDF

However, this does not compile: the toDF implicit is only defined for RDDs of types that have an Encoder (case classes, tuples, primitives), not for RDD[Row].

Is there a good way to do this?

Note that I am on Spark 2.2.0, so I cannot use SQLContext, only SparkSession.

Thanks.

  • stackoverflow.com/questions/37011267/… Commented Jan 19, 2018 at 18:44
  • Is there a way to do this with Spark 2.0.0+ and SparkSession? The linked answer suggests: "2) You can use createDataFrame(rowRDD: RDD[Row], schema: StructType), which is available in the SQLContext object. Example: val df = oldDF.sqlContext.createDataFrame(rdd, oldDF.schema). Note that there is no need to explicitly set any schema columns; we reuse the old DF's schema, which is of class StructType and can easily be extended. However, this approach is sometimes not possible, and in some cases can be less efficient than the first one." (A sketch of this approach on Spark 2.x follows these comments.) Commented Jan 19, 2018 at 18:47
  • You can still use SQLContext in Spark 2: sparkSession.sqlContext. Commented Jan 19, 2018 at 18:48
  • I tried val ss = SparkSession.builder().appName("ImputationApp").enableHiveSupport().getOrCreate(); val sqlContext = new org.apache.spark.sql.SQLContext(ss), but it doesn't seem to work. Commented Jan 19, 2018 at 18:50
  • No, ss.sqlContext will give you the SQLContext. Commented Jan 19, 2018 at 18:51
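
Putting the comments together, here is a minimal sketch of the createDataFrame route on Spark 2.x. It assumes processRow returns a Row with the same shape as the input (so the old schema can be reused); df, processRow, and the app name are placeholders taken from the question:

  import org.apache.spark.rdd.RDD
  import org.apache.spark.sql.{DataFrame, Row, SparkSession}

  val spark = SparkSession.builder()
    .appName("ImputationApp")
    .enableHiveSupport()
    .getOrCreate()

  // Map the processing function over the rows, as in the question.
  val intrdd: RDD[Row] = df.rdd.map(row => processRow(row))

  // In Spark 2.x, createDataFrame(rowRDD, schema) is available directly on
  // SparkSession, so no separate SQLContext is needed. The old DataFrame's
  // schema is reused here; if processRow changes the row shape, build a
  // matching StructType instead.
  val processeddf: DataFrame = spark.createDataFrame(intrdd, df.schema)

Equivalently, spark.sqlContext.createDataFrame(intrdd, df.schema) works, since the session still exposes a SQLContext, but calling createDataFrame on the SparkSession is the idiomatic Spark 2.x form.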
