Hi I want to add new column using existing column in each row of DataFrame , I am trying this in Spark Scala like this ... df is dataframe containing variable number of column , which can be decided at run time only.
// Added new column "docid"
val df_new = appContext.sparkSession.sqlContext.createDataFrame(df.rdd, df.schema.add("docid", DataTypes.StringType))
df_new.map(x => {
import appContext.sparkSession.implicits._
val allVals = (0 to x.size).map(x.get(_)).toSeq
val values = allVals ++ allVals.mkString("_")
Row.fromSeq(values)
})
But this is giving error is eclipse itself
- Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.
- not enough arguments for method map: (implicit evidence$7: org.apache.spark.sql.Encoder[org.apache.spark.sql.Row])org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]. Unspecified value parameter evidence$7.
Please help.
importshould be done outside of themap.df.withColumn()as suggested in werner's answer? That would be the most straightforward solution