If I understand correctly, an ArrayType column can be added to a Spark DataFrame. I am trying to add a multidimensional array to an existing DataFrame using the withColumn method, so that the array is available with each row and can be used to send information back from a map function.
The error I get says that withColumn expects a Column but is being given an array. Are there any other functions, or a way to wrap the array, that would allow adding it as an ArrayType column?
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

object TestDataFrameWithMultiDimArray {

  val nrRows = 1400
  val nrCols = 500

  /** Our main function where the action happens */
  def main(args: Array[String]) {
    // Create a SparkContext using every core of the local machine
    val sc = new SparkContext("local[*]", "TestDataFrameWithMultiDimArray")
    val sqlContext = new SQLContext(sc)

    // Read the Excel sheet into a DataFrame via the crealytics spark-excel connector
    val PropertiesDF = sqlContext.read
      .format("com.crealytics.spark.excel")
      .option("location", "C:/Users/tjoha/Desktop/Properties.xlsx")
      .option("useHeader", "true")
      .option("treatEmptyValuesAsNulls", "true")
      .option("inferSchema", "true")
      .option("addColorColumns", "false")
      .option("sheetName", "Sheet1")
      .load()

    PropertiesDF.show()
    PropertiesDF.printSchema()

    // This is the line that fails: withColumn expects a Column, not an Array
    val PropertiesDFPlusMultiDimArray = PropertiesDF.withColumn("ArrayCol", Array.ofDim[Any](nrRows, nrCols))
  }
}
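For reference, here is a sketch of what I think might work, based on my reading of the docs (untested): withColumn needs a Column, and functions.typedLit (added in Spark 2.2) can apparently turn a Scala collection into a Column literal. The element type has to be concrete, since Spark has no type for Any, so I use Double here as an assumption.

    import org.apache.spark.sql.functions.typedLit

    // Assumption: Spark 2.2+ and a concrete element type (Double),
    // because Spark cannot derive a schema for Array[Any]
    val multiDimArray = Array.ofDim[Double](nrRows, nrCols)

    // typedLit should turn the nested array into a Column literal of
    // ArrayType(ArrayType(DoubleType)), which withColumn accepts
    val PropertiesDFPlusMultiDimArray =
      PropertiesDF.withColumn("ArrayCol", typedLit(multiDimArray))

One thing I am unsure about: this would copy the same 1400 x 500 array into every row, which seems wasteful. If the array only needs to be readable inside the map function, maybe a broadcast variable is a better fit than a column?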
Thanks for your help.
Kind regards,
Johann