
I have a nested array that looks like this:

a = [[1,2],[2,3]]

I have a streaming DataFrame that looks like this:

|system    |level|
+----------+-----+
|Test1     |1    |
|Test2     |3    |

I want to include the array as a nested array in a third column:

|system    |level|Data         |
+----------+-----+-------------+
|Test1     |1    |[[1,2],[2,3]]|

I tried withColumn and the array function, but I am not sure how to build a nested array literal.

Any help would be appreciated.

3 Answers


You can add a new column, but you'll have to use a crossJoin (the demo below uses a different sample DataFrame with date, hour and value columns, but the technique is the same for the question's DataFrame):

a = [[1,2],[2,3]]

df.crossJoin(spark.createDataFrame([a], "array<array<bigint>>").toDF("data")).show()

+-------------------+----+------+----------------+
|               date|hour| value|            data|
+-------------------+----+------+----------------+
|1984-01-01 00:00:00|   1|638.55|[[1, 2], [2, 3]]|
|1984-01-01 00:00:00|   2|638.55|[[1, 2], [2, 3]]|
|1984-01-01 00:00:00|   3|638.55|[[1, 2], [2, 3]]|
|1984-01-01 00:00:00|   4|638.55|[[1, 2], [2, 3]]|
|1984-01-01 00:00:00|   5|638.55|[[1, 2], [2, 3]]|
+-------------------+----+------+----------------+
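
A minimal sketch of the same approach applied to the question's schema (a batch DataFrame stands in for the streaming one here, and the column name Data is taken from the question):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

a = [[1, 2], [2, 3]]

# Stand-in for the streaming dataframe from the question
df = spark.createDataFrame([("Test1", 1), ("Test2", 3)], ["system", "level"])

# One-row dataframe holding the nested array; crossJoin attaches it to every row
lit_df = spark.createDataFrame([(a,)], "Data array<array<bigint>>")
df.crossJoin(lit_df).show()

# +------+-----+----------------+
# |system|level|            Data|
# +------+-----+----------------+
# | Test1|    1|[[1, 2], [2, 3]]|
# | Test2|    3|[[1, 2], [2, 3]]|
# +------+-----+----------------+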

In the Scala API, you can use the typedLit function to add Array or Map values to a column (unlike lit, typedLit can handle parameterized Scala types such as Seq and Map).

// Ref : https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.functions$

Here is sample code to add an Array as a column value:

import org.apache.spark.sql.functions.typedLit
import spark.implicits._ // for .toDF; already in scope in the spark-shell

val a = Seq((1,2),(2,3))
val df1 = Seq(("Test1", 1), ("Test3", 3)).toDF("a", "b")

df1.withColumn("new_col", typedLit(a)).show()

// Output

+-----+---+----------------+
|    a|  b|         new_col|
+-----+---+----------------+
|Test1|  1|[[1, 2], [2, 3]]|
|Test3|  3|[[1, 2], [2, 3]]|
+-----+---+----------------+

I hope this helps.


If you want to add the same array to all rows, you can use typedLit from the SQL functions. See this answer:
https://stackoverflow.com/a/32788650/12365294

2 Comments

I did try this, but I am unable to import org.apache.spark.sql.functions in Python. I included the jar file org.apache.spark:spark-sql_2.11:2.4.4 in my execution, but still no luck.
For PySpark, that Scala package is not importable; use "from pyspark.sql.functions import *" instead.
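
Since pyspark has no typedLit, a minimal sketch of the equivalent in PySpark builds the nested literal by nesting the array and lit functions (df being the DataFrame from the question):

from pyspark.sql.functions import array, lit

a = [[1, 2], [2, 3]]

# Build the column expression array(array(1, 2), array(2, 3))
nested = array(*[array(*[lit(x) for x in inner]) for inner in a])

# withColumn also works on a streaming DataFrame; show() assumes a batch one
df.withColumn("Data", nested).show()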
