
I have a nested array that looks like this:

a = [[1,2],[2,3]]

I have a streaming DataFrame that looks like this:

|system    |level|
+----------+-----+
|Test1     |1    |
|Test2     |3    |

I want to include the array as a nested array in a third column:

|system    |level|Data         |
+----------+-----+-------------+
|Test1     |1    |[[1,2],[2,3]]|

I tried withColumn and the array function, but I am not sure how to build a nested array literal.

Any help would be appreciated.

3 Answers


You can add a new column, but you'll have to use a crossJoin (the demo below uses a different sample DataFrame with date, hour and value columns, but the technique is the same for the question's DataFrame):

a = [[1,2],[2,3]]

df.crossJoin(spark.createDataFrame([a], "array<array<bigint>>").toDF("data")).show()

+-------------------+----+------+----------------+
|               date|hour| value|            data|
+-------------------+----+------+----------------+
|1984-01-01 00:00:00|   1|638.55|[[1, 2], [2, 3]]|
|1984-01-01 00:00:00|   2|638.55|[[1, 2], [2, 3]]|
|1984-01-01 00:00:00|   3|638.55|[[1, 2], [2, 3]]|
|1984-01-01 00:00:00|   4|638.55|[[1, 2], [2, 3]]|
|1984-01-01 00:00:00|   5|638.55|[[1, 2], [2, 3]]|
+-------------------+----+------+----------------+
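
A minimal sketch of the same approach applied to the question's schema (a batch DataFrame stands in for the streaming one here, and the column name Data is taken from the question):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

a = [[1, 2], [2, 3]]

# Stand-in for the streaming dataframe from the question
df = spark.createDataFrame([("Test1", 1), ("Test2", 3)], ["system", "level"])

# One-row dataframe holding the nested array; crossJoin attaches it to every row
lit_df = spark.createDataFrame([(a,)], "Data array<array<bigint>>")
df.crossJoin(lit_df).show()

# +------+-----+----------------+
# |system|level|            Data|
# +------+-----+----------------+
# | Test1|    1|[[1, 2], [2, 3]]|
# | Test2|    3|[[1, 2], [2, 3]]|
# +------+-----+----------------+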

In the Scala API, you can use the typedLit function to add Array or Map values to a column (unlike lit, typedLit can handle parameterized Scala types such as Seq and Map).

// Ref : https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.functions$

Here is sample code to add an Array as a column value:

import org.apache.spark.sql.functions.typedLit
import spark.implicits._ // for .toDF; already in scope in the spark-shell

val a = Seq((1,2),(2,3))
val df1 = Seq(("Test1", 1), ("Test3", 3)).toDF("a", "b")

df1.withColumn("new_col", typedLit(a)).show()

// Output

+-----+---+----------------+
|    a|  b|         new_col|
+-----+---+----------------+
|Test1|  1|[[1, 2], [2, 3]]|
|Test3|  3|[[1, 2], [2, 3]]|
+-----+---+----------------+

I hope this helps.


If you want to add the same array to all rows, you can use typedLit from the SQL functions. See this answer:
https://stackoverflow.com/a/32788650/12365294

2 Comments

I did try this, but I am unable to import org.apache.spark.sql.functions in Python. I included the jar file org.apache.spark:spark-sql_2.11:2.4.4 in my execution, but still no luck.
For PySpark, that Scala package is not importable; use "from pyspark.sql.functions import *" instead.
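
Since pyspark has no typedLit, a minimal sketch of the equivalent in PySpark builds the nested literal by nesting the array and lit functions (df being the DataFrame from the question):

from pyspark.sql.functions import array, lit

a = [[1, 2], [2, 3]]

# Build the column expression array(array(1, 2), array(2, 3))
nested = array(*[array(*[lit(x) for x in inner]) for inner in a])

# withColumn also works on a streaming DataFrame; show() assumes a batch one
df.withColumn("Data", nested).show()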
