Let's say I have a numpy array a that contains the numbers 1-10:
[1 2 3 4 5 6 7 8 9 10]

I also have a Spark DataFrame to which I want to add my numpy array a. I figure that a column of literals will do the job. This doesn't work:

import pyspark.sql.functions as F

df = df.withColumn("NewColumn", F.lit(a))

It fails with:

Unsupported literal type class java.util.ArrayList

But this works:

df = df.withColumn("NewColumn", F.lit(a[0]))

How can I do this?

Example DF before:

+--------------------+
|col1                |
+--------------------+
|a b c d e f g h i j |
+--------------------+

Expected result:

+--------------------+--------------------+
|col1                |NewColumn           |
+--------------------+--------------------+
|a b c d e f g h i j |1 2 3 4 5 6 7 8 9 10|
+--------------------+--------------------+

2 Answers


List comprehension inside Spark's array

import pyspark.sql.functions as F

a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
df = spark.createDataFrame([['a b c d e f g h i j '],], ['col1'])
df = df.withColumn("NewColumn", F.array([F.lit(x) for x in a]))

df.show(truncate=False)
df.printSchema()
#  +--------------------+-------------------------------+
#  |col1                |NewColumn                      |
#  +--------------------+-------------------------------+
#  |a b c d e f g h i j |[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]|
#  +--------------------+-------------------------------+
#  root
#   |-- col1: string (nullable = true)
#   |-- NewColumn: array (nullable = false)
#   |    |-- element: integer (containsNull = false)

@pault commented (Python 2.7):

You can hide the loop using map:
df.withColumn("NewColumn", F.array(map(F.lit, a)))

@abegehr added the Python 3 version:

df.withColumn("NewColumn", F.array(*map(F.lit, a)))

Spark's udf

# Defining the UDF (returns the list a defined above)
import pyspark.sql.types as T

def arrayUdf():
    return a

callArrayUdf = F.udf(arrayUdf, T.ArrayType(T.IntegerType()))

# Calling the UDF
df = df.withColumn("NewColumn", callArrayUdf())

Output is the same.
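If you'd rather not have the udf depend on the global a, the same idea can be written as a closure. A small sketch, assuming the imports above; the make_array_udf helper name is my own:

# Hypothetical helper: builds a zero-argument udf that returns the captured list
def make_array_udf(values):
    return F.udf(lambda: values, T.ArrayType(T.IntegerType()))

df = df.withColumn("NewColumn", make_array_udf([int(x) for x in a])())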


7 Comments

I tried this and it works. Thank you for the answer and I will keep it this way for now. However, in reality, my "a" array has tens of thousands of entries, and because of the for loop, it is not quite efficient. Is there a way to do it without loops?
@A.R. I have updated my answer with a udf function, which doesn't require a for loop. If the answer is helpful you can accept it and upvote.
You can hide the loop using map: df.withColumn("NewColumn", F.array(map(F.lit, a)))
@pault Isn't map an RDD function? Also, the output of map is neither a string nor a Column, so withColumn would throw an error.
@pault, I think this should be F.array(*map(F.lit, a)) with the * (star) unpacking operator, since F.array cannot handle a map object.

In the Scala API, we can use the typedLit function to add Array or Map values to a column.

// Ref : https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.functions$

Here is the sample code to add an Array or Map as a column value.

import org.apache.spark.sql.functions.typedLit
import spark.implicits._ // for toDF; already in scope in spark-shell

val df1 = Seq((1, 0), (2, 3)).toDF("a", "b")

df1.withColumn("seq", typedLit(Seq(1, 2, 3)))
    .withColumn("map", typedLit(Map(1 -> 2)))
    .show(truncate = false)

// Output

+---+---+---------+--------+
|a  |b  |seq      |map     |
+---+---+---------+--------+
|1  |0  |[1, 2, 3]|[1 -> 2]|
|2  |3  |[1, 2, 3]|[1 -> 2]|
+---+---+---------+--------+

I hope this helps.
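As a usage note, typedLit also accepts an explicit type parameter when you want to control the resulting schema type. A small sketch under the same imports as above:

// Explicit type parameter: the column becomes array<bigint> instead of array<int>
df1.withColumn("seqLong", typedLit[Seq[Long]](Seq(1L, 2L, 3L))).printSchema()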

1 Comment

This doesn't answer the question; the OP asked for a PySpark solution.
