I have the following dataframe:

+--------------------+
|        values      |
+--------------------+
|[[1,1,1],[3,2,4],[1,|
|[[1,1,2],[2,2,4],[1,|
|[[1,1,3],[4,2,4],[1,|

I want a column containing the tail of each list. So far I know how to select the first element with val df1 = df.select($"values".getItem(0)), but is there a method that would let me drop the first element?

2 Answers

A UDF with a simple size check seems to be the simplest solution:

val df = Seq((1, Seq(1, 2, 3)), (2, Seq(4, 5))).toDF("c1", "c2")

def tail = udf( (s: Seq[Int]) => if (s.size > 1) s.tail else Seq.empty[Int] )

df.select($"c1", tail($"c2").as("c2tail")).show
// +---+------+
// | c1|c2tail|
// +---+------+
// |  1|[2, 3]|
// |  2|   [5]|
// +---+------+

As suggested in the comments, a preferable solution is to use Spark's built-in slice function (available since Spark 2.4), which avoids the UDF altogether:

df.select($"c1", slice($"c2", 2, Int.MaxValue).as("c2tail"))
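For the nested values column from the question, the same built-in should apply unchanged, since slice works on arrays of any element type. Below is a minimal sketch, assuming Spark 2.4+ and a local session; the sample rows and the names nested and tails are made up for illustration. It bounds the length with size(values) instead of a large constant, which is one possible variation rather than a required one.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

// Local session for illustration only; in spark-shell, `spark` already exists.
val spark = SparkSession.builder().master("local[1]").appName("array-tail").getOrCreate()
import spark.implicits._

// Hypothetical sample data shaped like the question's nested "values" column,
// plus a single-element and an empty array to exercise the edge cases.
val nested = Seq(
  Seq(Seq(1, 1, 1), Seq(3, 2, 4), Seq(1, 0, 5)),
  Seq(Seq(1, 1, 2)),
  Seq.empty[Seq[Int]]
).toDF("values")

// slice is 1-based: start at the second element and take at most size(values) elements.
val tails = nested.select(expr("slice(values, 2, size(values))").as("tail"))

tails.show(false)
```

Rows with zero or one element come back as empty arrays, so no row is dropped and no size check is needed.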

5 Comments

What if it has just one entry for c2?
Good point @thebluephantom. If c2 consists of 0 or 1 element, posexplode (or explode) will generate 0 rows for the tail as expected, thus discarding the corresponding c1. A UDF with size check like if (s.size > 1) s.tail else Seq.empty[Int] appears to be the simplest route. It's unfortunate none of the array_??? methods seems to provide a solution for something as simple as this.
the splice I think suffices
Ah, nice! I suppose you meant slice? That would work.
sorry yes, also splice is a slice!

I don't think a built-in operator exists for this, but you can use a UDF, for example:

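import org.apache.spark.sql.functions.{col, udf}
import scala.collection.mutable.WrappedArray
// drop(1) returns an empty array for empty input, whereas tail would throw
def tailUdf = udf((array: WrappedArray[WrappedArray[Int]]) => array.drop(1))
df.select(tailUdf(col("values"))).show()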
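One caveat with a tail-based UDF is what happens on empty arrays. The plain-Scala sketch below (no Spark needed; the sample values are hypothetical) illustrates the difference between tail and drop(1), which is why a size check or drop(1) is the safer choice inside the UDF body.

```scala
import scala.util.Try

val nonEmpty = Seq(Seq(1, 1, 1), Seq(3, 2, 4))
val empty = Seq.empty[Seq[Int]]

// tail throws UnsupportedOperationException on an empty collection.
val tailOnEmpty = Try(empty.tail)

// drop(1) is total: it returns an empty collection instead of throwing.
val dropOnEmpty = empty.drop(1)
val dropOnNonEmpty = nonEmpty.drop(1)

println(tailOnEmpty.isFailure) // true
println(dropOnEmpty)           // List()
println(dropOnNonEmpty)        // List(List(3, 2, 4))
```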
