spark scala ignore nulls in windowing clause

Question

In Spark SQL, you can write

SELECT title, rn, 
       lead(rn, 1) IGNORE NULLS over(order by rn) as next_rn
FROM   my_table
;

How would you add the IGNORE NULLS part in the equivalent Scala code?

val my_df = my_table
  .withColumn("next_rn", lead($"rn", 1).over(Window.orderBy("rn")))

M.S.Visser · Accepted Answer · 2025-01-15 11:21:33Z

A colleague of mine pointed me to the Spark Scala API documentation. The working Scala code then becomes

.withColumn("next_tbl_rn", lead($"tbl_rn", 1, null, true).over(Window.orderBy($"rn")))

Here the fourth argument (ignoreNulls, set to true) does the trick: it makes lead skip null values, just as IGNORE NULLS does in SQL.
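For a fuller picture, here is a minimal, self-contained sketch of the same call on a tiny DataFrame. The object name, the helper `withNextTblRn`, and the sample data are made up for illustration; the four-argument `lead` overload is assumed to be available (I believe it arrived in Spark 3.2, so older versions would not compile this).

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, lead}

// Hypothetical helper object, for illustration only.
object LeadIgnoreNullsSketch {

  // Adds next_tbl_rn: the first non-null tbl_rn found after the current row,
  // with rows ordered by rn. The fourth argument of lead is ignoreNulls.
  def withNextTblRn(df: DataFrame): DataFrame = {
    val w = Window.orderBy(col("rn"))
    df.withColumn("next_tbl_rn", lead(col("tbl_rn"), 1, null, true).over(w))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("lead-ignore-nulls-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data: tbl_rn is only filled in on some rows.
    val df = Seq[(Int, Option[Int])](
      (1, Some(10)), (2, None), (3, None), (4, Some(40)), (5, None)
    ).toDF("rn", "tbl_rn")

    // Rows 1 to 3 should see 40; rows 4 and 5 have no later non-null value.
    withNextTblRn(df).orderBy("rn").show()

    spark.stop()
  }
}
```

Without the fourth argument, row 1 would get null (the value of row 2) instead of 40.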


3 Comments

Ged Jan 15 at 12:29

I did not realize that was possible, but note the poor performance that can result.

M.S.Visser Jan 15 at 12:49

Correct. The DataFrame only has a few hundred rows, so that is not a problem here.

M.S.Visser Jan 15 at 12:57

Because of how the DataFrame is structured, a non-null value of $"tbl_rn" will always be found within 20 rows in the worst case. A constraint on the window frame such as "between current row and 20 following" would therefore prevent performance issues with large DataFrames.
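One way the frame-bounded idea from this comment might look in Scala is sketched below. As far as I know, lead and lag use their own implied frame and Spark rejects an explicit one, so this sketch swaps in `first(..., ignoreNulls = true)` over a frame covering the next `maxLookahead` rows. The object name, the helper, and the `maxLookahead` parameter are hypothetical, and the sample data is made up.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, first}

// Hypothetical helper object, for illustration only.
object BoundedLookaheadSketch {

  // Looks at most maxLookahead rows ahead (ordered by rn) and returns the
  // first non-null tbl_rn in that range, or null if there is none.
  // This swaps lead for first, which accepts an explicit row frame.
  def withNextTblRn(df: DataFrame, maxLookahead: Long = 20L): DataFrame = {
    val w = Window.orderBy(col("rn")).rowsBetween(1L, maxLookahead)
    df.withColumn("next_tbl_rn", first(col("tbl_rn"), ignoreNulls = true).over(w))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("bounded-lookahead-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data: tbl_rn is only filled in on some rows.
    val df = Seq[(Int, Option[Int])](
      (1, Some(10)), (2, None), (3, None), (4, Some(40)), (5, None)
    ).toDF("rn", "tbl_rn")

    withNextTblRn(df).orderBy("rn").show()

    spark.stop()
  }
}
```

The result only matches the ignore-nulls lead when a non-null value really does appear within `maxLookahead` rows; beyond that range this version returns null. Note also that a window with no partitionBy still pulls all rows into a single partition, which the frame bound does not change.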
