In Spark SQL, you can write

SELECT title, rn, 
       lead(rn, 1) IGNORE NULLS over(order by rn) as next_rn
FROM   my_table
;

How would you add the IGNORE NULLS part in the equivalent Scala code?

val my_df = my_table
  .withColumn("next_rn", lead($"rn", 1).over(Window.orderBy("rn")))

1 Answer

A colleague of mine pointed me to the Scala API documentation for lead. The working Scala code then becomes

.withColumn("next_tbl_rn", lead($"tbl_rn", 1, null, true).over(Window.orderBy($"rn")))

in which the fourth argument (ignoreNulls) does the trick.
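For anyone who wants to try this end to end, here is a minimal sketch. It assumes a Spark version that ships the four-argument lead overload and a local SparkSession. The sample rows, the object name LeadIgnoreNulls, and the helper withNextTblRn are made up for illustration and are not from the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, lead}

object LeadIgnoreNulls {
  // Hypothetical helper: adds next_tbl_rn, the next non-null tbl_rn
  // after the current row when rows are ordered by rn.
  def withNextTblRn(df: DataFrame): DataFrame =
    df.withColumn(
      "next_tbl_rn",
      // Arguments: column, offset, default value, ignoreNulls
      lead(col("tbl_rn"), 1, null, true).over(Window.orderBy(col("rn")))
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("lead-ignore-nulls")
      .getOrCreate()
    import spark.implicits._

    // Assumed sample data: rn is the ordering key, tbl_rn is sparse.
    val df = Seq[(Int, Option[Int])](
      (1, None), (2, Some(20)), (3, None), (4, None), (5, Some(50))
    ).toDF("rn", "tbl_rn")

    withNextTblRn(df).orderBy("rn").show()
    spark.stop()
  }
}
```

With ignoreNulls set to true, the offset counts only non-null values, so each row picks up the first non-null tbl_rn that follows it rather than the value of the immediately following row.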

3 Comments

I did not realize that was possible. Note, though, the bad performance that can result: a window with orderBy but no partitionBy moves all rows into a single partition.
Correct. The DF only has a few hundred rows, so that is not a problem here.
Since the DF is structured such that a non-null value of $"tbl_rn" will be found within 20 rows in the worst case, constraining the window frame, e.g. "between current row and 20 following", would prevent performance issues on large DFs.
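A rough sketch of that bounded-frame idea follows. As far as I know, lead works with its own fixed frame in Spark and does not accept an explicit one, so this sketch swaps in first with ignoreNulls over a rowsBetween(1, 20) frame instead. That is a different function from the one in the answer, not the answer's method. The object name, the helper, the maxLookahead parameter, and the sample rows are all hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, first}

object BoundedNextNonNull {
  // Hypothetical helper: looks at most maxLookahead rows ahead (ordered by rn)
  // and returns the first non-null tbl_rn found there, or null if there is none.
  def withNextTblRn(df: DataFrame, maxLookahead: Long = 20L): DataFrame = {
    // Frame runs from 1 row after the current row to maxLookahead rows after it.
    val w = Window.orderBy(col("rn")).rowsBetween(1L, maxLookahead)
    // Second argument of first is ignoreNulls.
    df.withColumn("next_tbl_rn", first(col("tbl_rn"), true).over(w))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("bounded-next-non-null")
      .getOrCreate()
    import spark.implicits._

    // Assumed sample data: rn is the ordering key, tbl_rn is sparse.
    val df = Seq[(Int, Option[Int])](
      (1, None), (2, Some(20)), (3, None), (4, None), (5, Some(50))
    ).toDF("rn", "tbl_rn")

    withNextTblRn(df).orderBy("rn").show()
    spark.stop()
  }
}
```

The bounded frame only limits how far each row looks ahead. The window still has no partitionBy, so the rows are still gathered into one partition; on a genuinely large DF a partitioning column would be needed as well, if the data has one.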