2

I am trying to define a new column in a Spark DataFrame using a constant defined as a var. I'm using Zeppelin - in the initial cell, it starts with

%spark
import org.apache.spark.sql.functions._
var year : Int = 2016
spark.read.parquet("<path/to/file>")

The file contains a column named birth_year; I want to create a new column named age defined as $year - birth_year, where birth_year is a string column. I'm not quite clear on how to do this when the input argument to a UDF is a parameter. I've done a couple hours of searching and created a UDF, but I got an error message whose principal part is

<console>:71: error: type mismatch;
 found   : Int
 required: org.apache.spark.sql.Column
       spark.read.parquet("path/to/file").withColumn("birth_year", $"birth_year" cast "Int").withColumn("age", createAge(year, col("birth_year"))).createOrReplaceTempView("tmp")

and a caret directly under 'year'.

I suspect that $year does not map into a variable of the same length as birth_year; I've seen the lit() function that appears to work for strings - does it work with integer values as well, or is there another function for this purpose?

I tried the following:

%spark
import org.apache.spark.sql.functions._
var year : Int = 2016
def createAge = udf((yr : Int, dob : Int) => {yr - dob})
spark.read.parquet("<path/to/file>").withColumn("birth_year", $"birth_year" cast "Int").withColumn("age", createAge($"year", col("birth_year"))).createOrReplaceTempView("tmp")

Any suggestions welcome - thanks in advance for any help.

1

1 Answer 1

4

You can't use year directly as an input to the UDF since a it expects columns to operate on. To create a column with a constant value use lit(). You can call the UDF as follows:

df.withColumn("age", createAge(lit(year), $"birth_year".cast("int")))

However, when possible it's always preferred to use the in-built functions in Spark when possible. In this case, you do not need an UDF. Simply do:

df.withColumn("age", lit(year) - $"birth_year".cast("int"))

This should be much faster.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.