
I am new to both Scala and Spark. I am trying to convert data that is read from files as Double to Float (which is safe in this application) in order to reduce memory usage. I have been able to do this for a single Double column.

Current approach for a single element:

import org.apache.spark.sql.functions.{col, udf}
import spark.implicits._ // needed for toDF when not in spark-shell

// UDF that narrows a Double to a Float
val tcast = udf((s: Double) => s.toFloat)

val myDF = Seq(
   (1.0, Array(0.1, 2.1, 1.2)),
   (8.0, Array(1.1, 2.1, 3.2)),
   (9.0, Array(1.1, 1.1, 2.2))
).toDF("time", "crds")

// Cast the scalar column, then swap it in place of the original
myDF.withColumn("timeF", tcast(col("time"))).drop("time").withColumnRenamed("timeF", "time").show
myDF.withColumn("timeF", tcast(col("time"))).drop("time").withColumnRenamed("timeF", "time").schema
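As an aside, the scalar cast can also be done without a UDF via the built-in Column.cast; a minimal sketch against the same myDF (noUdf is just an illustrative name):

// UDF-free alternative for the scalar column
val noUdf = myDF.withColumn("time", col("time").cast("float"))
noUdf.printSchema() // time is now float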

But I am currently stuck on transforming an array of doubles to floats. Any help would be appreciated.

1 Answer

You can use selectExpr, like this:

val myDF = Seq(
   (1.0, Array(0.1, 2.1, 1.2)),
   (8.0, Array(1.1, 2.1, 3.2)),
   (9.0, Array(1.1, 1.1, 2.2))
).toDF("time", "crds")

myDF.printSchema()

// output:
root
 |-- time: double (nullable = false)
 |-- crds: array (nullable = true)
 |    |-- element: double (containsNull = false)

val df = myDF.selectExpr("cast(time as float) time", "cast(crds as array<float>) as crds")
df.show()

+----+---------------+
|time|           crds|
+----+---------------+
| 1.0|[0.1, 2.1, 1.2]|
| 8.0|[1.1, 2.1, 3.2]|
| 9.0|[1.1, 1.1, 2.2]|
+----+---------------+

df.printSchema()

root
 |-- time: float (nullable = false)
 |-- crds: array (nullable = true)
 |    |-- element: float (containsNull = true)
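
The same conversion can also be expressed with the typed Column API instead of a SQL string; a minimal equivalent sketch, assuming the same myDF (df2 is just an illustrative name):

import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{ArrayType, FloatType}

// Column.cast accepts a DataType, so the array element type
// can be narrowed directly without parsing a SQL expression.
val df2 = myDF
  .withColumn("time", col("time").cast(FloatType))
  .withColumn("crds", col("crds").cast(ArrayType(FloatType)))

df2.printSchema() // time: float, crds: array with float elements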

2 Comments

Thanks for the quick reply but crds are still double!?
Thank you very much for the solution!
