
I have a PySpark snippet that I need to convert to Scala.

PySpark

from pyspark.sql import functions as F  # F refers to pyspark.sql.functions

for i in [c for c in r.columns if c.startswith("_")]:
    r = r.withColumn(i, F.col(i)["id"])

Since Scala vals are immutable, is there a better way in Scala to create multiple new columns without chaining val df1 = df.withColumn(...), val df2 = df1.withColumn(...) and so on, like what I did in PySpark?

The table r looks like this:

+-----------+-------------+-------------+-------------+-------------+
|         _0|           _1|           _2|           _3|           _4|
+-----------+-------------+-------------+-------------+-------------+
|[1, Carter]|   [5, Banks]|[11, Derrick]|    [4, Hood]|    [12, Jef]|
|[1, Carter]|    [12, Jef]|    [4, Hood]|   [5, Banks]|[11, Derrick]|
|[1, Carter]|    [4, Hood]|    [12, Jef]|[11, Derrick]|   [5, Banks]|
|[1, Carter]|    [12, Jef]|   [5, Banks]|[11, Derrick]|    [4, Hood]|
|[1, Carter]|    [4, Hood]|    [12, Jef]|   [5, Banks]|[11, Derrick]|
|[1, Carter]|[11, Derrick]|    [12, Jef]|    [4, Hood]|   [5, Banks]|
|[1, Carter]|    [12, Jef]|[11, Derrick]|   [5, Banks]|    [4, Hood]|
|[1, Carter]|   [5, Banks]|    [4, Hood]|[11, Derrick]|    [12, Jef]|
|[1, Carter]|[11, Derrick]|   [5, Banks]|    [4, Hood]|    [12, Jef]|
|[1, Carter]|   [5, Banks]|[11, Derrick]|    [12, Jef]|    [4, Hood]|
|[1, Carter]|   [5, Banks]|    [12, Jef]|[11, Derrick]|    [4, Hood]|
|[1, Carter]|   [5, Banks]|    [12, Jef]|    [4, Hood]|[11, Derrick]|
|[1, Carter]|[11, Derrick]|   [5, Banks]|    [12, Jef]|    [4, Hood]|
|[1, Carter]|    [4, Hood]|[11, Derrick]|   [5, Banks]|    [12, Jef]|
|[1, Carter]|[11, Derrick]|    [4, Hood]|   [5, Banks]|    [12, Jef]|
|[1, Carter]|    [12, Jef]|   [5, Banks]|    [4, Hood]|[11, Derrick]|
|[1, Carter]|    [12, Jef]|[11, Derrick]|    [4, Hood]|   [5, Banks]|
|[1, Carter]|    [4, Hood]|[11, Derrick]|    [12, Jef]|   [5, Banks]|
|[1, Carter]|[11, Derrick]|    [4, Hood]|    [12, Jef]|   [5, Banks]|
|[1, Carter]|    [12, Jef]|    [4, Hood]|[11, Derrick]|   [5, Banks]|
+-----------+-------------+-------------+-------------+-------------+

2 Answers


You can use foldLeft over the filtered column names, threading the DataFrame through as the accumulator:


import org.apache.spark.sql.functions.col

// fold over the matching column names; each step adds a new_<col> column with the nested id
val updDf = df
  .columns
  .filter(_.startsWith("_"))
  .foldLeft(df)((df, c) => df.withColumn(s"new_$c", col(c).getItem("id")))
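If you want to overwrite the existing columns (as the PySpark loop does) rather than add new_-prefixed ones, the same foldLeft works with the original column name. Below is a minimal, self-contained sketch; the SparkSession setup, the Person case class, and the sample rows are assumptions for illustration, not part of the original answer:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// hypothetical element type matching the question's [id, name] structs
case class Person(id: Int, name: String)

val spark = SparkSession.builder().master("local[*]").appName("foldLeftExample").getOrCreate()
import spark.implicits._

// small sample shaped like the question's table
val r = Seq(
  (Person(1, "Carter"), Person(5, "Banks")),
  (Person(1, "Carter"), Person(12, "Jef"))
).toDF("_0", "_1")

// overwrite each "_" column with its nested id, mirroring the PySpark loop
val replaced = r.columns
  .filter(_.startsWith("_"))
  .foldLeft(r)((acc, c) => acc.withColumn(c, col(c).getItem("id")))

replaced.show()
// _0 and _1 now hold the plain ids: 1/5 and 1/12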


You can do it with a single select (each .withColumn call creates a new Dataset that the analyzer has to resolve):

import org.apache.spark.sql.functions.col

// either replace a column with its nested id field, or take the column as is
val updates = r.columns.map(c => if (c.startsWith("_")) col(s"$c.id") as c else col(c))

val newDf = r.select(updates:_*)  // _* expands the sequence into a varargs parameter list
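As a quick illustration of why the if/else branch matters, columns that do not start with "_" simply pass through unchanged. A short sketch, assuming the same hypothetical sample DataFrame r as in the earlier foldLeft example, plus an extra plain column added just for this demonstration:

import org.apache.spark.sql.functions.{col, lit}

// add a non-"_" column that should be carried over untouched
val withExtra = r.withColumn("label", lit("row"))

val cols = withExtra.columns.map(c =>
  if (c.startsWith("_")) col(s"$c.id") as c else col(c))

withExtra.select(cols: _*).show()
// _0 and _1 are replaced by their ids; label is carried over unchanged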

1 Comment

This is better as far as I know.

