I have a DataFrame in Spark and would like to replace the values in several columns based on a simple rule: if a value ends with "_P", replace it with "1", and if it ends with "_N", replace it with "-1". The same replacement applies to multiple columns. I also need to cast the result at the end.

  • What did you try and why didn't it work? Commented Sep 13, 2016 at 15:50
  • I tried df.na.replace(columns, Map("[a-zA-Z0-9]_P" -> "1", "[a-zA-Z0-9]_N" -> "-1")), but it is not working. Commented Sep 13, 2016 at 16:01

1 Answer

You can do it with expressions like when(col.endsWith("_P"), lit("1")).when(...). The same can be achieved with regexp_replace. Here's an example using when:

val myDf = sc.parallelize(Array(
    ("foo_P", "bar_N", "123"),
    ("foo_N", "bar_Y", "123"),
    ("foo", "bar", "123"),
    ("foo_Y", "bar_XX", "123")
)).toDF("col1", "col2", "col3")

val colsToReplace = Seq("col1", "col2")

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{lit, when}

// Builds the replacement expression for one column, keeping its original name.
val castValues = (colName: String) => {
    val col = new Column(colName)

    when(col.endsWith("_P"), lit("1"))
    .when(col.endsWith("_N"), lit("-1"))
    .otherwise(col) // values with any other suffix are left unchanged
    .as(colName)
}

// Untouched columns come first, followed by the rewritten ones.
val selectExprs = myDf.columns.diff(colsToReplace).map(new Column(_)) ++ colsToReplace.map(castValues)

myDf.select(selectExprs:_*).show
/*
+----+-----+------+
|col3| col1|  col2|
+----+-----+------+
| 123|    1|    -1|
| 123|   -1| bar_Y|
| 123|  foo|   bar|
| 123|foo_Y|bar_XX|
+----+-----+------+
*/
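
For comparison, here is a rough sketch of the regexp_replace route, with the cast the question asks for applied at the end. It assumes Spark 2.x or later (it builds its own SparkSession, which spark-shell already provides as `spark`), and the sample data, the anchored patterns and the choice of "int" as the target type are my assumptions, not something from the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, regexp_replace}

// Assumption: a local session for illustration; skip this in spark-shell.
val spark = SparkSession.builder().master("local[*]").appName("suffix-replace").getOrCreate()
import spark.implicits._

// Hypothetical sample data in which every flagged value carries a known suffix.
val df = Seq(
  ("foo_P", "bar_N", "123"),
  ("foo_N", "bar_P", "123")
).toDF("col1", "col2", "col3")

val colsToReplace = Seq("col1", "col2")

val replaced = colsToReplace.foldLeft(df) { (acc, name) =>
  // Anchoring with ^ and $ replaces the whole value, not just the suffix.
  val flagged = regexp_replace(
    regexp_replace(col(name), "^.*_P$", "1"),
    "^.*_N$", "-1")
  // Cast last; a value matching neither suffix would become null here.
  acc.withColumn(name, flagged.cast("int"))
}

replaced.show()
```

Note that the cast turns any value that is still a non-numeric string into null, so decide how such values should be handled before casting.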

EDIT

By the way, regarding your comment on what you tried: the "df.na" functions are mainly meant for handling NULL and NaN values. Apart from that, "replace" compares whole values for equality and doesn't work with regular expressions (at least it didn't the last time I checked), so your patterns are treated as literal strings and match nothing.
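
A small sketch of how df.na.replace treats its keys, as far as I know: the keys are compared with the column values for equality, so a key that looks like a regular expression is just another literal string. It assumes Spark 2.x or later, and the two-row sample DataFrame is made up for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration; skip this in spark-shell.
val spark = SparkSession.builder().master("local[*]").appName("na-replace").getOrCreate()
import spark.implicits._

// Hypothetical sample data.
val df = Seq("foo_P", "a_P").toDF("col1")

// An exact key replaces only the value equal to it ("foo_P" here).
val exact = df.na.replace(Seq("col1"), Map("foo_P" -> "1"))

// A regex-looking key is compared literally, so no value is replaced.
val pattern = df.na.replace(Seq("col1"), Map("[a-zA-Z0-9]_P" -> "1"))

exact.show()
pattern.show()
```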

Cheers
