I have a DataFrame in Spark and I would like to replace the values of several columns based on a simple rule: if the value ends with "_P", replace it with "1"; if it ends with "_N", replace it with "-1". There are multiple columns that need the same replacement. I also need to do a cast at the end.
Comment (Samy Dindane, Sep 13, 2016): What did you try and why didn't it work?
Comment (HHH, Sep 13, 2016): I tried df.na.replace(columns, Map("[a-zA-Z0-9]_P" -> "1", "[a-zA-Z0-9]_N" -> "-1")). It is not working though.
1 Answer
You can do it with expressions like when('column.endsWith("_P"), lit("1")).when.... The same could be achieved using regexp_replace. Here's an example using when:
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{when, lit}

val myDf = sc.parallelize(Array(
  ("foo_P", "bar_N", "123"),
  ("foo_N", "bar_Y", "123"),
  ("foo", "bar", "123"),
  ("foo_Y", "bar_XX", "123")
)).toDF("col1", "col2", "col3")

val colsToReplace = Seq("col1", "col2")

// Builds the when/otherwise expression for one column; you can append a
// .cast(...) here if you need the cast you mentioned.
val castValues = (colName: String) => {
  val col = new Column(colName)
  when(col.endsWith("_P"), lit("1"))
    .when(col.endsWith("_N"), lit("-1"))
    .otherwise(col)
    .as(colName)
}

val selectExprs = myDf.columns.diff(colsToReplace).map(new Column(_)) ++ colsToReplace.map(castValues)
myDf.select(selectExprs:_*).show
/*
+----+-----+------+
|col3| col1|  col2|
+----+-----+------+
| 123|    1|    -1|
| 123|   -1| bar_Y|
| 123|  foo|   bar|
| 123|foo_Y|bar_XX|
+----+-----+------+
*/
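For completeness, the regexp_replace route mentioned above could look roughly like this (a sketch reusing the myDf and colsToReplace values from the snippet above; the ".*" prefix plus the "$" anchor make the pattern cover the whole value, so the entire string is replaced):

```scala
import org.apache.spark.sql.functions.{col, regexp_replace}

// Fold over the target columns, rewriting each one in place.
// Values that match neither pattern pass through unchanged.
val replaced = colsToReplace.foldLeft(myDf) { (df, c) =>
  df.withColumn(c,
    regexp_replace(regexp_replace(col(c), ".*_P$", "1"), ".*_N$", "-1"))
}
```

If you then need the cast from the question, you could chain something like col(c).cast("int") afterwards; note that any leftover non-numeric values (e.g. "foo") would become null at that point.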
EDIT
By the way, regarding your comment about what you tried: the df.na functions are meant to work on rows containing NULL values, so even if what you tried worked, it would only affect rows containing nulls. Apart from that, replace doesn't work with regular expressions; at least it didn't the last time I checked.
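To illustrate that last point: na.replace matches exact literal values only, so a pattern like "[a-zA-Z0-9]_P" is searched for as a literal string. Something like the following would work, but only for values you enumerate explicitly (a sketch assuming the myDf from the answer above):

```scala
// Each map key must be the exact cell value, not a regex.
val literal = myDf.na.replace(Seq("col1", "col2"), Map("foo_P" -> "1", "foo_N" -> "-1"))
```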
Cheers