0

I've found some similar solutions, but none that accomplish exactly what I want to do. I have a set of key/value pairs that I want to use for string substitution. e.g.

val replacements = Map( "STREET" -> "ST", "STR" -> "ST")

I am reading a table into a dataframe, and I would like modify a column to replace all instances of the key in my map with their values. So in the above map, look at the "street" column and replace all values of "STREET" with "ST" and all values of "STR" with "ST" etc.

I've been looking at some foldLeft implementations, but haven't been able to finagle it into working.

A basic solution would be great, but an optimal solution would be something I could plug into a Column function that someone wrote that I was hoping to update. Specifically a line like this:

val CleanIt: Column = trim(regexp_replace(regexp_replace(regexp_replace(colName," OF "," ")," AT "," ")," AND "," "))
3
  • The reason for foldLeft is ? Commented Sep 17, 2018 at 20:42
  • @thebluephantom It doesn't/didn't have to be foldLeft, that was just suggested in literally every answer I found that was trying to do something similar. Commented Sep 17, 2018 at 22:20
  • Interesting. I will check it out. Thx Commented Sep 18, 2018 at 6:41

1 Answer 1

2

You can create this helper method that transforms a given column and a map of replacements into a new Column expression:

def withReplacements(column: Column, replacements: Map[String, String]): Column = 
  replacements.foldLeft[Column](column) {
    case (col, (from, to)) => regexp_replace(col, from, to)
  }

Then use it on your street column with your replacements map:

val result = df.withColumn("street", withReplacements($"street", replacements))

For example:

df.show()
// +------------+------+
// |      street|number|
// +------------+------+
// | Main STREET|     1|
// |Broadway STR|     2|
// |     1st Ave|     3|
// +------------+------+

result.show()
// +-----------+------+
// |     street|number|
// +-----------+------+
// |    Main ST|     1|
// |Broadway ST|     2|
// |    1st Ave|     3|
// +-----------+------+

NOTE: the keys in the map must be valid regular expressions. That means, for example, that if you want to replace the string "St." with "ST", you should use Map("St\\." -> "ST) (escaping the dot, which otherwise would be interpreted as regex's "any")

Sign up to request clarification or add additional context in comments.

1 Comment

Perfect! Works exactly as I needed. Thanks!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.