
I'm trying to replace only the matched string within a column, and nothing else, with another value.

For example:

My name is GaryBrooks. 
The Partnertime series was good.

Match:

GaryBrooks
Partner time

Expected output:

My name is [TM="GaryBrooks"].
The [TM="Partner time"] series was good.

So far, I've done the following:

| trademarkname | tm_value            | DESCRIPTION_TEXT                 | Compare |
|---------------|---------------------|----------------------------------|---------|
| GaryBrooks    | [TM="GaryBrooks"]   | My name is GaryBrooks.           | yes     |
| Partner time  | [TM="Partner time"] | The Partnertime series was good. | yes     |

file['Compare'] = file.apply(lambda x: 'Yes' if x['trademarkname'] in x['DESCRIPTION_TEXT'] else 'No', axis=1)

I got as far as finding the match, but not as far as replacing it. I'm not sure whether this calls for a regexp replace function or a for loop.

Something like this is what I want to do: WHEN "Compare" IS 'Yes' THEN regexp_replace("DESCRIPTION_TEXT", "trademarkname", "tm_value"), where "trademarkname" is what has to be matched and "tm_value" is what the matched string should be replaced with.

  • Is "Partnertime" vs. "Partner time" intended? Can the space be ignored when matching? Commented Jul 8, 2023 at 5:35

1 Answer

Try using expr inside withColumn to replace the matched value with the contents of tm_value.

Example:

from pyspark.sql.functions import expr
# Assumes an active SparkSession named spark.
df = spark.createDataFrame(
    [('GaryBrooks', '[TM="GaryBrooks"]', 'My name is GaryBrooks.', 'yes'),
     ('Partner time', '[TM="Partner time"]', 'The Partnertime series was good.', 'yes')],
    ['trademarkname', 'tm_value', 'DESCRIPTION_TEXT', 'Compare'])
# Replace either hard-coded name with the tm_value of the same row.
df.withColumn("output", expr('regexp_replace(DESCRIPTION_TEXT, "(GaryBrooks|Partnertime)", tm_value)')).show(10, False)
#+-------------+-------------------+--------------------------------+-------+----------------------------------------+
#|trademarkname|tm_value           |DESCRIPTION_TEXT                |Compare|output                                  |
#+-------------+-------------------+--------------------------------+-------+----------------------------------------+
#|GaryBrooks   |[TM="GaryBrooks"]  |My name is GaryBrooks.          |yes    |My name is [TM="GaryBrooks"].           |
#|Partner time |[TM="Partner time"]|The Partnertime series was good.|yes    |The [TM="Partner time"] series was good.|
#+-------------+-------------------+--------------------------------+-------+----------------------------------------+
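
The pattern above hard-codes the two names. As a rough sketch of the same replacement driven by each row's own trademarkname and tm_value, here is a plain-Python version that uses the standard re module rather than Spark. The helper name replace_trademark is made up for illustration, and treating whitespace inside a name as optional is an assumption drawn from the "Partner time" / "Partnertime" example.

```python
import re

def replace_trademark(description, trademarkname, tm_value):
    """Replace occurrences of trademarkname in description with tm_value."""
    # Assumption: whitespace between the words of a name is optional,
    # so "Partner time" also matches "Partnertime".
    pattern = r"\s*".join(re.escape(word) for word in trademarkname.split())
    # A function replacement inserts tm_value literally, so any backslashes
    # in it are not interpreted as backreferences.
    return re.sub(pattern, lambda _match: tm_value, description)

# Same sample rows as the example above: (trademarkname, tm_value, DESCRIPTION_TEXT).
rows = [
    ("GaryBrooks", '[TM="GaryBrooks"]', "My name is GaryBrooks."),
    ("Partner time", '[TM="Partner time"]', "The Partnertime series was good."),
]

outputs = [replace_trademark(text, name, tag) for name, tag, text in rows]
for line in outputs:
    print(line)
```

With the pandas frame from the question, a helper like this could presumably be applied row by row through file.apply(..., axis=1), in the same way the Compare column was built.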

2 Comments

What if there are more than a few tm values and trademark names? How would I do this, with a dictionary?
You need to build a regex that matches all of the names and use it in the regexp_replace function.
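
One way to read that suggestion, sketched in plain Python with the standard re module rather than Spark: keep the names and tags in a dictionary, join the escaped names into a single alternation, and look up the tag for whichever name matched. The names TRADEMARKS, build_pattern and tag_trademarks are hypothetical, and ignoring whitespace inside a name is an assumption based on the sample data.

```python
import re

# Hypothetical lookup table: trademark name -> replacement tag.
TRADEMARKS = {
    "GaryBrooks": '[TM="GaryBrooks"]',
    "Partner time": '[TM="Partner time"]',
}

def _key(text):
    # Strip whitespace so "Partner time" and "Partnertime" share one key.
    return re.sub(r"\s+", "", text)

def build_pattern(names):
    # Longest names first, so a longer name wins over a shorter one
    # that happens to be its prefix.
    parts = []
    for name in sorted(names, key=len, reverse=True):
        words = [re.escape(word) for word in name.split()]
        parts.append(r"\s*".join(words))  # whitespace between words is optional
    return re.compile("|".join(parts))

LOOKUP = {_key(name): tag for name, tag in TRADEMARKS.items()}
PATTERN = build_pattern(TRADEMARKS)

def tag_trademarks(text):
    # Replace every matched name with the tag stored for it.
    return PATTERN.sub(lambda m: LOOKUP[_key(m.group(0))], text)

print(tag_trademarks("My name is GaryBrooks."))
print(tag_trademarks("The Partnertime series was good."))
```

The string held in PATTERN.pattern is the kind of "regex matching string" the comment refers to. Note that running tag_trademarks twice on the same text would tag the name inside an existing tag again, so it is meant to be applied once to untagged text.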
