
I want to insert a symbol between two regex groups.

My code is as follows:

from pyspark.sql.functions import concat, lit, regexp_extract

df = spark.createDataFrame([('ab',)], ['str'])
result = df.select(
    concat(
        regexp_extract('str', r'(\w)(\w)', 1),  # extract the first group
        lit(' '),                               # insert the separator symbol
        regexp_extract('str', r'(\w)(\w)', 2)   # extract the second group
    ).alias('d')).collect()
print(result)

Is there any better way?

1 Answer
You can do this in a single call with regexp_replace and capture-group references in the replacement string:

import pyspark.sql.functions as F

df.select(F.regexp_replace('str', r'(\w)(\w)', '$1 $2').alias('d')).show()
+---+
|  d|
+---+
|a b|
+---+

1 Comment

Thanks. I tried regexp_replace('str', r'(\w)(\w)', '\1 \2') with standard re group markers and it didn't work. Is the use of $ in PySpark documented somewhere?
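The reason, as far as I can tell, is that Spark's regexp_replace is backed by Java's java.util.regex (Matcher.replaceAll), where replacement-group references are written $1, $2, …, whereas Python's re module uses \1, \2. A minimal plain-Python sketch of the contrast (no Spark required):

```python
import re

# Python's re module uses backslash references in the replacement string:
print(re.sub(r'(\w)(\w)', r'\1 \2', 'ab'))  # -> a b

# A Java-style '$1 $2' replacement is not a group reference in Python's re,
# so it is copied through literally:
print(re.sub(r'(\w)(\w)', '$1 $2', 'ab'))   # -> $1 $2
```

So the same pattern works in both engines; only the replacement-reference syntax differs, and Spark expects the Java-style $n form.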
