Regex pattern to remove numeric value from words in pyspark

Question

I am working with a PySpark DataFrame that has a column of words (array<string> type). What regex pattern would remove purely numeric entries and also strip the digits from alphanumeric words?

+---+----------------------------------------------+
|id |words                                         |
+---+----------------------------------------------+
|564|[fhbgtrj5, 345gjhg, ghth578ghu, 5897, fhrfu44]|
+---+----------------------------------------------+

expected output:

+---+----------------------------------------------+
|id |words                                         |
+---+----------------------------------------------+
|564|[fhbgtrj, gjhg, ghthghu, fhrfu]               |
+---+----------------------------------------------+

Please help.

Does this answer your question? Delete digits in Python (Regex)
– jbflow, Commented Mar 25, 2021 at 22:27
@jbflow thanks for looking into it. The references you shared do remove numbers, but I also need to keep the letters from alphanumeric words.
– Samiksha, Commented Mar 25, 2021 at 22:32

mck · Accepted Answer · 2021-03-26 09:16:27Z

You can use transform together with regexp_replace to strip the digits from each word, then use array_remove to drop the empty strings left behind by entries that consisted only of digits.

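from pyspark.sql import functions as F

# Strip digits from each word, then drop the entries that end up empty
df2 = df.withColumn(
    'words',
    F.expr("array_remove(transform(words, x -> regexp_replace(x, '[0-9]', '')), '')")
)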

df2.show(truncate=False)
+---+-------------------------------+
|id |words                          |
+---+-------------------------------+
|564|[fhbgtrj, gjhg, ghthghu, fhrfu]|
+---+-------------------------------+
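To check the logic without a Spark session, here is a minimal plain-Python sketch of the same two steps. The helper name strip_digits is hypothetical and not part of the answer. It uses Python's re module rather than Spark's regex engine, so treat it as an illustration of the pattern only.

```python
import re

# Hypothetical helper (an assumption for illustration, not from the answer).
# Mirrors: array_remove(transform(words, x -> regexp_replace(x, '[0-9]', '')), '')
def strip_digits(words):
    # Step 1: remove every ASCII digit from each word
    cleaned = [re.sub(r"[0-9]", "", w) for w in words]
    # Step 2: drop entries that are now empty (they held only digits)
    return [w for w in cleaned if w != ""]

sample = ["fhbgtrj5", "345gjhg", "ghth578ghu", "5897", "fhrfu44"]
print(strip_digits(sample))  # ['fhbgtrj', 'gjhg', 'ghthghu', 'fhrfu']
```

Like array_remove, the second step would also drop any strings that were already empty in the input, which may or may not be what you want.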

edited Mar 26, 2021 at 9:16

answered Mar 26, 2021 at 7:27
