I am converting pandas commands to their Spark equivalents and want to port the following line, which replaces every double space in the column names with a single space:

df.columns = df.columns.str.replace('  ', ' ')

Is it possible to replace a substring in all column names using Spark? I came up with this, but it is not quite right:

df = df.withColumnRenamed('--', '-')

To be clear, I want to go from this

//+---+----------------------+-----+
//|id |address__test         |state|
//+---+----------------------+-----+

to this

//+---+----------------------+-----+
//|id |address_test          |state|
//+---+----------------------+-----+

2 Answers

You can apply the replace method to every column name by iterating over df.columns and then selecting the aliased columns, like so:

df = spark.createDataFrame([(1, 2, 3)], "id: int, address__test: int, state: int")
df.show()
+---+-------------+-----+
| id|address__test|state|
+---+-------------+-----+
|  1|            2|    3|
+---+-------------+-----+

from pyspark.sql.functions import col

new_cols = [col(c).alias(c.replace("__", "_")) for c in df.columns]
df.select(*new_cols).show()
+---+------------+-----+
| id|address_test|state|
+---+------------+-----+
|  1|           2|    3|
+---+------------+-----+


As a side note: each call to withColumnRenamed makes Spark create a separate projection, while a single select creates only one. Hence, for a large number of columns, select will be much faster.
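One caveat with a plain `replace("__", "_")` is that a single pass leaves a double underscore behind when a name contains three or more in a row (`a___b` becomes `a__b`). If such names can occur, collapsing the whole run with a regular expression may be safer. The sketch below is plain Python over a list of names; `collapse_repeats` is a hypothetical helper, not part of PySpark, and the final comment assumes `df` is the DataFrame from the example above.

```python
import re

def collapse_repeats(names, char="_"):
    """Return names with every run of two or more `char` collapsed to one."""
    # Non-capturing group so that `char` is repeated as a whole unit.
    pattern = re.compile("(?:" + re.escape(char) + "){2,}")
    # A function replacement avoids any backslash interpretation of `char`.
    return [pattern.sub(lambda _m: char, name) for name in names]

cleaned = collapse_repeats(["id", "address__test", "a___b", "state"])
print(cleaned)

# With a Spark DataFrame, the cleaned names could be applied in one step:
# df = df.toDF(*collapse_repeats(df.columns))
```

Passing `char=" "` would give the double-space behaviour of the original pandas line.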

Here's a suggestion.

We get all the target columns:

columns_to_edit = [col for col in df.columns if "__" in col]

Then we use a for loop to edit them all one by one:

for column in columns_to_edit:
    new_column = column.replace("__", "_")
    df = df.withColumnRenamed(column, new_column)

Would this solve your issue?
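Before renaming in a loop, it may be worth checking that two columns do not end up with the same name (for example, `a__b` and `a_b` would both become `a_b`), since duplicate column names tend to make later references ambiguous. A minimal sketch in plain Python, where `build_rename_map` is a hypothetical helper and not a Spark API:

```python
from collections import Counter

def build_rename_map(columns, old="__", new="_"):
    """Map each column containing `old` to its new name; raise on collisions."""
    renamed = {c: c.replace(old, new) for c in columns if old in c}
    # Names of all columns as they would look after the rename.
    final_names = [renamed.get(c, c) for c in columns]
    duplicates = sorted(n for n, count in Counter(final_names).items() if count > 1)
    if duplicates:
        raise ValueError(f"renaming would duplicate column names: {duplicates}")
    return renamed

rename_map = build_rename_map(["id", "address__test", "state"])
print(rename_map)

# The map can then drive the loop from this answer:
# for old_name, new_name in rename_map.items():
#     df = df.withColumnRenamed(old_name, new_name)
```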
