
I want to replace every string in the departments column of my pandas df with None if it contains a ")".

         departments   var1    var2
   1      transport     aa      uu
   2      industry)     bb      tt
   3      aviation)     cc      tt

How the dataset should look:

         departments   var1    var2
   1      transport     aa      uu
   2      None          bb      tt
   3      None          cc      tt

A similar solution (for Spark) is here: "Replacing regex pattern with another string works, but replacing with NONE replaces all values"

How can I translate this to pandas (plain Python), since I don't use Spark?

from pyspark.sql.functions import col, when

df = df.withColumn("departments",
                   when(col("departments").rlike(r"\)"), None)
                   .otherwise(col("departments")))
  • So is the df shown above a pandas df or a Spark df? Commented Jun 7, 2021 at 17:18
  • My df is a pandas df. Commented Jun 7, 2021 at 17:27

1 Answer


With your shown samples, please try the following. Use the str.contains function to find which values in the departments column contain a ")" (escaped as \) because str.contains treats the pattern as a regex by default), store the resulting boolean mask in m, and then use .loc with that mask to set those values to None.

# Boolean mask: True where departments contains a literal ")"; missing values count as False
m = df['departments'].str.contains(r'\)', na=False)
df.loc[m, 'departments'] = None
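A self-contained sketch of the approach above, using the question's sample data plus one hypothetical missing value (np.nan) to show why na=False matters. It passes regex=False instead of escaping the parenthesis; that is an equivalent alternative to r'\)', not necessarily what the answer intended.

```python
import numpy as np
import pandas as pd

# Sample data from the question, plus a hypothetical missing value in row 4
df = pd.DataFrame(
    {"departments": ["transport", "industry)", "aviation)", np.nan],
     "var1": ["aa", "bb", "cc", "dd"]},
    index=[1, 2, 3, 4],
)

# regex=False makes ")" a literal character; na=False turns missing cells into False
m = df["departments"].str.contains(")", regex=False, na=False)
df.loc[m, "departments"] = None
print(df)
```

Rows 2 and 3 become missing, row 1 is untouched, and row 4 stays missing because it was never matched.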

2 Comments

ValueError: Cannot mask with non-boolean array containing NA / NaN values. The missing values will need to be explicitly filled with True or False prior to using the array as a mask.
@id345678, sure, I have edited my answer; kindly take a look at it and let me know how it goes.
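The ValueError above comes from missing values in departments: str.contains returns a missing result for them, so the mask is not purely boolean. Besides na=False, another way to get a clean mask, in the spirit of the error message's suggestion to fill explicitly, is to fill the missing values with an empty string only while building the mask. A minimal sketch with a hypothetical column:

```python
import numpy as np
import pandas as pd

# Hypothetical column with a missing entry, the situation that triggered the ValueError
df = pd.DataFrame({"departments": ["transport", "industry)", np.nan]})

# fillna("") is applied to a temporary copy used only for the mask; df is unchanged here
m = df["departments"].fillna("").str.contains(r"\)")

# m now holds only True/False, so it can be passed to .loc safely
df.loc[m, "departments"] = None
print(df)
```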
