Regex expression to replace a string containing a substring with the substring only in a DataFrame

Question

I am trying to replace every string in a pandas DataFrame column that contains a certain substring with just the substring itself. Ideally the change would happen in place, like an 'inplace=True' sort of result.

I've tried various regular expressions, but I'm new to this and nothing I have tried has produced the desired result. I am on Python 3.7.3.

I think the code I need for the replacement is something like

df.replace(to_replace=<regex that matches any string containing the substring>, value='substring', regex=True)

Below is an example of what I'm trying to do.

#original dataframe
import pandas as pd

df = pd.DataFrame({'brand':['brand1 & brand2','brand6','brand1/brand3','brand9','brand4 brand3','brand8','brand1 and brand6']})
df

    brand
0   brand1 & brand2
1   brand6
2   brand1/brand3
3   brand9
4   brand4 brand3
5   brand8
6   brand1 and brand6

#desired result

df

    brand
0   brand1
1   brand6
2   brand1
3   brand9
4   brand4 brand3
5   brand8
6   brand1

So far, my regular expressions have produced no change. As a note, the real brand names don't actually include the digits 1-9; those are placeholders to avoid any possible confusion. The actual df I'm manipulating has a little over 10k rows. Within the column 'brands', about 2k of those rows contain brand1, and I need to replace each of those strings with just 'brand1' alone.

The data you create with pd.DataFrame({'brand':['brand1 & brand2','brand1/brand3','brand4 brand3','brand1 and brand 6']}) and the data you have shown as input don't match. Also, it is not clear what you are trying to replace with what.
– moys, Commented Oct 13, 2019 at 5:49
It should match now. As for what I'm trying to replace: for any row containing brand1, I want to replace the whole string with just brand1. So row 0 is originally the string 'brand1 & brand2' and I want to replace it with just 'brand1', and so on for the other rows.
– uncrazimatic, Commented Oct 13, 2019 at 5:53
So what is going on with row 4? Why doesn't it just become brand4?
– moys, Commented Oct 13, 2019 at 5:54
I need to leave that row as is. Basically, all rows that don't have brand1 somewhere in the string need to be left alone. Only rows with brand1 should be processed with the regex.
– uncrazimatic, Commented Oct 13, 2019 at 5:56

moys · Accepted Answer · 2019-10-13 06:07:41Z

Use:

import numpy as np
df['brand'] = np.where(df['brand'].str.contains('brand1'), 'brand1', df['brand'])

Input

    brand
0   brand1 & brand2
1   brand6
2   brand1/brand3
3   brand9
4   brand4 brand3
5   brand1 and brand 6

Output

    brand
0   brand1
1   brand6
2   brand1
3   brand9
4   brand4 brand3
5   brand1
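
A self-contained version of the snippet above, as a minimal sketch. It rebuilds the seven-row example from the question and includes the numpy import the one-liner relies on. Two things here are additions and not part of the original answer: passing regex=False to str.contains, so that 'brand1' is matched as a literal substring, and the .loc assignment, which is one way to overwrite the matching rows in the existing frame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'brand': ['brand1 & brand2', 'brand6', 'brand1/brand3',
                             'brand9', 'brand4 brand3', 'brand8',
                             'brand1 and brand6']})

# Boolean mask: True where the cell contains the literal substring 'brand1'.
has_brand1 = df['brand'].str.contains('brand1', regex=False)

# np.where picks 'brand1' where the mask is True and the original value otherwise.
result = np.where(has_brand1, 'brand1', df['brand'])

# Alternative (an assumption, not from the answer): assign into the matching rows.
df.loc[has_brand1, 'brand'] = 'brand1'

print(df)
```

Both routes give the same column; np.where builds a new array that is assigned back, while the .loc form only touches the rows selected by the mask.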

5 Comments

uncrazimatic Over a year ago

That did it. Thanks. I'll accept when it lets me. I'm a bit surprised I didn't need regex...I'll have to read up on np.where. Thank you.

moys Over a year ago

Glad to help. It is always better to show input & expected output & explain what you are trying to achieve. This helps others to suggest various ways of achieving it & some of those methods may be more efficient than the one we have in mind :-)

Karl Knechtel Over a year ago

Could you explain some more about how this works? What's special in particular about brand1 in the code, seeing as the output produces other brand<num> values?

moys Over a year ago

There is nothing special about brand1. With np.where, you specify what to do when the condition is true and what to do when it is false. What I have done is say that when the string contains brand1, the value of the cell becomes brand1; otherwise the content of column brand is kept as is.

moys Over a year ago

If the solution helped you, consider upvoting/accepting the answer.
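
For the regex route the question originally asked about, a sketch of the replace(..., regex=True) call might look like the following. The pattern ^.*brand1.*$ is an assumption about what "contains brand1" should mean: it matches any whole string with brand1 anywhere in it, so the entire cell is replaced, and cells without brand1 are left untouched. If the real brand name contains regex metacharacters, it would likely need to be wrapped in re.escape first.

```python
import pandas as pd

df = pd.DataFrame({'brand': ['brand1 & brand2', 'brand6', 'brand1/brand3',
                             'brand9', 'brand4 brand3', 'brand8',
                             'brand1 and brand6']})

# With regex=True, to_replace is treated as a regular expression.
# Matching the full string (^.* ... .*$) makes the substitution swap out
# the whole cell, not just the 'brand1' part of it.
df['brand'] = df['brand'].replace(to_replace=r'^.*brand1.*$',
                                  value='brand1', regex=True)

print(df)
```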
