How to replace column values in a pandas dataframe with regex?

Question

I want to replace the values in the column below with either 'ASUS' or 'ACER' (in caps): as long as a value contains the word 'acer' (ignoring case), replace the whole value with 'ACER', and replace any value matching 'asus *' with 'ASUS'. I based my attempt on the example from the Pandas documentation referenced below. I applied the regex replacement, but it doesn't seem to work: the output is unchanged. My code:

dfx = pd.DataFrame({'Brands':['asus', 'ASUS ZEN', 'Acer','ACER Swift']})
dfx = dfx.replace([{'Brands': r'^asus.$'}, {'Brands': 'ASUS'}, {'Brands': r'^acer.$'}, {'Brands': 'ACER'}], regex=True)
dfx['Brands'].unique()

Output in Jupyter notebook:

array(['asus', 'ASUS ZEN', 'Acer', 'ACER Swift'], dtype=object)

Pandas documentation example used:

Pandas Link Here

Any help with a little explanation is very much appreciated.

ACCEPTED SOLUTION(S):

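import re
import pandas as pd

dfx = pd.DataFrame({'Brands': ['asus', 'ASUS ZEN', 'Acer', 'ACER Swift']})

# Option 1: lowercase every value, then replace any value containing the brand name
dfx['Brands'] = dfx['Brands'].str.lower().str.replace('.*asus.*', 'ASUS', regex=True).str.replace('.*acer.*', 'ACER', regex=True)

# Option 2 (alternative): case-insensitive re.sub that uppercases the captured brand
dfx['Brands'] = dfx.Brands.apply(lambda x: re.sub(r".*(asus|acer).*", lambda m: m.group(1).upper(), x, flags=re.IGNORECASE))

dfx['Brands'].unique()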

Output:

array(['ASUS', 'ACER'], dtype=object)
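
One side effect of the first approach above is that .str.lower() lowercases every value in the column, including values that match neither brand. The following is a minimal sketch of a variant that avoids this, assuming the case=False parameter of Series.str.replace. The extra 'Dell' row is hypothetical and is there only to show that non-matching values are left untouched. regex=True is passed explicitly because recent pandas releases no longer treat the pattern as a regex by default.

```python
import pandas as pd

# 'Dell' is a hypothetical extra value, not part of the original question.
dfx = pd.DataFrame({'Brands': ['asus', 'ASUS ZEN', 'Acer', 'ACER Swift', 'Dell']})

cleaned = dfx['Brands']
for brand in ('asus', 'acer'):
    # case=False makes the match case-insensitive without altering other values
    cleaned = cleaned.str.replace(rf'.*{brand}.*', brand.upper(), case=False, regex=True)

dfx['Brands'] = cleaned
print(dfx['Brands'].tolist())
```

The loop keeps the brand list in one place, so adding a third brand would only mean extending the tuple.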

Can you be more specific with the conditions that you are trying to meet?
– imdevskp, Commented Apr 19, 2021 at 7:52
The condition is that as long as there is 'acer' in the value, just replace it with 'ACER'; the same goes for 'asus' --> 'ASUS'.
– snow, Commented Apr 19, 2021 at 8:15

filippo · Accepted Answer · 2021-04-19 08:34:11Z

1

import re
dfx['Brands'] = dfx.Brands.apply(lambda x: re.sub(r".*(asus|acer).*", lambda m: m.group(1).upper(), x, flags=re.IGNORECASE))

edited Apr 19, 2021 at 8:34

answered Apr 19, 2021 at 7:58


4 Comments

snow Over a year ago

Hi, the output I got was not what I wanted: array(['ASUS', 'ASUS ZEN', 'ACER', 'ACER Swift'], dtype=object). Perhaps I wasn't being clear enough. How do I achieve just 'ASUS' and 'ACER' ?

filippo Over a year ago

@snow uh, ok, I understood you wanted to uppercase just the brand

filippo Over a year ago

@snow edited, now it should give you the expected output

snow Over a year ago

tried it! this works too. Output is correct. thanks !
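
For comparison with the apply/re.sub answer above, here is a hedged sketch of a vectorized alternative that avoids a Python-level lambda. It assumes Series.str.extract with a single capture group and expand=False, which is not what the answer itself uses. The 'Dell' row is hypothetical and shows that values without a known brand are kept as they are.

```python
import re
import pandas as pd

# 'Dell' is a hypothetical extra value, not part of the original question.
dfx = pd.DataFrame({'Brands': ['asus', 'ASUS ZEN', 'Acer', 'ACER Swift', 'Dell']})

# Pull out the first occurrence of either brand, ignoring case; NaN where neither appears
brand = dfx['Brands'].str.extract(r'(asus|acer)', flags=re.IGNORECASE, expand=False)

# Uppercase the captured brand and fall back to the original value for non-matches
dfx['Brands'] = brand.str.upper().fillna(dfx['Brands'])
print(dfx['Brands'].tolist())
```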

imdevskp · Accepted Answer · 2021-04-19 08:28:15Z

0

Please try

dfx['Brands'] =  dfx['Brands'].str.lower().str.replace('.*asus.*', 'ASUS', regex=True).str.replace('.*acer.*', 'ACER', regex=True)

edited Apr 19, 2021 at 8:28

answered Apr 19, 2021 at 7:51


4 Comments

snow Over a year ago

The code above helps to give the output I want but how do I approach it with regex?

imdevskp Over a year ago

The pattern inside .str.replace() is a regular expression. By default, regex=True inside .str.replace(): pandas.pydata.org/docs/reference/api/…

snow Over a year ago

oh! This is new to me. Thanks ! By any chance do you know why the example approach in Pandas did not work for me?

imdevskp Over a year ago

If you want to state it explicitly, you can pass regex=True. Pandas says the default value of regex will be False in the future. You may also see the edited solution.
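
The question of why the original DataFrame.replace attempt left the column unchanged was not answered in the thread. Two things stand out. First, the pattern r'^asus.$' is case-sensitive and requires exactly one character after 'asus', so it matches none of the four values. Second, the call wraps four dicts in a single list, whereas DataFrame.replace takes the patterns and the replacement values as two separate arguments. The following is a minimal sketch of that two-argument form with patterns that do match, assuming the inline (?i) flag for case-insensitive matching:

```python
import pandas as pd

dfx = pd.DataFrame({'Brands': ['asus', 'ASUS ZEN', 'Acer', 'ACER Swift']})

# First dict: column -> regex to look for; second dict: column -> replacement value.
# (?i) makes the pattern case-insensitive; .* on both sides consumes the whole value.
dfx = (
    dfx.replace({'Brands': r'(?i)^.*asus.*$'}, {'Brands': 'ASUS'}, regex=True)
       .replace({'Brands': r'(?i)^.*acer.*$'}, {'Brands': 'ACER'}, regex=True)
)
print(dfx['Brands'].unique())
```

Each replace call handles one brand, so the two replacements are applied one after the other to the same column.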
