pandas dataframe replace multiple substring of column

Question

I have the code below:

import pandas as pd

df = pd.DataFrame({'A': ['$5,756', '3434', '$45', '1,344']})

pattern = ','.join(['$', ','])

df['A'] = df['A'].str.replace('$|,', '', regex=True)
print(df['A'])

I am trying to remove every occurrence of '$' or ',' by replacing them with an empty string.

But it only replaces the ','. The output I am getting still contains the '$', when it should contain only the digits.

What am I doing wrong?

Any help is appreciated.

Thanks

Dani Mesejo · Accepted Answer · 2022-07-27 12:32:08Z

Use:

import pandas as pd

df = pd.DataFrame({'A': ['$5,756', '3434', '$45', '1,344']})
df['A'] = df['A'].str.replace('[$,]', '', regex=True)
print(df)

Output: column A now holds 5756, 3434, 45 and 1344 (still as strings).

The problem is that the character $ has a special meaning in regular expressions. From the documentation (emphasis mine):

$
Matches the end of the string or just before the newline at the end of the string, and in MULTILINE mode also matches before a newline. foo matches both ‘foo’ and ‘foobar’, while the regular expression foo$ matches only ‘foo’. More interestingly, searching for foo.$ in 'foo1\nfoo2\n' matches ‘foo2’ normally, but ‘foo1’ in MULTILINE mode; searching for a single $ in 'foo\n' will find two (empty) matches: one just before the newline, and one at the end of the string.
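As a small illustration of the quoted behaviour, the sketch below runs the question's unescaped pattern against one of the sample values using the standard re module (pandas uses the same regex syntax when regex=True). The variable names are only for illustration.

```python
import re

sample = '$5,756'
unescaped = '$|,'  # the pattern from the question

# '$' is an anchor here: it matches the empty string at the end,
# not the literal dollar sign at the start.
matches = re.findall(unescaped, sample)
print(matches)  # [',', '']

# Replacing those matches removes the comma and an empty string,
# so the literal '$' survives.
result = re.sub(unescaped, '', sample)
print(result)  # $5756
```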

So you need to escape the character or put it inside a character class.

As an alternative, use:

df['A'].str.replace(r'\$|,', '', regex=True)  # raw string; note the escaping \
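If the characters to strip come from a list, as the unused pattern variable in the question suggests, one possible approach is to escape each entry with re.escape and join the results with '|'. This is a sketch, not the only way to do it; the list name chars_to_remove is an assumption made for the example.

```python
import re
import pandas as pd

# Characters to strip (illustrative list, not from the original answer).
chars_to_remove = ['$', ',']

# re.escape neutralises regex metacharacters such as '$',
# so each entry is matched literally.
pattern = '|'.join(re.escape(ch) for ch in chars_to_remove)

df = pd.DataFrame({'A': ['$5,756', '3434', '$45', '1,344']})
cleaned = df['A'].str.replace(pattern, '', regex=True)
print(cleaned.tolist())  # ['5756', '3434', '45', '1344']
```

Escaping programmatically avoids having to remember which characters are special when the list changes.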

mozway · Accepted Answer · 2022-07-27 12:35:18Z


If you only have integer-like numbers, an easy option is to remove everything except digits (\D matches any non-digit character). Then you don't have to deal with other special regex characters like $:

df['A'] = df['A'].str.replace(r'\D', '', regex=True)

Output: the column holds 5756, 3434, 45 and 1344 (still as strings).
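The "integer-like" condition matters, because \D also strips decimal points and minus signs. The sketch below shows the conversion to integers for the question's data, and then what happens with two hypothetical values (not from the question) that contain a decimal point and a sign.

```python
import pandas as pd

s = pd.Series(['$5,756', '3434', '$45', '1,344'])

# Strip every non-digit, then convert the cleaned strings to integers.
as_int = s.str.replace(r'\D', '', regex=True).astype(int)
print(as_int.tolist())  # [5756, 3434, 45, 1344]

# Hypothetical values with a decimal point and a minus sign.
tricky = pd.Series(['$1,234.50', '-$45'])

# \D silently drops '.' and '-', which changes the numbers.
digits_only = tricky.str.replace(r'\D', '', regex=True)
print(digits_only.tolist())  # ['123450', '45']

# A negated character class that keeps digits, '.' and '-' instead.
keep_sign = tricky.str.replace(r'[^\d.-]', '', regex=True)
print(keep_sign.tolist())  # ['1234.50', '-45']
```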


Sunderam Dubey · Accepted Answer · 2022-07-28 11:26:19Z


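This might be useful. Passing regex=False makes str.replace treat '$' as a literal character rather than as the end-of-string anchor:

import pandas as pd
df = pd.DataFrame({'A': ['$5,756', '3434', '$45', '1,344']})
# regex=False: '$' and ',' are matched literally, so no escaping is needed
df['A'] = df['A'].str.replace('$', '', regex=False).str.replace(',', '', regex=False)
print(df['A'])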

answered Jul 27, 2022 at 12:37 by Kashyap; edited Jul 28, 2022 at 11:26 by Sunderam Dubey
