2

I have below the code

import pandas as pd

df = pd.DataFrame({'A': ['$5,756', '3434', '$45', '1,344']})

pattern = ','.join(['$', ','])

df['A'] = df['A'].str.replace('$|,', '', regex=True)
print(df['A'])

What I am trying to remove every occurrence of '$' or ','... so I am trying to replace with blank..

But its replacing only ,

Output I am getting

0    $5756
1     3434
2      $45
3    1344$

it should be

0    5756
1     3434
2      45
3    1344

What I am doing wrong

Any help appreciated

Thanks

3 Answers 3

3

Use:

import pandas as pd

df = pd.DataFrame({'A': ['$5,756', '3434', '$45', '1,344']})
df['A'] = df['A'].str.replace('[$,]', '', regex=True)
print(df)

Output

      A
0  5756
1  3434
2    45
3  1344

The problem is that the character $ has a special meaning in regular expressions. From the documentation (emphasis mine):

$
Matches the end of the string or just before the newline at the end of the string, and in MULTILINE mode also matches before a newline. foo matches both ‘foo’ and ‘foobar’, while the regular expression foo$ matches only ‘foo’. More interestingly, searching for foo.$ in 'foo1\nfoo2\n' matches ‘foo2’ normally, but ‘foo1’ in MULTILINE mode; searching for a single $ in 'foo\n' will find two (empty) matches: one just before the newline, and one at the end of the string.mode; searching for a single $ in 'foo\n' will find two (empty) matches: one just before the newline, and one at the end of the string.

So you need to escape the character or put it inside a character class.

As an alternative use:

df['A'].str.replace('\$|,', '', regex=True)  # note the escaping \
Sign up to request clarification or add additional context in comments.

Comments

2

If you only have integer-like numbers an easy option is to remove all but digits \D, then you don't have to deal with other special regex characters like $:

df['A'] = df['A'].str.replace(r'\D', '', regex=True)

output:

      A
0  5756
1  3434
2    45
3  1344

Comments

1

It might be useful for you:

import pandas as pd
df = pd.DataFrame({'A': ['$5,756', '3434', '$45', '1,344']})
df['A'] = df['A'].str.replace('$', '', regex=True)
print(df['A'])

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.