How to replace multiple characters in a string column of a data frame in pandas?

Question

I have a data frame like this:

#v1   v2    v3    v4   v5
 a    b     b     c    1 1 2 2 2 3 3 3 3 4 4 4 4 4 4 ...
....

As you can see, the v5 column contains space-separated word IDs. I also have a list of word IDs to remove:

toRve = ['1','3','5'.....]

I wrote a for loop to remove each word ID in the list:

for i in toRve:
    # removes every occurrence of "<id> " as a substring, not only whole IDs
    df['v5'] = df['v5'].str.replace("{0} ".format(i), "")

But I got this result:

#v1   v2    v3    v4   v5
 a    b     b     c    222444444 ...
....

As you can see, "22 23" was treated as 2 + "2 " + 23, so removing "2 " turned it into 223. Is there a good way to solve this problem? Thank you in advance!

Also, why have all the spaces disappeared? Could you help me?
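A minimal plain-Python sketch of why the loop misbehaves: str.replace matches substrings, so "2 " also matches the tail of "22 ". The sample row and the choice of "2" as the ID to remove are hypothetical, picked only to reproduce the "223" effect described above; splitting into tokens and comparing whole tokens avoids it.

```python
# Hypothetical sample row; assume "2" is one of the IDs to remove.
row = "22 23 2 4"

# Substring replacement: the "2 " at the end of "22 " is removed too.
by_substring = row.replace("2 ", "")
print(by_substring)  # "22 23" has collapsed into "223"

# Token comparison: only IDs that are exactly "2" are dropped.
by_token = " ".join(t for t in row.split() if t != "2")
print(by_token)
```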

@jezrael Thanks for the reply. But that leaves lots of redundant spaces, and 22 will be changed to 2 as well. I want exactly i (e.g. 2) to be removed and nothing else. – user5779223, Commented Feb 17, 2016 at 13:15
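One vectorized way to get both requirements from this comment (exact-ID matching and no redundant spaces) is a regular expression with word boundaries, followed by whitespace normalization. This is a sketch, not the accepted answer's method; the sample data and the name to_remove are hypothetical stand-ins for the question's df and toRve.

```python
import re

import pandas as pd

# Hypothetical sample data; the real v5 column holds space-separated word IDs.
df = pd.DataFrame({"v5": ["1 1 2 2 2 3 3 3 3 4 4 4 4", "13 3 31 5 4"]})

to_remove = ["1", "3", "5"]  # stand-in for the question's toRve list

# One alternation such as r"\b(?:1|3|5)\b"; \b keeps "3" from matching inside "13" or "31".
pattern = r"\b(?:" + "|".join(map(re.escape, to_remove)) + r")\b"

df["v5"] = (
    df["v5"]
    .str.replace(pattern, "", regex=True)   # delete whole-ID matches only
    .str.replace(r"\s+", " ", regex=True)   # collapse the leftover runs of spaces
    .str.strip()                            # drop leading/trailing spaces
)
print(df["v5"].tolist())
```

Here "13" and "31" survive because the boundary check fails inside them, while the standalone 1, 3 and 5 are removed.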

HYRY · Accepted Answer · 2016-02-17 22:32:55Z


You can use the apply method to run a function on every element:

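import pandas as pd

s = pd.Series([
    "1 1 2 2 3 3 4 4 5 5 6 6 6",
    "3 4 2 1 2 3 4 4 5 5 4 34 2",
])

# word IDs to delete; a set gives fast membership tests
todel = set(["1", "3", "5"])

# split each string into whole tokens, drop the ones in todel,
# and rejoin the survivors with single spaces
s.apply(lambda x: " ".join(v for v in x.strip().split() if v not in todel))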

The output:

0       2 2 4 4 6 6 6
1    4 2 2 4 4 4 34 2
dtype: object
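Note that apply returns a new Series; it does not modify s (or a DataFrame column) in place, so the result has to be assigned back. A minimal sketch applying the same idea to a DataFrame column, assuming a hypothetical frame shaped like the question's; x.strip() is dropped because str.split() with no argument already ignores leading and trailing whitespace.

```python
import pandas as pd

# Hypothetical frame mirroring the question's layout (only v5 matters here).
df = pd.DataFrame({
    "v1": ["a", "d"],
    "v5": ["1 1 2 2 3 3 4 4 5 5 6 6 6", "3 4 2 1 2 3 4 4 5 5 4 34 2"],
})

todel = {"1", "3", "5"}  # same contents as set(["1", "3", "5"])

# Assign the cleaned strings back so the DataFrame column is updated.
df["v5"] = df["v5"].apply(lambda x: " ".join(v for v in x.split() if v not in todel))
print(df["v5"].tolist())
```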

edited Feb 17, 2016 at 22:32

answered Feb 17, 2016 at 14:20



user5779223 Over a year ago

Thanks for your answer. It seems very promising! But how can I remove the space before the first character? It will affect the training of my model. Thanks a lot.
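One possible explanation, offered as an assumption since the comment does not show where the space appears: when pandas prints a Series it pads the values to a common width, which can look like a leading space. The stored strings produced by split/join have none, which can be checked by inspecting repr() of each value. If stray spaces do come from elsewhere, str.strip() removes leading and trailing whitespace.

```python
import pandas as pd

s = pd.Series(["1 1 2 2 3 3 4 4 5 5 6 6 6", "3 4 2 1 2 3 4 4 5 5 4 34 2"])
todel = {"1", "3", "5"}
cleaned = s.apply(lambda x: " ".join(v for v in x.split() if v not in todel))

# repr() shows the exact stored string, without the display padding of print(Series).
print([repr(v) for v in cleaned])

# Defensive step if whitespace sneaks in from another source.
cleaned = cleaned.str.strip()
```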
