I have a data frame like this:

#v1   v2    v3    v4   v5
 a    b     b     c    1 1 2 2 2 3 3 3 3 4 4 4 4 4 4 ...
....

As you can see, the v5 column contains word ids. I also have a list of word ids to remove:

toRve = ['1','3','5'.....]

I wrote a for loop to remove the word ids in the list:

for i in toRve:
    df['v5'] = df['v5'].str.replace("{0} ".format(i), "")

But I got this result:

 #v1   v2    v3    v4   v5
  a    b     b     c    222444444 ...
....

The replacement matches substrings rather than whole ids, so 22 23 is treated as 2 + 2 + 23, and removing 2 turns it into 223. Do you have a good idea for solving this problem?

Also, why have all the spaces gone? Thank you in advance!
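
The merging described above can be reproduced with a minimal sketch. The sample string below is made up for illustration and is not taken from the original data frame; it only assumes that ids are separated by single spaces.

```python
import pandas as pd

# Hypothetical sample value, not from the original data.
s = pd.Series(["1 22 23 3 4"])

# Plain substring replacement: "2 " also matches the tail of "22 ",
# so the neighbouring ids 22 and 23 are fused together.
merged = s.str.replace("2 ", "", regex=False)
print(merged[0])  # 1 223 3 4
```

The pattern "2 " has no notion of where an id starts, which is why comparing whole tokens tends to be safer than replacing substrings.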

  • @jezrael Thanks for the reply. But that will leave lots of redundant spaces, and 22 will also be changed to 2. I want only the exact id i (for example, 2) to be removed. Commented Feb 17, 2016 at 13:15

1 Answer

You can use the apply method to run a function on every element, splitting each string into whole ids before filtering:

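import pandas as pd

s = pd.Series([
    "1 1 2 2 3 3 4 4 5 5 6 6 6",
    "3 4 2 1 2 3 4 4 5 5 4 34 2",
])

# Ids to remove; a set gives fast membership tests.
todel = {"1", "3", "5"}
# Split each string into ids, drop the unwanted ones, rejoin with single spaces.
s.apply(lambda x: " ".join(v for v in x.split() if v not in todel))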

The output:

0       2 2 4 4 6 6 6
1    4 2 2 4 4 4 34 2
dtype: object
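
If a vectorized string replacement is preferred over apply, one possible alternative is a regular expression with word boundaries. This is a sketch of a different technique, not the method above: the frame contents are made-up sample values, `to_remove` and `pattern` are illustrative names, and it assumes the ids are alphanumeric tokens separated by whitespace.

```python
import re
import pandas as pd

# Made-up sample data; only the v5 column from the question is modelled.
df = pd.DataFrame({"v5": ["1 1 2 22 23 3 4 5", "3 34 13 5"]})
to_remove = ["1", "3", "5"]

# \b anchors each id to token boundaries, so "3" cannot match inside "23" or "34".
# The trailing \s* swallows the separator that followed the removed id.
pattern = r"\b(?:" + "|".join(map(re.escape, to_remove)) + r")\b\s*"
df["v5"] = df["v5"].str.replace(pattern, "", regex=True).str.strip()
print(df["v5"].tolist())  # ['2 22 23 4', '34 13']
```

The final `str.strip()` removes the space left behind when the last id in a row is deleted.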

1 Comment

Thanks for your answer. It seems very promising! But how can I remove the space before the first character? It will affect the training of my model. Thanks a lot.
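
The gap before the first row in the printed output is most likely just pandas right-aligning values when it displays a Series, not part of the strings themselves. A quick check, reusing the answer's sample data (the names `result` and `cleaned` are introduced here for illustration):

```python
import pandas as pd

s = pd.Series([
    "1 1 2 2 3 3 4 4 5 5 6 6 6",
    "3 4 2 1 2 3 4 4 5 5 4 34 2",
])
todel = {"1", "3", "5"}
result = s.apply(lambda x: " ".join(v for v in x.split() if v not in todel))

# repr() makes any leading or trailing whitespace visible.
for value in result:
    print(repr(value))

# If the raw column might carry stray whitespace, str.strip() removes it.
cleaned = result.str.strip()
```

Because `" ".join(...)` only places spaces between ids, the joined strings cannot start or end with a space.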
