3

I have a pandas dataframe that looks like this:

COL

hi A/P_90890 how A/P_True A/P_/93290 are AP_wueiwo A/P_|iwoeu you A/P_?9028k ?
...
 Im  fine, what A/P_49 A/P_0.0309 about you?

The expected result should be:

COL

hi how are you?
...
Im fine, what about you?

How can I remove efficiently from a column and for the full pandas dataframe all the strings that have A/P_?

I tried with this regular expression:

A/P_(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+

However, I do not know if there's a more simpler or robust way of removing all those substrings from my dataframe. How can I remove all the strings that have A/P_ at the beginning?

UPDATE

I tried:

df_sess['COL'] = df_sess['COL'].str.replace(r'A/P(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', '')

And it works, however I would like to know if there's a more robust way of doing this. Possibily with a regular expression.

9
  • You want to remove A/P_1 and keep the rest of the string yes? Commented Jul 1, 2018 at 14:09
  • No, I want to remove the full string... in other words I want to remove all the strings that have A/P_, and let there the clean ones @Abhishek Commented Jul 1, 2018 at 14:10
  • so you want to delete the entire row where any column has A/P_? Commented Jul 1, 2018 at 14:11
  • Possible duplicate of How to drop rows from pandas data frame that contains a particular string in a particular column? Commented Jul 1, 2018 at 14:16
  • post a testable dataframe (with a few columns and rows) and expected result Commented Jul 1, 2018 at 14:16

3 Answers 3

3

one way could be to use \S* matching all non withespaces after A/P_ and also add \s to remove the whitespace after the string to remove, such as:

df_sess['COL'] = df_sess['col'].str.replace(r'A/P_\S*\s', '')

In you input, it seems there is an typo error (or at least I think so), so with this input:

df_sess = pd.DataFrame({'col':['hi A/P_90890 how A/P_True A/P_/93290 are A/P_wueiwo A/P_|iwoeu you A/P_?9028k ?',
                              'Im fine, what A/P_49 A/P_0.0309 about you?']})
print (df_sess['col'].str.replace(r'A/P_\S*\s', ''))
0            hi how are you ?
1    Im fine, what about you?
Name: col, dtype: object

you get the expected output

Sign up to request clarification or add additional context in comments.

2 Comments

I am still having a large sequence of spaces, any idea of how to delete them into a single space?
@anon you can .str.replace(r'\s+', ' ') after the replace(r'A/P_\S*\s', '') to select all the sequence composed of 1 whitespace or more ('\s+) and replace it by one whitespace.
2

How about:

(df['COL'].replace('A[/P|P][^ ]+', '', regex=True)
          .replace('\s+',' ', regex=True))

Full example:

import pandas as pd

df = pd.DataFrame({
    'COL': 
    ["hi A/P_90890 how A/P_True A/P_/93290 AP_wueiwo A/P_|iwoeu you A/P_?9028k ?",
    "Im  fine, what A/P_49 A/P_0.0309 about you?"]
})

df['COL'] = (df['COL'].replace('A[/P|P][^ ]+', '', regex=True)
                      .replace('\s+',' ', regex=True))

Returns (oh, there is an extra space before ?):

                        COL
0              hi how you ?
1  Im fine, what about you?

Comments

2

Because of pandas 0.23.0 bug in replace() function (https://github.com/pandas-dev/pandas/issues/21159) when trying to replace by regex pattern the error occurs:

df.COL.str.replace(regex_pat, '', regex=True)
...
--->
TypeError: Type aliases cannot be used with isinstance().

I would suggest to use pandas.Series.apply function with precompiled regex pattern:

In [1170]: df4 = pd.DataFrame({'COL': ['hi A/P_90890 how A/P_True A/P_/93290 are AP_wueiwo A/P_|iwoeu you A/P_?9028k ?', 'Im  fine, what A/P_49 A/P_0.0309 about you?']})

In [1171]: pat = re.compile(r'\s*A/?P_[^\s]*')

In [1172]: df4['COL']= df4.COL.apply(lambda x: pat.sub('', x))

In [1173]: df4
Out[1173]: 
                         COL
0           hi how are you ?
1  Im  fine, what about you?

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.