0

I am relatively new to regex and am trying to replace part of each string in a string column of a Pandas DataFrame. The challenge is that there are several different substrings I want to remove from the column while keeping the rest of each string.

I have code that works for one substring, but it stops working when I put it in a for loop. I am not sure how to use the loop variable inside the regex pattern.

Here is the code that works when applied to a single substring:

import pandas as pd

df = pd.DataFrame({'A': ['ba ca t', 'foo', 'bait'], 'B': ['abc', 'bar', 'xyz']})
df
df = df.replace({'A': r'^ba ca'}, {'A': ''}, regex=True)
df

Here is the code that does not work when I try to use a for loop:

df = pd.DataFrame({'A': ['ba ca t', 'foo', 'bait'],'B': ['abc', 'bar', 'xyz']})
replace_list=['ba ca','foo']
for i in replace_list:
    df=df.replace({'A': r'^(i)'}, {'A': ''}, regex=True)
df

I would like to iterate over a list of strings to remove them from a column in the DataFrame.

2
  • Are you trying to remove those patterns (i.e., replace with empty string '')? Commented Jun 20, 2019 at 21:38
  • yes, I am trying to remove them (replace with empty string). Commented Jun 20, 2019 at 21:41

2 Answers

3

'^(i)' does not interpolate the variable: inside a string literal, i is just the character "i", so the pattern matches a literal "i" at the start of the string. You're looking for something along the lines of f-string formatting (rf'^{i}') or str.format (r'^{}'.format(i)).
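As a minimal sketch, here is the loop from the question with the interpolation fixed. The re.escape call is an addition of mine, on the assumption that the entries in replace_list are meant as literal text rather than as regex patterns; drop it if they are patterns.

```python
import re
import pandas as pd

df = pd.DataFrame({'A': ['ba ca t', 'foo', 'bait'], 'B': ['abc', 'bar', 'xyz']})
replace_list = ['ba ca', 'foo']

for i in replace_list:
    # The f-string inserts the value of i into the pattern.
    # re.escape (an assumption, not in the question) stops any regex
    # metacharacters in i from being interpreted.
    pattern = rf'^{re.escape(i)}'
    df = df.replace({'A': pattern}, {'A': ''}, regex=True)

print(df)
```

Each pass removes one leading substring from column A, so 'bait' is left alone because neither pattern matches at its start.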

A better solution, though, is to drop the loop entirely, since replace can perform multiple replacements at once:

df.replace({'A': replace_list}, '', regex=True)

      A    B
0     t  abc
1        bar
2  bait  xyz

Or, with str.replace:

df['A'].str.replace('|'.join(replace_list), '', regex=True)

0       t
1        
2    bait
Name: A, dtype: object
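Note that the joined pattern above is not anchored, so it removes the substrings wherever they occur. If only leading matches should be removed, as with the ^ in the question, one possible sketch is to wrap the alternation in an anchored group. The re.escape call and the pattern and cleaned names are my own additions, assuming the list holds literal strings.

```python
import re
import pandas as pd

df = pd.DataFrame({'A': ['ba ca t', 'foo', 'bait'], 'B': ['abc', 'bar', 'xyz']})
replace_list = ['ba ca', 'foo']

# Build ^(?:a|b|...) so the anchor applies to every alternative,
# not just the first one. re.escape treats each entry as literal text.
pattern = '^(?:' + '|'.join(re.escape(s) for s in replace_list) + ')'

# regex=True is passed explicitly because the default differs
# between pandas versions.
cleaned = df['A'].str.replace(pattern, '', regex=True)
print(cleaned)
```

Without the non-capturing group, '^ba ca|foo' would anchor only the first alternative and still remove 'foo' anywhere in the string.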

This post by me should also be worth a read: What is the difference between Series.replace and Series.str.replace?


2

Since you want i to modify your regex pattern, consider this change:

 df=df.replace({'A': r'^({})'.format(i)}, {'A': ''}, regex=True)

Output

+----+-------+-----+
|    |  A    |  B  |
+----+-------+-----+
| 0  | t     | abc |
| 1  |       | bar |
| 2  | bait  | xyz |
+----+-------+-----+
