Replace substrings in specific rows

Question

Given the following DataFrame:

import pandas as pd

my_cols = ["a", "b", "c"]
df2 = pd.DataFrame([["1a", "2a", "3a"], ["4aa", "5a", "6a"], ["7a", "8a", "9a"],
                    ["1a", "2a", "3a"], ["4a", "5a", "6a"], ["7a", "8a", "9a"]],
                   columns=my_cols)

df2:
     a   b   c
0   1a  2a  3a
1  4aa  5a  6a
2   7a  8a  9a
3   1a  2a  3a
4   4a  5a  6a
5   7a  8a  9a

I want to check whether a value in a row contains the substring 4a. If it does, I want to replace every a with b in the whole row:

my_str = "4a"
for x in range(df2.shape[0]):
    if my_str in df2["a"][x]:
        for y in range(len(my_cols)):
            df2[my_cols[y]][x] = df2[my_cols[y]][x].replace("a","b")

df2:
     a   b   c
0   1a  2a  3a
1  4bb  5b  6b
2   7a  8a  9a
3   1a  2a  3a
4   4b  5b  6b
5   7a  8a  9a

This approach seems inefficient because of the nested loops and the element-wise assignment with replace(). Is there a built-in method that could do the job? Any improvement would be appreciated.

Molessia · Accepted Answer · 2020-05-25 10:51:11Z

1

A possible solution is the following:

my_cols = ["a", "b", "c"]       
df2 = pd.DataFrame([["1a", "2a", "3a"], ["4aa", "5a", "6a"], ["7a", "8a", "9a"],
                    ["1a", "2a", "3a"], ["4a", "5a", "6a"], ["7a", "8a", "9a"]],
                   columns=my_cols)

mask = df2.apply(lambda row: row.astype(str).str.contains('4a').any(), axis=1)
df2.loc[df2[mask].index, df2.columns] = df2[mask].replace({'a': 'b'}, regex=True)

df2:
     a   b   c
0   1a  2a  3a
1  4bb  5b  6b
2   7a  8a  9a
3   1a  2a  3a
4   4b  5b  6b
5   7a  8a  9a

First, we create a mask that identifies all the rows in which at least one column contains the substring '4a'. Then, we update those rows with a copy of the rows in which we have replaced every 'a' with 'b'.
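As a variation on the two steps described above, the mask can also be built column by column instead of row by row, which avoids calling a Python function once per row. This is only a sketch, assuming all columns hold strings; passing regex=False to str.contains is an added choice here so that the search string is treated literally.

```python
import pandas as pd

my_cols = ["a", "b", "c"]
df2 = pd.DataFrame([["1a", "2a", "3a"], ["4aa", "5a", "6a"], ["7a", "8a", "9a"],
                    ["1a", "2a", "3a"], ["4a", "5a", "6a"], ["7a", "8a", "9a"]],
                   columns=my_cols)

# Column-wise substring test; regex=False treats "4a" as a literal string.
contains = df2.apply(lambda col: col.astype(str).str.contains("4a", regex=False))

# A row qualifies if at least one of its columns matched.
mask = contains.any(axis=1)

# Replace every "a" with "b", but only in the matching rows.
df2.loc[mask] = df2.loc[mask].replace({"a": "b"}, regex=True)

print(df2)
```

Indexing with df2.loc[mask] on both sides keeps the assignment aligned on the same rows and columns, so the explicit index and column lists are not needed.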

ironzionlion · Accepted Answer · 2020-05-25 12:30:48Z

0

Thanks to the contributions of @yatu and @Alessia Mondolo, this would be the answer:

m = df2["a"].str.contains(my_str, na=False)
df2[m] = df2[m].replace({'a': 'b'}, regex=True)
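For reference, here is a self-contained sketch of the same idea that can be run on its own. The small DataFrame below is made up for illustration (it is not the one from the question) and includes a missing value to show what na=False does. Note that this variant tests only column "a", so a row where the substring appears in another column is left unchanged.

```python
import pandas as pd

# Hypothetical data: row 2 has a missing value in "a" and "4a" only in "b".
df2 = pd.DataFrame({"a": ["1a", "4aa", None, "7a"],
                    "b": ["2a", "5a", "4a", "8a"]})
my_str = "4a"

# Test only column "a"; na=False maps missing values to False
# so the result is a clean boolean mask.
m = df2["a"].str.contains(my_str, regex=False, na=False)

# Replace every "a" with "b" in the rows selected by the mask.
df2.loc[m] = df2.loc[m].replace({"a": "b"}, regex=True)

print(df2)
```

Without na=False, the missing value would produce a missing entry in the mask rather than False, which cannot be used directly for boolean indexing.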
