Remove/replace column values based on another column using pandas

Question

I have a data frame like this:

df
col1     col2      col3
 ab       1        prab
 cd       2        cdff
 ef       3        eef

I want to remove each row's col1 value from its col3 value.

The final data frame should look like this:

df
col1     col2      col3
 ab       1        pr
 cd       2        ff
 ef       3        e

How can I do this with pandas in the most effective way?

Possible duplicate of Remove substring from column based on another column
– bharatk, Commented Jul 25, 2019 at 11:40

Erfan · Accepted Answer · 2019-07-25 11:39:23Z

2

Use .apply with str.replace, row by row (axis=1):

df['col3'] = df.apply(lambda x: x['col3'].replace(x['col1'], ''), axis=1)

Output

  col1  col2 col3
0   ab     1   pr
1   cd     2   ff
2   ef     3    e
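
For anyone who wants to run this end to end, here is a minimal sketch. It assumes the frame is built from the sample data in the question; the variable name row is just an illustrative choice.

```python
import pandas as pd

# Sample frame rebuilt from the question's data
df = pd.DataFrame({
    "col1": ["ab", "cd", "ef"],
    "col2": [1, 2, 3],
    "col3": ["prab", "cdff", "eef"],
})

# For each row, strip the col1 substring out of the col3 string
df["col3"] = df.apply(lambda row: row["col3"].replace(row["col1"], ""), axis=1)

print(df)
```

Note that this relies on both columns holding strings; missing values (NaN) in either column would need handling first.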

Quang Hoang · Accepted Answer · 2019-07-25 13:20:59Z

1

It looks like a loop is unavoidable, since the substring to remove differs from row to row. In that case, a list comprehension might come in handy:

%%timeit
df.apply(lambda x: x['col3'].replace(x['col1'], ''), axis=1)

# 767 µs ± 24.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

while

%%timeit
[a.replace(b,'') for a,b in zip(df['col3'], df['col1'])]

# 24.4 µs ± 3.18 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
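
The timed expression only builds a list, so to get the result from the question it still has to be assigned back to the column. A small sketch, assuming the sample data from the question (the names value and key are illustrative):

```python
import pandas as pd

# Sample frame rebuilt from the question's data
df = pd.DataFrame({
    "col1": ["ab", "cd", "ef"],
    "col2": [1, 2, 3],
    "col3": ["prab", "cdff", "eef"],
})

# zip pairs each col3 string with the col1 string from the same row;
# the resulting list has one entry per row, so it can be assigned directly
df["col3"] = [value.replace(key, "") for value, key in zip(df["col3"], df["col1"])]

print(df)
```

The timings above were measured on the three-row example, so the gap may look different on larger frames or other machines.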

IQbrod · Accepted Answer · 2019-07-25 11:36:20Z

0

Suppose df is a plain Python list of lists (a matrix) rather than a pandas DataFrame:

df = [["ab",1,"prab"],["cd",2,"cdff"],["ef",3,"eef"]]

To remove the key (col1) from the value (col3) in each row:

for row in df:
  row[2] = row[2].replace(row[0],"")

As documented for Python's str.replace, every occurrence of col1 is replaced by an empty string: "".
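
Because str.replace removes every occurrence, a value that contains the key more than once loses all of them. If only the first match should go, the optional count argument can limit it. A short sketch with made-up values (text and key are not from the question):

```python
# Hypothetical values chosen so the key appears twice
text = "abxxab"
key = "ab"

# Default behaviour: every occurrence of the key is removed
removed_all = text.replace(key, "")

# Passing a count of 1 removes only the first occurrence
removed_first = text.replace(key, "", 1)

print(removed_all, removed_first)
```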
