Remove string from pandas column dependent on another column

Question

I have an example dataframe:

      col1                                   col2  
0     Hello, is it me you're looking for     Hello   
1     Hello, is it me you're looking for     me 
2     Hello, is it me you're looking for     looking 
3     Hello, is it me you're looking for     for   
4     Hello, is it me you're looking for     Lionel  
5     Hello, is it me you're looking for     Richie
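
To reproduce this example, the frame can be built directly from the table. This is a minimal sketch: the values are copied from the table above, and building it from a dict is just one convenient option.

```python
import pandas as pd

sentence = "Hello, is it me you're looking for"
words = ["Hello", "me", "looking", "for", "Lionel", "Richie"]

# Every row has the same sentence in col1 and a different word in col2
df = pd.DataFrame({"col1": [sentence] * len(words), "col2": words})
print(df)
```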

I would like to change col1 so that it removes the string in col2, and return the amended dataframe. I would also like to remove one character before and one character after the string; for example, the desired output for index 1 would be:

      col1                                   col2
1     Hello, is ityou're looking for          me

I have tried using df.apply() and Series.map() with a .replace() function, but I can't get .replace() to use df['col2'] as an argument. I also suspect this isn't the best way to go about it.

Any help? I'm mostly new to pandas and am looking to learn, so please ELI5.

Thanks!

Can you show us your code? How close have you got?

– Paula Livingstone, Commented Nov 19, 2017 at 13:17

bjornasm · Accepted Answer · 2020-03-17 17:43:37Z

4

To apply a function to each row of a DataFrame, you can use:

df.apply(func, axis=1)

func receives each row as a Series. For a simple removal:

df['col1'] = df.apply(lambda row: row['col1'].replace(row['col2'], ''), axis=1)
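To see what the function actually receives, here is a quick sketch: with axis=1 each call gets one row as a pandas Series whose index is the column names, so row['col1'] and row['col2'] are that row's two strings. The two-row frame and the helper name describe are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["Hello, is it me you're looking for"] * 2,
    "col2": ["Hello", "me"],
})

def describe(row):
    # row is a Series indexed by the column names
    return f"{type(row).__name__}: {list(row.index)} -> {row['col2']}"

print(df.apply(describe, axis=1).tolist())
# ["Series: ['col1', 'col2'] -> Hello", "Series: ['col1', 'col2'] -> me"]
```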

However, removing one character before and after needs more work.
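A quick check of the plain replace (with axis=1) shows why: the word is removed, but the spaces around it stay, leaving a double space. The frame below is a cut-down copy of the question's data.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["Hello, is it me you're looking for"] * 3,
    "col2": ["me", "looking", "Lionel"],
})

# Plain str.replace per row: removes every occurrence of col2 and nothing else
df["col1"] = df.apply(lambda row: row["col1"].replace(row["col2"], ""), axis=1)
print(df["col1"].tolist())
# ["Hello, is it  you're looking for", "Hello, is it me you're  for", "Hello, is it me you're looking for"]
```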

So define func:

def func(row):
    c1 = row['col1']  # string in col1
    c2 = row['col2']  # string in col2
    find_index = c1.find(c2)  # index of the first match from the left
    if find_index == -1:  # col2 not found
        return c1  # leave unchanged
    else:
        start_index = max(find_index - 1, 0)  # 1 before, but not negative
        end_index = find_index + len(c2) + 1  # 1 after; slicing handles overflow
        return c1[:start_index] + c1[end_index:]  # cut out only this match

Then:

df['col1'] = df.apply(func, axis=1)

To avoid a SettingWithCopyWarning, you can use assign instead, which returns a new DataFrame:

df = df.assign(col1=df.apply(func, axis=1))
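Putting the pieces together, here is a self-contained sketch you can run end to end on the question's data. This version of func cuts the match out with slicing (c1[:start_index] + c1[end_index:]), so only the first match is removed; calling str.replace on the sliced text would instead remove every occurrence of it.

```python
import pandas as pd

sentence = "Hello, is it me you're looking for"
df = pd.DataFrame({
    "col1": [sentence] * 6,
    "col2": ["Hello", "me", "looking", "for", "Lionel", "Richie"],
})

def func(row):
    c1, c2 = row["col1"], row["col2"]
    find_index = c1.find(c2)                   # first match, or -1 if absent
    if find_index == -1:
        return c1                              # leave the string unchanged
    start_index = max(find_index - 1, 0)       # one char before, clamped at 0
    end_index = find_index + len(c2) + 1       # one char after; slicing tolerates overflow
    return c1[:start_index] + c1[end_index:]   # cut out only this match

# assign returns a new DataFrame instead of writing into df in place
df = df.assign(col1=df.apply(func, axis=1))
print(df)
```

For row 0 the match starts at index 0, so only the character after it (the comma) is removed, giving " is it me you're looking for". Rows 4 and 5 are unchanged because "Lionel" and "Richie" do not occur in col1.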

edited Mar 17, 2020 at 17:43 by bjornasm

answered Nov 19, 2017 at 13:26 by SCKU


Magellan88 · Accepted Answer · 2017-11-19 13:25:35Z

3

My guess is that you were missing axis=1, which makes apply work on each row instead of each column:

A = """Hello, is it me you're looking for;Hello
Hello, is it me you're looking for;me
Hello, is it me you're looking for;looking
Hello, is it me you're looking for;for
Hello, is it me you're looking for;Lionel
Hello, is it me you're looking for;Richie
"""
df = pd.DataFrame([a.split(";") for a in A.split("\n") ][:-1],
                   columns=["col1","col2"])

df.col1 = df.apply( lambda x: x.col1.replace( x.col2, "" )  , axis=1)
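To see what axis=1 changes, compare the labels the function receives in each mode. With the default axis=0, apply passes each column, so x.name is a column name; with axis=1 it passes each row, so x.name is the row label. The small two-row frame is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["Hello, is it me you're looking for"] * 2,
    "col2": ["Hello", "me"],
})

# Default axis=0: one call per column
by_column = df.apply(lambda x: x.name)
# axis=1: one call per row
by_row = df.apply(lambda x: x.name, axis=1)

print(by_column.tolist())  # ['col1', 'col2']
print(by_row.tolist())     # [0, 1]
```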

answered Nov 19, 2017 at 13:25 by Magellan88

mucktruckpluckduck · Accepted Answer · 2020-04-30 10:41:02Z

Perhaps there is a more Pythonic or elegant way, but here is how I quickly did the above. This approach works best when you need flexibility to manipulate the strings and a quick fix matters more than performance.

I took the two columns out of the DataFrame as individual Series:

col1Series = df['col1']
col2Series = df['col2']

Next, create an empty list to store the final string values:

rowxList = []

Iterate as follows to populate the list:

for x, y in zip(col1Series, col2Series):
    rowx = x.replace(y, '')
    rowxList.append(rowx)

Finally, put rowxList back into the original DataFrame. You could overwrite the old column, but it's safer to store the result in a new column, check the output against the original two columns, and then remove the column you no longer need:

df['newCol'] = rowxList
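The same loop can be written more compactly as a list comprehension over zip. This is a sketch of an equivalent form, not a different algorithm; the frame is a cut-down copy of the question's data.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["Hello, is it me you're looking for"] * 3,
    "col2": ["Hello", "me", "Richie"],
})

# Pair each col1 value with its col2 value and remove col2 from col1
df["newCol"] = [x.replace(y, "") for x, y in zip(df["col1"], df["col2"])]

# Compare side by side before dropping anything
print(df[["col1", "col2", "newCol"]])
```

Once the new column looks right, df = df.drop(columns="col1") removes the original.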
