5

I have an example dataframe:

      col1                                   col2  
0     Hello, is it me you're looking for     Hello   
1     Hello, is it me you're looking for     me 
2     Hello, is it me you're looking for     looking 
3     Hello, is it me you're looking for     for   
4     Hello, is it me you're looking for     Lionel  
5     Hello, is it me you're looking for     Richie   

I would like to change col1 so that the string in col2 is removed from it, and return the amended dataframe. I would also like to remove one character before and one character after the string. For example, the desired output for index 1 would be:

      col1                                    col2
1     Hello, is ityou're looking for          me

I have tried using apply() and map() with a .replace() function, but I can't get .replace() to use df['col2'] as an argument. I also feel this isn't the best way to go about it.

Any help? I'm mostly new to pandas and am looking to learn, so please ELI5.

Thanks!

1
  • Can you show us your code? How close have you got? Commented Nov 19, 2017 at 13:17

3 Answers

4

To run a function on each row of a dataframe, use:

df.apply(func, axis=1)

func receives each row as a Series.
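
To see what the function receives, here is a minimal sketch on a made-up two-row frame (the sample data and the helper name `describe` are hypothetical, for illustration only). With `axis=1` each call gets one row as a Series indexed by column name. With the default `axis=0` each call gets a whole column instead, so looking up `row['col1']` fails with a `KeyError`.

```python
import pandas as pd

# Hypothetical two-row frame, for illustration only.
df = pd.DataFrame({"col1": ["Hello there", "Goodbye now"],
                   "col2": ["Hello", "now"]})

def describe(row):
    # With axis=1, `row` is a Series whose index holds the column names.
    return ", ".join(row.index)

labels_per_row = df.apply(describe, axis=1).tolist()
print(labels_per_row)

# With the default axis=0 the function receives each column, so looking up
# a column name on it fails.
try:
    df.apply(lambda row: row["col1"])
    default_axis_failed = False
except KeyError:
    default_axis_failed = True
print(default_axis_failed)
```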

df['col1'] = df.apply(lambda row: row['col1'].replace(row['col2'], ''), axis=1)

However, removing one character before and one after the match needs more work, so define func:

def func(row):
    c1 = row['col1']  # full string from col1
    c2 = row['col2']  # substring to remove, from col2
    find_index = c1.find(c2)  # index of the first occurrence of c2, or -1
    if find_index == -1:  # c2 not found
        return c1  # leave the string unchanged
    else:
        start_index = max(find_index - 1, 0)  # one character before, but never negative
        end_index = find_index + len(c2) + 1  # one character after; slicing past the end is safe
        return c1[:start_index] + c1[end_index:]  # cut out only this occurrence

then:

df['col1'] = df.apply(func, axis=1)

To avoid a SettingWithCopyWarning, use:

df = df.assign(col1=df.apply(func, axis=1))
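
As a quick check, the sketch below rebuilds the question's sample data and applies a row function of the same shape, cutting the match out by slicing so that only the first occurrence is touched. The name `remove_with_neighbours` is hypothetical, and the dataframe is reconstructed from the table in the question.

```python
import pandas as pd

sentence = "Hello, is it me you're looking for"
df = pd.DataFrame({
    "col1": [sentence] * 6,
    "col2": ["Hello", "me", "looking", "for", "Lionel", "Richie"],
})

def remove_with_neighbours(row):
    text, word = row["col1"], row["col2"]
    pos = text.find(word)              # first occurrence, or -1
    if pos == -1:
        return text                    # nothing to remove
    start = max(pos - 1, 0)            # one character before the match
    end = pos + len(word) + 1          # one character after the match
    return text[:start] + text[end:]   # keep everything outside the cut

df = df.assign(col1=df.apply(remove_with_neighbours, axis=1))
result = df["col1"].tolist()
print(result[1])  # Hello, is ityou're looking for
```

Rows whose col2 value does not occur in col1 ("Lionel", "Richie") come back unchanged.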

3

My guess is that you were missing axis=1, so apply worked on the columns rather than on the rows.

import pandas as pd

# Sample data: one "col1;col2" pair per line
A = """Hello, is it me you're looking for;Hello
Hello, is it me you're looking for;me
Hello, is it me you're looking for;looking
Hello, is it me you're looking for;for
Hello, is it me you're looking for;Lionel
Hello, is it me you're looking for;Richie
"""
# [:-1] drops the empty string left after the final newline
df = pd.DataFrame([a.split(";") for a in A.split("\n")][:-1],
                  columns=["col1", "col2"])

# axis=1 passes each row to the lambda, so both columns are available
df.col1 = df.apply(lambda x: x.col1.replace(x.col2, ""), axis=1)

0

Perhaps there is a more pythonic or elegant way, but here is how I quickly did the above. This works best if you need flexibility to manipulate the strings and getting a fix quickly matters more than performance.

I took the two columns out of the dataframe as individual Series:

col1Series = df['col1']
col2Series = df['col2']

Next, create an empty list to store the final string values:

rowxList = []

Iterate as follows to populate the list:

for x,y in zip(col1Series,col2Series):
    rowx  = x.replace(y,'')
    rowxList.append(rowx)

Last, put rowxList back into the original dataframe as a new column. You could overwrite the old column instead, but it's safer to add a new column, check the output against the original two columns, and then drop the old column you no longer need:

df['newCol'] = rowxList
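
The loop-and-append steps above can also be written as a single list comprehension. A minimal sketch, assuming the sample data from the question; like the loop, it only removes the col2 string and leaves the neighbouring characters in place.

```python
import pandas as pd

sentence = "Hello, is it me you're looking for"
df = pd.DataFrame({
    "col1": [sentence] * 6,
    "col2": ["Hello", "me", "looking", "for", "Lionel", "Richie"],
})

# Pair each col1 string with its col2 string and remove the latter.
df["newCol"] = [text.replace(word, "")
                for text, word in zip(df["col1"], df["col2"])]

new_values = df["newCol"].tolist()
print(new_values[0])  # , is it me you're looking for
```

Keeping the result in `newCol` leaves col1 intact, so the two can be compared before col1 is dropped.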
