Pandas replace string using values from list

Question

I'm trying to replace strings in one of the columns of my dataframe (df). Here's what df looks like:

                           0                  1
0  2012 Black Toyota Corolla    White/Black/Red
1      2013 Red Toyota Camry    Red
2      2015 Blue Honda Civic    Blue
3         2012 Black Mazda 6    Black/Red/White
4   2011 White Nissan Maxima    White/Red/Black

Sometimes column 1 has multiple color values, sometimes only a single value. I would like to take however many values there are in column 1, check whether any of them appear in column 0, and remove each one that does from column 0.

I've tried approaching it this way.

    def removeColor(main, sub):
        for i in sub.split('/'):
            main = main.str.replace(i, '')
        return main

    df['0'] = df['0'].map(lambda x: removeColor(x['0'], x['1']))

This results in a TypeError:

    TypeError: string indices must be integers

My expected output looks like this:

                     0                  1
0  2012 Toyota Corolla    White/Black/Red
1    2013 Toyota Camry    Red
2     2015 Honda Civic    Blue
3         2012 Mazda 6    Black/Red/White
4   2011 Nissan Maxima    White/Red/Black

df['0'].str.replace('|'.join(df.iloc[:,1].str.replace('/','|')),'') ? – BENY, commented May 21, 2019 at 19:20

The original CSV that I'm using for the dataframe has the column names 0 and 1 as strings, so referring to them as df[0] doesn't work. They could just as easily have been Title and Colors instead of 0 and 1. – Jason Bourne, commented May 21, 2019 at 19:20
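BENY's one-liner builds a single regex alternation from the colors of every row and deletes all matches at once. Here is a minimal runnable sketch of that idea on the question's sample data, assuming pandas 2.x, where `Series.str.replace` treats the pattern literally unless you pass `regex=True`. The whitespace clean-up at the end is an addition, not part of the comment. Note that, unlike the per-row answers, this removes a color from every title if it appears in *any* row's color list.

```python
import pandas as pd

df = pd.DataFrame({
    "0": ["2012 Black Toyota Corolla", "2013 Red Toyota Camry",
          "2015 Blue Honda Civic", "2012 Black Mazda 6",
          "2011 White Nissan Maxima"],
    "1": ["White/Black/Red", "Red", "Blue", "Black/Red/White", "White/Red/Black"],
})

# One alternation built from the colors of *all* rows, e.g. "White|Black|Red|Red|Blue|...".
# Duplicate alternatives are harmless in a regex.
pattern = "|".join(df["1"].str.replace("/", "|"))

df["0"] = (
    df["0"]
    .str.replace(pattern, "", regex=True)   # regex=True is needed on pandas >= 2.0
    .str.replace(r"\s+", " ", regex=True)   # collapse the double spaces left behind
    .str.strip()
)
print(df)
```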

aiguofer · Accepted Answer · 2019-05-21 20:08:05Z

`map` on a Series calls your function once per element. In your lambda function, x is therefore a single string (the value from column "0"), so when you do x["0"] and x["1"], Python tries to index into a string with a string, hence your error.
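To see the cause concretely, this small sketch (a two-row Series made up from the question's data) checks what `Series.map` hands to the function, then indexes one of those strings with "0". The exact error text differs slightly between Python versions, so it is printed rather than hard-coded.

```python
import pandas as pd

s = pd.Series(["2012 Black Toyota Corolla", "2013 Red Toyota Camry"])

# Series.map calls the function once per element, so x is a plain str here.
element_types = s.map(lambda x: type(x).__name__).tolist()
print(element_types)  # ['str', 'str']

# Indexing that str with the key "0" is what raises the question's TypeError.
value = s.iloc[0]
message = ""
try:
    value["0"]
except TypeError as exc:
    message = str(exc)
print(message)
```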

The apply function lets you act on an entire row (or column) and would be better suited. Here's one way to accomplish what you're after:

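    import re

    def remove_color(row):
        # Turn "White/Black/Red" into the regex alternation "White|Black|Red",
        # delete every match from the title, then collapse the double spaces left behind.
        return re.sub(row.iloc[1].replace("/", "|"), "", row.iloc[0]).replace("  ", " ")


    # axis=1 passes each row to remove_color as a Series.
    df.iloc[:, 0] = df.apply(remove_color, axis=1)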

You could replace the iloc calls with specific column names to make it more readable (you mentioned the column names could be anything, so I'm giving a generic, position-based approach here).
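A label-based version of `remove_color` might look like this sketch. `Title` and `Colors` are hypothetical column names borrowed from the question's comment, not the real CSV headers.

```python
import re

import pandas as pd

# "Title" and "Colors" are hypothetical names standing in for the real headers.
df = pd.DataFrame({
    "Title": ["2012 Black Toyota Corolla", "2013 Red Toyota Camry", "2015 Blue Honda Civic"],
    "Colors": ["White/Black/Red", "Red", "Blue"],
})


def remove_color(row):
    # Same logic as the answer, but the columns are addressed by label instead of position.
    pattern = row["Colors"].replace("/", "|")
    return re.sub(pattern, "", row["Title"]).replace("  ", " ")


df["Title"] = df.apply(remove_color, axis=1)
print(df["Title"].tolist())  # ['2012 Toyota Corolla', '2013 Toyota Camry', '2015 Honda Civic']
```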

The second replace call removes the double spaces left behind by re.sub. You could change the re.sub pattern so that it does both in a single call, but it could get messy.
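One way to fold the space clean-up into the `re.sub` call is to let the pattern also consume the whitespace after each color. This is a sketch, not the answer's code: the word boundaries (`\b`) and `re.escape` are additions that assume colors should only match as whole words and might contain regex metacharacters, and `remove_color_single_sub` is a hypothetical name.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "0": ["2012 Black Toyota Corolla", "2013 Red Toyota Camry",
          "2015 Blue Honda Civic", "2012 Black Mazda 6",
          "2011 White Nissan Maxima"],
    "1": ["White/Black/Red", "Red", "Blue", "Black/Red/White", "White/Red/Black"],
})


def remove_color_single_sub(title, colors):
    # Escape each color so it is matched literally, then join into one alternation.
    alternation = "|".join(re.escape(c) for c in colors.split("/"))
    # \b keeps "Red" from matching inside a longer word such as "Redwood";
    # \s* consumes the space after each color, so no second replace() is needed.
    # strip() only matters when a color is the last word of the title.
    return re.sub(rf"\b(?:{alternation})\b\s*", "", title).strip()


df["0"] = df.apply(lambda row: remove_color_single_sub(row["0"], row["1"]), axis=1)
print(df["0"].tolist())
```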

edited May 21, 2019 at 20:08

answered May 21, 2019 at 20:00

aiguofer


Comments

I like this approach. – Jason Bourne

@JasonBourne Updated the answer including why your approach didn't work. Feel free to accept and/or upvote this answer if it solves your problem. – aiguofer

elvisytoob · Accepted Answer · 2019-05-21 21:00:13Z

    import pandas as pd

    iLoc = pd.DataFrame({'0': ['2012 Black Toyota Corolla', '2013 Red Toyota Camry', '2015 Blue Honda Civic', '2012 Black Mazda 6', '2011 White Nissan Maxima'],
                         '1': ['White/Black/Red', 'Red', 'Blue', 'Black/Red/White', 'White/Red/Black']})

    print(iLoc)  # or display(iLoc) in a Jupyter notebook

    def removeColor(main, sub):
        # Walk the rows by position (assumes the default 0..n-1 index).
        for i in range(len(main)):
            # Remove every color listed for the same row, then tidy up the spaces.
            for j in str(sub[i]).split('/'):
                main[i] = main[i].replace(j, '').replace('  ', ' ').strip()
        return main

    iLoc["0"] = removeColor(iLoc["0"], iLoc["1"])

    print(iLoc)

Your method was partially correct.
You need to extract each value from the Series and remove from it the colors listed in the row with the same index in the other column.
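The same per-row idea can also be written without assigning into the Series one element at a time. This sketch (the helper name `remove_colors` is hypothetical) pairs each title with its row's colors through `zip`, so it does not rely on a 0..n-1 index, and it collapses whitespace once at the end. Like the answer, it uses plain substring replacement.

```python
import pandas as pd

df = pd.DataFrame({
    "0": ["2012 Black Toyota Corolla", "2013 Red Toyota Camry",
          "2015 Blue Honda Civic", "2012 Black Mazda 6",
          "2011 White Nissan Maxima"],
    "1": ["White/Black/Red", "Red", "Blue", "Black/Red/White", "White/Red/Black"],
})


def remove_colors(title, colors):
    # Same per-row idea as the answer: delete every color listed for this row.
    for color in colors.split("/"):
        title = title.replace(color, "")
    # split()/join() collapses any run of spaces and trims both ends.
    return " ".join(title.split())


# zip pairs each title with the colors from the same row by position, and the
# list comprehension builds a new column instead of mutating the old Series.
df["0"] = [remove_colors(t, c) for t, c in zip(df["0"], df["1"])]
print(df)
```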

answered May 21, 2019 at 21:00

elvisytoob
