
I'm trying to replace a string in one of the columns of my dataframe (df). Here's what df looks like:

                           0                  1
0  2012 Black Toyota Corolla    White/Black/Red
1      2013 Red Toyota Camry    Red
2      2015 Blue Honda Civic    Blue
3         2012 Black Mazda 6    Black/Red/White
4   2011 White Nissan Maxima    White/Red/Black

Sometimes column 1 has multiple color values, sometimes only a single value. For each row, I would like to take however many colors are listed in column 1, check whether any of them appear in column 0, and remove them from column 0.

I've tried approaching it this way.

    def removeColor(main, sub):
        for i in sub.split('/'):
            main = main.str.replace(i, '')
        return main

    df['0'] = df['0'].map(lambda x: removeColor(x['0'], x['1']))

This results in a TypeError:

    TypeError: string indices must be integers

My expected output looks like this:

                     0                  1
0  2012 Toyota Corolla    White/Black/Red
1    2013 Toyota Camry    Red
2     2015 Honda Civic    Blue
3         2012 Mazda 6    Black/Red/White
4   2011 Nissan Maxima    White/Red/Black
  • How does column 1 have more values in your expected output? Commented May 21, 2019 at 19:15
  • @Erfan, sorry about that. I've fixed column 1. Commented May 21, 2019 at 19:18
  • df['0'].str.replace('|'.join(df.iloc[:,1].str.replace('/','|')),'')? Commented May 21, 2019 at 19:20
  • The original CSV that I'm using for the dataframe has the column names 0 and 1 as strings, so referring to them as df[0] doesn't work. They could just as easily have been Title and Colors instead of 0 and 1. Commented May 21, 2019 at 19:20
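A runnable sketch of the one-liner from the comments, rebuilt on the sample data above. Two assumptions to keep in mind: on pandas 2.0 and later, Series.str.replace treats the pattern literally unless regex=True is passed, so that argument is added here; and the whitespace cleanup step is an addition, not part of the comment. This version pools every row's colors into one pattern, so a color listed in any row is stripped from every title. That happens to give the expected output for this data, but it is not a strict per-row match.

```python
import pandas as pd

df = pd.DataFrame({
    '0': ['2012 Black Toyota Corolla', '2013 Red Toyota Camry',
          '2015 Blue Honda Civic', '2012 Black Mazda 6',
          '2011 White Nissan Maxima'],
    '1': ['White/Black/Red', 'Red', 'Blue', 'Black/Red/White', 'White/Red/Black'],
})

# One alternation built from the colors of ALL rows, e.g. "White|Black|Red|Red|Blue|..."
pattern = '|'.join(df.iloc[:, 1].str.replace('/', '|', regex=False))

# regex=True is needed on pandas >= 2.0, where the default became regex=False
df['0'] = (df['0'].str.replace(pattern, '', regex=True)
                  .str.replace(r'\s+', ' ', regex=True)  # collapse the gap left behind (added step)
                  .str.strip())
print(df)
```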

2 Answers


Series.map calls your function once per element. In your lambda, x is therefore a single string (one value from column "0"), so x["0"] and x["1"] try to index into a string with another string, hence the "string indices must be integers" error.
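A small sketch that makes the difference visible on a one-row frame (the frame and variable names are illustrative): map hands the lambda a plain str, indexing that str with a str label raises the TypeError, and apply with axis=1 hands over a whole row as a Series instead. The exact wording of the error varies slightly between Python versions, so it is not hard-coded below.

```python
import pandas as pd

df = pd.DataFrame({'0': ['2013 Red Toyota Camry'], '1': ['Red']})

# Series.map calls the function once per element, so x is a plain str
map_types = df['0'].map(lambda x: type(x).__name__).tolist()
print(map_types)  # ['str']

# Indexing that str with a str label is what fails in the question's lambda
title = df['0'].iloc[0]
error_message = None
try:
    title['0']
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# DataFrame.apply with axis=1 passes each row as a Series instead,
# so both columns can be looked up by label
apply_types = df.apply(lambda row: type(row).__name__, axis=1).tolist()
apply_colors = df.apply(lambda row: row['1'], axis=1).tolist()
print(apply_types, apply_colors)  # ['Series'] ['Red']
```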

DataFrame.apply with axis=1 lets you act on an entire row at once (each row arrives as a Series), so it's better suited here. Here's one way to accomplish what you're after:

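    import re

    def remove_color(row):
        # Turn "White/Black/Red" into the regex alternation "White|Black|Red"
        pattern = row.iloc[1].replace("/", "|")
        # Delete every matching color from the title, then collapse the
        # double space left behind where a color was removed
        return re.sub(pattern, "", row.iloc[0]).replace("  ", " ")

    # axis=1 passes each row (as a Series) to remove_color
    df.iloc[:, 0] = df.apply(remove_color, axis=1)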

You could replace the iloc calls with specific column names to make it more readable (you mentioned the column names could be anything, so I'm giving a generic approach here).
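For example, a label-based version of the same function might look like this. TITLE_COL and COLOR_COL are hypothetical names introduced here; they assume the labels are the strings '0' and '1' from your CSV, and you would swap in 'Title'/'Colors' or whatever your file actually uses.

```python
import re
import pandas as pd

# Hypothetical constants: set these to your real column labels
TITLE_COL = '0'
COLOR_COL = '1'

def remove_color(row):
    # "White/Black/Red" -> "White|Black|Red"
    pattern = row[COLOR_COL].replace('/', '|')
    # Remove the colors, then collapse the double space they leave behind
    return re.sub(pattern, '', row[TITLE_COL]).replace('  ', ' ')

df = pd.DataFrame({
    TITLE_COL: ['2012 Black Toyota Corolla', '2013 Red Toyota Camry',
                '2015 Blue Honda Civic', '2012 Black Mazda 6',
                '2011 White Nissan Maxima'],
    COLOR_COL: ['White/Black/Red', 'Red', 'Blue', 'Black/Red/White', 'White/Red/Black'],
})
df[TITLE_COL] = df.apply(remove_color, axis=1)
print(df)
```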

The second replace call removes the double space that re.sub leaves behind where a color was deleted. You could make re.sub handle that in a single call, but the pattern gets messy.
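As a rough sketch of that single-call variant, the pattern can also consume the whitespace after each color. The helper name remove_color_single_call is hypothetical, and the re.escape and \b word boundaries are additions not in the answer: they keep a color such as "Red" from matching inside a longer word and guard against regex metacharacters in a color name.

```python
import re

def remove_color_single_call(title, colors):
    # Hypothetical helper: escape each color and join them into one alternation
    alternation = '|'.join(re.escape(c) for c in colors.split('/'))
    # Match a whole-word color plus any whitespace after it, in one re.sub;
    # strip() handles a color that was the last word of the title
    return re.sub(r'\b(?:%s)\b\s*' % alternation, '', title).strip()

print(remove_color_single_call('2012 Black Toyota Corolla', 'White/Black/Red'))  # 2012 Toyota Corolla
```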


2 Comments

I like this approach.
@JasonBourne Updated the answer to explain why your approach didn't work. Feel free to accept and/or upvote this answer if it solves your problem.
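    import pandas as pd

    iLoc = pd.DataFrame({'0': ['2012 Black Toyota Corolla', '2013 Red Toyota Camry',
                               '2015 Blue Honda Civic', '2012 Black Mazda 6',
                               '2011 White Nissan Maxima'],
                         '1': ['White/Black/Red', 'Red', 'Blue', 'Black/Red/White', 'White/Red/Black']})

    print(iLoc)  # display(iLoc) also works inside a Jupyter notebook

    def removeColor(main, sub):
        main = main.copy()  # work on a copy so the caller's Series isn't mutated
        # Walk the rows by position, pairing each title with the colors in the same row
        for i in range(len(main)):
            for j in str(sub.iloc[i]).split('/'):
                # Remove the color, then tidy the double space it leaves behind
                main.iloc[i] = main.iloc[i].replace(j, '').replace('  ', ' ').strip()
        return main

    iLoc["0"] = removeColor(iLoc["0"], iLoc["1"])

    print(iLoc)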

Your method was partially correct. You need to extract each value from the Series and clean each row of column "0" using the colors from the same row (same index) of column "1".
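The same per-row pairing can also be written without index arithmetic by zipping the two columns together. This is a swapped-in variant, not the answerer's code: remove_color_zip is a hypothetical name, and it builds a new Series (keeping the original index so assignment aligns row-for-row) rather than mutating the input.

```python
import pandas as pd

def remove_color_zip(titles, colors):
    # Pair each title with the colors from the same row, in order
    cleaned = []
    for title, color_list in zip(titles, colors):
        for color in str(color_list).split('/'):
            title = title.replace(color, '')
        # Collapse any runs of spaces left by the removals
        cleaned.append(' '.join(title.split()))
    # Reuse the input index so the result lines up when assigned back
    return pd.Series(cleaned, index=titles.index)

df = pd.DataFrame({
    '0': ['2012 Black Toyota Corolla', '2013 Red Toyota Camry',
          '2015 Blue Honda Civic', '2012 Black Mazda 6',
          '2011 White Nissan Maxima'],
    '1': ['White/Black/Red', 'Red', 'Blue', 'Black/Red/White', 'White/Red/Black'],
})
df['0'] = remove_color_zip(df['0'], df['1'])
print(df)
```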
