1

I am looking for a better way to replace all column values with some other value.

What I currently have is this:

gender_text = ['undefined', 'male', 'female']

df.loc[df['gender'] == 0, 'gender'] = gender_text[0]
df.loc[df['gender'] == 1, 'gender'] = gender_text[1]
df.loc[df['gender'] == 2, 'gender'] = gender_text[2]

df.head()

I was hoping for something more elegant: using the gender value (0, 1 or 2) as an index into gender_text, so that everything fits in one line.

3 Answers

2

You can use a dictionary.

import pandas as pd
df = pd.DataFrame({'gender':[0,0,2,1,1,2]})

gender_text = {0:'undefined', 1:'male', 2:'female'}
df['gender'].map(gender_text)

# Out[33]: 
# 0    undefined
# 1    undefined
# 2       female
# 3         male
# 4         male
# 5       female
# Name: gender, dtype: object
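One thing to keep in mind with map: a value that has no key in the dictionary comes back as NaN rather than being left alone. A minimal sketch, assuming an extra code 3 that is not in the mapping (both the code 3 and the 'unknown' fallback label are made up for illustration):

```python
import pandas as pd

# Hypothetical data: code 3 has no entry in the mapping.
df = pd.DataFrame({'gender': [0, 1, 2, 3]})
gender_text = {0: 'undefined', 1: 'male', 2: 'female'}

# Unmapped codes become NaN.
mapped = df['gender'].map(gender_text)

# Supply a fallback label for anything the dictionary does not cover.
filled = df['gender'].map(gender_text).fillna('unknown')
print(filled)
```

If unmapped values should stay as they are, replace is probably the better fit, since it only touches the values it is told about.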

Alternatively, you can use pd.merge with a small lookup DataFrame, which might be better for larger datasets.

import pandas as pd
df = pd.DataFrame({'gender':[0,0,2,1,1,2]})
df_map = pd.DataFrame({'gender': [0, 1, 2], 'gender_new': ['undefined', 'male', 'female']})

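# how='left' keeps every row of df in its original order, even for codes missing from df_map.
# The merged frame gets a fresh 0..n-1 index, so this assumes df has the default index.
df['gender'] = df.merge(df_map, on='gender', how='left')['gender_new']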

2

You can define a dict

replace_values = {0 :'undefined', 1 : 'male', 2 : 'female'}

And replace multiple values using replace

df = df.replace({"gender": replace_values}) 

Alternatively, replace each value in the column using

df.gender = df.gender.replace(0, 'undefined')
df.gender = df.gender.replace(1, 'male')
df.gender = df.gender.replace(2, 'female')

2

This is one of the use cases for the map function (np.select can be faster; see the timings below):

gender_text  = {0 :'undefined', 1 : 'male', 2 : 'female'}
df['gender'] = df['gender'].map(gender_text)

Or you can use apply -

df['gender'] = df['gender'].apply(lambda x: gender_text[x])

Or you can use np.select

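import numpy as np

condlist = [df['gender'] == 0,
            df['gender'] == 1,
            df['gender'] == 2]

choicelist = ['undefined',
              'male',
              'female']

# Pass a string default: the implicit default of 0 does not share a dtype with
# string choices, which recent NumPy versions reject. 'unknown' is a placeholder label.
df['gender'] = np.select(condlist, choicelist, default='unknown')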
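Since the question asks about using the code itself as an index into the list, here is a minimal sketch of that idea with NumPy integer indexing. It assumes every value in the column is a valid position in gender_text; an out-of-range code raises an IndexError, and a negative code would silently wrap around to the end of the list.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'gender': [0, 0, 2, 1, 1, 2]})
gender_text = ['undefined', 'male', 'female']

# Use each integer code as a position in the lookup array.
df['gender'] = np.array(gender_text)[df['gender'].to_numpy()]
print(df)
```

This keeps the original list from the question and fits in one line, at the cost of the validity assumption above.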

Performance comparison:

%timeit df['gender'] = df['gender'].map(gender_text)
411 µs ± 10.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit df['gender'] = np.select(condlist,choicelist)
101 µs ± 322 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
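Note that the timed statements above reassign df['gender'], so after the first loop map runs on strings that are already converted, and condlist is built outside the timed statement. A rough sketch for re-running the comparison on an untouched Series follows. The row count, the random seed and the 'unknown' default are assumptions, not values from the measurements above, and the numbers will vary by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed benchmark size; the measurements above do not state one.
n_rows = 100_000
rng = np.random.default_rng(0)
codes = pd.Series(rng.integers(0, 3, size=n_rows), name='gender')

gender_text = {0: 'undefined', 1: 'male', 2: 'female'}
condlist = [codes == 0, codes == 1, codes == 2]
choicelist = ['undefined', 'male', 'female']

# Both approaches should produce the same labels.
via_map = codes.map(gender_text)
via_select = np.select(condlist, choicelist, default='unknown')

# Time each approach on the same input, which is never overwritten.
t_map = timeit.timeit(lambda: codes.map(gender_text), number=20)
t_select = timeit.timeit(
    lambda: np.select(condlist, choicelist, default='unknown'), number=20)
print(f"map: {t_map:.4f}s  np.select: {t_select:.4f}s")
```

As in the original comparison, building condlist is not included in the np.select timing; move it inside the lambda to measure the full cost.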
