
I use pandas for data mining. I have a DataFrame named data:

   Age  Sex     Name 
0  28   male    Kirill
1  32   female  Alina
2  12   female  Sasha

I want to replace the values in Sex with digits: 1 for male and 0 for female.

I tried doing it in a loop:

for i in range(data.Age.size):
    if data.Sex[i] == 'male':
        data.Sex[i] = 1
    else:
        data.Sex[i] = 0

But I get a SettingWithCopyWarning. How can I do this correctly?

  • What are you really trying to achieve? The new categorical datatype might actually serve you better, depending on your goals beyond this step. Commented Mar 18, 2016 at 14:17

3 Answers


You can pass a dict and call map:

In [21]:
sex = {'male':1, 'female':0}
df['Sex'] = df['Sex'].map(sex)
df

Out[21]:
   Age  Sex    Name
0   28    1  Kirill
1   32    0   Alina
2   12    0   Sasha
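
One caveat with map is that any value missing from the dict becomes NaN. A minimal sketch of checking for that, assuming a hypothetical frame with one unexpected label (the 'unknown' row is made up for illustration and is not part of the question's data):

```python
import pandas as pd

# Hypothetical data: 'unknown' is not a key in the mapping below.
df = pd.DataFrame({'Age': [28, 32, 12, 40],
                   'Sex': ['male', 'female', 'female', 'unknown'],
                   'Name': ['Kirill', 'Alina', 'Sasha', 'Example']})

sex = {'male': 1, 'female': 0}
mapped = df['Sex'].map(sex)  # values missing from the dict become NaN

# Count the labels the mapping did not cover before overwriting the column.
n_unmapped = int(mapped.isna().sum())
print(n_unmapped)
```

If the count is non-zero, it is probably worth extending the dict or handling those rows before assigning the result back to df['Sex'].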

Or make two assignments with loc, using a boolean mask for each value:

In [25]:
df.loc[df['Sex']=='male','Sex'] = 1
df.loc[df['Sex']=='female','Sex'] = 0
df

Out[25]:
   Age Sex    Name
0   28   1  Kirill
1   32   0   Alina
2   12   0   Sasha

In general, you should avoid looping over the df when vectorised solutions are available. Additionally, it's not a good idea to mutate the container you're iterating over, as it can behave unpredictably: sometimes it works and sometimes it does not.
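
As a sketch of another vectorised option, the whole column can be computed in one step by comparing against 'male' and casting the boolean result to integers. This assumes the column contains only 'male' and 'female'; any other value would also end up as 0.

```python
import pandas as pd

data = pd.DataFrame({'Age': [28, 32, 12],
                     'Sex': ['male', 'female', 'female'],
                     'Name': ['Kirill', 'Alina', 'Sasha']})

# The comparison yields a boolean Series; astype(int) turns True/False into 1/0.
# Assigning the whole column at once avoids the element-wise chained
# assignment (data.Sex[i] = ...) that can trigger SettingWithCopyWarning.
data['Sex'] = (data['Sex'] == 'male').astype(int)
print(data)
```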


You can use the replace method, which does exactly that:

data.replace({'male': 1,
              'female': 0})

   Age  Sex    Name
0   28    1  Kirill
1   32    0   Alina
2   12    0   Sasha

or

data.replace(["male", "female"], [1, 0])

equivalent to:

data.replace(to_replace=["male", "female"], value=[1, 0])

In that case, the two lists must have the same length.
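
Note that calling replace on the whole DataFrame replaces matching values in every column. A minimal sketch of limiting it to the Sex column with a nested dict, where the outer key is the column name (the variable name result is only for illustration):

```python
import pandas as pd

data = pd.DataFrame({'Age': [28, 32, 12],
                     'Sex': ['male', 'female', 'female'],
                     'Name': ['Kirill', 'Alina', 'Sasha']})

# Outer key: column name; inner dict: old value -> new value.
# replace returns a new DataFrame, so the result has to be kept;
# the original data is left unchanged.
result = data.replace({'Sex': {'male': 1, 'female': 0}})
print(result)
```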


To add to the other answers: if you don't want to define the dictionary explicitly yourself, you can automate the process, which is helpful when the column has many unique values (e.g. 5+).

import numpy as np
import pandas as pd

sex = np.sort(df['Sex'].unique())  # extract the unique values and sort them alphabetically
sex_dict = dict(enumerate(sex))  # map each integer code to a value: {0: 'female', 1: 'male'}
sex_dict = dict(zip(sex_dict.values(), sex_dict.keys()))  # invert it so each value maps to its code

df['Sex'] = df['Sex'].map(sex_dict)  # map as described in the other answers

Again, this is mainly useful for automating the process when the column has many unique values.

Original DataFrame

   Age     Sex    Name
0   28    male  Kirill
1   32  female   Alina
2   12  female   Sasha

Final Results

   Age  Sex    Name
0   28    1  Kirill
1   32    0   Alina
2   12    0   Sasha
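
The same automatic coding can also be sketched with the categorical dtype mentioned in the comment on the question. For string values, pandas sorts the categories, so this should produce the same alphabetical codes as the dictionary built above; the names sex_cat, codes and labels are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Age': [28, 32, 12],
                   'Sex': ['male', 'female', 'female'],
                   'Name': ['Kirill', 'Alina', 'Sasha']})

# Converting to the categorical dtype sorts the unique values,
# so 'female' gets code 0 and 'male' gets code 1.
sex_cat = df['Sex'].astype('category')
codes = sex_cat.cat.codes              # integer code for each row
labels = list(sex_cat.cat.categories)  # lookup from code back to label

df['Sex'] = codes
print(df)
```

Keeping labels around makes it possible to translate the codes back to the original strings later.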
