Modifying a DataFrame in pandas

Question

I use pandas for data mining. I have a DataFrame, data:

   Age  Sex     Name 
0  28   male    Kirill
1  32   female  Alina
2  12   female  Sasha

I want to replace the values in Sex with digits: 1 for male and 0 for female.

I tried to do it in a loop:

for i in range(data.Age.size):
    if data.Sex[i] == 'male':
        data.Sex[i] = 1
    else:
        data.Sex[i] = 0

But I get a SettingWithCopyWarning. How can I do this correctly?

What are you really trying to achieve? The new categorical datatype might actually serve you better, depending on your goals beyond this step.
– Paul H, Commented Mar 18, 2016 at 14:17
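
The categorical route mentioned in this comment could look roughly like the sketch below. The frame is rebuilt from the question, and the names sex_cat and sex_codes are only illustrative. Note that the codes follow the sorted category order, which here happens to match the mapping asked for (female is 0, male is 1).

```python
import pandas as pd

# Sample frame rebuilt from the question.
data = pd.DataFrame({
    "Age": [28, 32, 12],
    "Sex": ["male", "female", "female"],
    "Name": ["Kirill", "Alina", "Sasha"],
})

# Convert the column to the categorical dtype; the categories are sorted,
# so they come out as ['female', 'male'].
sex_cat = data["Sex"].astype("category")

# Integer codes follow the category order: female -> 0, male -> 1.
sex_codes = sex_cat.cat.codes

print(list(sex_cat.cat.categories))
print(sex_codes.tolist())
```

Keeping the column categorical preserves the original labels alongside the codes, which may be preferable if the labels are needed again later.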

jrjc · Accepted Answer · 2016-03-18 13:48:24Z

5

You can pass a dict and call map:

In [21]:
sex = {'male':1, 'female':0}
df['Sex'] = df['Sex'].map(sex)
df

Out[21]:
   Age  Sex    Name
0   28    1  Kirill
1   32    0   Alina
2   12    0   Sasha
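
One thing worth knowing about map with a dict: values that have no key in the dict become NaN. A minimal sketch, assuming a hypothetical column that contains a value ("unknown") not present in the question's data:

```python
import pandas as pd

# Hypothetical column with a value ("unknown") that the dict does not cover.
sex_column = pd.Series(["male", "female", "unknown"])
sex = {"male": 1, "female": 0}

mapped = sex_column.map(sex)

# "unknown" has no key in the dict, so it becomes NaN,
# which also turns the result into a float column.
print(mapped)
print(mapped.isna().sum())
```

If the column may hold unexpected values, checking mapped.isna() after the call is a cheap way to catch them.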

Or make two assignments using a boolean mask:

In [25]:
df.loc[df['Sex']=='male','Sex'] = 1
df.loc[df['Sex']=='female','Sex'] = 0
df

Out[25]:
   Age Sex    Name
0   28   1  Kirill
1   32   0   Alina
2   12   0   Sasha

In general, you should avoid looping over the df when a vectorised solution is available. It is also not a good idea to mutate the container you are iterating over, as it can behave unpredictably: sometimes it works and sometimes it does not.
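
As a further illustration of a vectorised replacement for the loop (an alternative that is not part of the answer itself), a boolean comparison cast to int gives the same result as the question's if/else: 1 for male and 0 for everything else. The frame below is rebuilt from the question.

```python
import pandas as pd

# Sample frame rebuilt from the question.
df = pd.DataFrame({
    "Age": [28, 32, 12],
    "Sex": ["male", "female", "female"],
    "Name": ["Kirill", "Alina", "Sasha"],
})

# The comparison runs element-wise and returns a boolean Series;
# casting it to int turns True into 1 and False into 0.
df["Sex"] = (df["Sex"] == "male").astype(int)

print(df)
```

This assigns the whole column in one step, so no row-by-row chained indexing is involved.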

edited Mar 18, 2016 at 13:48

jrjc


answered Mar 18, 2016 at 13:12

EdChum



jrjc · Accepted Answer · 2016-08-15 12:30:40Z

2

You can use the replace method, which does exactly that:

data.replace({'male': 1,
              'female': 0})

   Age  Sex    Name
0   28    1  Kirill
1   32    0   Alina
2   12    0   Sasha

or

data.replace(["male", "female"], [1, 0])

equivalent to:

data.replace(to_replace=["male", "female"], value=[1, 0])

In that case, lists must have the same length.
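
Called on the whole DataFrame, replace looks for the values in every column. To restrict it to Sex, a nested dict keyed by column name can be used. A minimal sketch on a frame rebuilt from the question; the name encoded is illustrative, and the object dtype is spelled out here only as an assumption to keep the behaviour the same across pandas versions:

```python
import pandas as pd

# Sample frame rebuilt from the question. dtype=object is spelled out so the
# column accepts integers regardless of the default string dtype in use.
data = pd.DataFrame({
    "Age": [28, 32, 12],
    "Sex": pd.Series(["male", "female", "female"], dtype=object),
    "Name": ["Kirill", "Alina", "Sasha"],
})

# Outer key: column name. Inner dict: old value -> new value.
# Only the Sex column is searched; Age and Name are left alone.
encoded = data.replace({"Sex": {"male": 1, "female": 0}})

print(encoded)
```

replace returns a new DataFrame, so data itself is unchanged unless the result is assigned back.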

edited Aug 15, 2016 at 12:30

answered Mar 18, 2016 at 13:45

jrjc


Leb · Accepted Answer · 2016-03-18 13:41:43Z

To add to the other answers: if you don't want to define the dictionary by hand, you can automate the process, which is helpful when the column has many unique values (e.g. 5+).

import numpy as np
import pandas as pd

sex = np.sort(df['Sex'].unique())  # extract the unique values and sort them alphabetically
sex_dict = dict(enumerate(sex))  # position -> value, e.g. {0: 'female', 1: 'male'}
sex_dict = dict(zip(sex_dict.values(), sex_dict.keys()))  # invert it to value -> position

df['Sex'] = df['Sex'].map(sex_dict)  # map as described in the other answers

Again, this is mainly useful for automating the process when the column has many unique values.

Original DataFrame

   Age     Sex    Name
0   28    male  Kirill
1   32  female   Alina
2   12  female   Sasha

Final Results

   Age  Sex    Name
0   28    1  Kirill
1   32    0   Alina
2   12    0   Sasha
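
For comparison, pandas also offers pd.factorize, which produces sorted integer codes in one call. This is a sketch of an alternative, not the method used in this answer; the frame is rebuilt from the question.

```python
import pandas as pd

# Sample frame rebuilt from the question.
df = pd.DataFrame({
    "Age": [28, 32, 12],
    "Sex": ["male", "female", "female"],
    "Name": ["Kirill", "Alina", "Sasha"],
})

# sort=True orders the unique values first, so the codes follow
# alphabetical order: female -> 0, male -> 1.
codes, uniques = pd.factorize(df["Sex"], sort=True)
df["Sex"] = codes

print(list(uniques))
print(df)
```

The uniques returned by factorize can be kept around to translate the codes back to labels later.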
