2

I have a dataframe in pandas:

import pandas as pd
# assign data of lists.  
data = {'Gender': ['M', 'F', 'M', 'F','M', 'F','M', 'F','M', 'F','M', 'F'], 
        'Employment': ['R','U', 'E','R','U', 'E','R','U', 'E','R','U', 'E'],
        'Age': ['Y','M', 'O','Y','M', 'O','Y','M', 'O','Y','M', 'O']
       } 
    
# Create DataFrame  
df = pd.DataFrame(data)  
df

What I want is to create for each category of each existing column a new column with the following format:

Gender_M -> for when the gender equals M
Gender_F -> for when the gender equal F
Employment_R -> for when employment equals R
Employment_U -> for when employment equals U
and so on...

So far, I have created the below code:

for i in range(len(df.columns)):
    curent_column=list(df.columns)[i]
    col_df_array = df[curent_column].unique()
    
    for j in range(col_df_array.size):
        new_col_name = str(list(df.columns)[i])+"_"+col_df_array[j]
    
        for index,row in df.iterrows():
            if(row[curent_column] == col_df_array[j]):
                df[new_col_name] = row[curent_column]       

The problem is that even though I have managed to create successfully the column names, I am not able to get the correct column values.

For example the column Gender should be as below:

data2 = {'Gender': ['M', 'F', 'M', 'F','M', 'F','M', 'F','M', 'F','M', 'F'],
         'Gender_M': ['M', 'na', 'M', 'na','M', 'na','M', 'na','M', 'na','M', 'na'], 
         'Gender_F': ['na', 'F', 'na', 'F','na', 'F','na', 'F','na', 'F','na', 'F']
       } 

df2 = pd.DataFrame(data2)  

Just to say, the na can be anything such as blanks or dots or NAN.

1

3 Answers 3

2

You're looking for pd.get_dummies.

>>> pd.get_dummies(df)
    Gender_F  Gender_M  Employment_E  Employment_R  Employment_U  Age_M  Age_O  Age_Y
0          0         1             0             1             0      0      0      1
1          1         0             0             0             1      1      0      0
2          0         1             1             0             0      0      1      0
3          1         0             0             1             0      0      0      1
4          0         1             0             0             1      1      0      0
5          1         0             1             0             0      0      1      0
6          0         1             0             1             0      0      0      1
7          1         0             0             0             1      1      0      0
8          0         1             1             0             0      0      1      0
9          1         0             0             1             0      0      0      1
10         0         1             0             0             1      1      0      0
11         1         0             1             0             0      0      1      0
Sign up to request clarification or add additional context in comments.

Comments

1

If you are trying to get the data in a format like your df2 example, I believe this is what you are looking for.

ndf = pd.get_dummies(df)
df.join(ndf.mul(ndf.columns.str.split('_').str[-1]))

Output:

Old Answer

df[['Gender']].join(pd.get_dummies(df[['Gender']]).mul(df['Gender'],axis=0).replace('',np.NaN))

Output:

   Gender Gender_F Gender_M
0       M      NaN        M
1       F        F      NaN
2       M      NaN        M
3       F        F      NaN
4       M      NaN        M
5       F        F      NaN
6       M      NaN        M
7       F        F      NaN
8       M      NaN        M
9       F        F      NaN
10      M      NaN        M
11      F        F      NaN

Comments

0

If you are okay with 0s and 1s in your new columns, then using get_dummies (as suggested by @richardec) should be the most straightforward.

However, if want a specific letter in each of your new columns, then another method is to loop through the current columns and the specific categories within each column, and create a new column from this information using apply.

for col in data.keys():
    categories = list(df[col].unique())
    for category in categories:
        df[f"{col}_{category}"] = df[col].apply(lambda x: category if x==category else float("nan"))

Result:

>>> df
   Gender Employment Age Gender_M Gender_F Employment_R Employment_U Employment_E Age_Y Age_M Age_O
0       M          R   Y        M      NaN            R          NaN          NaN     Y   NaN   NaN
1       F          U   M      NaN        F          NaN            U          NaN   NaN     M   NaN
2       M          E   O        M      NaN          NaN          NaN            E   NaN   NaN     O
3       F          R   Y      NaN        F            R          NaN          NaN     Y   NaN   NaN
4       M          U   M        M      NaN          NaN            U          NaN   NaN     M   NaN
5       F          E   O      NaN        F          NaN          NaN            E   NaN   NaN     O
6       M          R   Y        M      NaN            R          NaN          NaN     Y   NaN   NaN
7       F          U   M      NaN        F          NaN            U          NaN   NaN     M   NaN
8       M          E   O        M      NaN          NaN          NaN            E   NaN   NaN     O
9       F          R   Y      NaN        F            R          NaN          NaN     Y   NaN   NaN
10      M          U   M        M      NaN          NaN            U          NaN   NaN     M   NaN
11      F          E   O      NaN        F          NaN          NaN            E   NaN   NaN     O

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.