Efficiently creating additional columns in a pandas DataFrame using .map()

Question

I am analyzing a data set that is similar in shape to the following example. I have two different types of data (abc data and xyz data):

   abc1  abc2  abc3  xyz1  xyz2  xyz3
0     1     2     2     2     1     2
1     2     1     1     2     1     1
2     2     2     1     2     2     2
3     1     2     1     1     1     1
4     1     1     2     1     2     1

I want to create a function that adds a categorizing column for each abc column that exists in the dataframe. Using lists of column names and a category mapping dictionary, I was able to get my desired result.

abc_columns = ['abc1', 'abc2', 'abc3']
xyz_columns = ['xyz1', 'xyz2', 'xyz3']
abc_category_columns = ['abc1_category', 'abc2_category', 'abc3_category']
categories = {1: 'Good', 2: 'Bad', 3: 'Ugly'}

# Pair each source column with its target category column.
for column, category_column in zip(abc_columns, abc_category_columns):
    df3[category_column] = df3[column].map(categories)

print(df3)

The end result:

   abc1  abc2  abc3  xyz1  xyz2  xyz3 abc1_category abc2_category abc3_category
0     1     2     2     2     1     2          Good           Bad           Bad
1     2     1     1     2     1     1           Bad          Good          Good
2     2     2     1     2     2     2           Bad           Bad          Good
3     1     2     1     1     1     1          Good           Bad          Good
4     1     1     2     1     2     1          Good          Good           Bad

While the for loop at the end works fine, I feel like I should be using a lambda function instead, but I can't figure out how.

Is there a more efficient way to map in a dynamic number of abc-type columns?

Andy Hayden · Accepted Answer · 2013-05-15 22:45:37Z

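You can use applymap with the dictionary's get method (df here is the question's df3; in pandas 2.1 and later, applymap is deprecated in favor of the equivalent DataFrame.map):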

In [11]: df[abc_columns].applymap(categories.get)
Out[11]:
   abc1  abc2  abc3
0  Good   Bad   Bad
1   Bad  Good  Good
2   Bad   Bad  Good
3  Good   Bad  Good
4  Good  Good   Bad

Then assign the result to the new category columns:

In [12]: abc_categories = [col + '_category' for col in abc_columns]

In [13]: abc_categories
Out[13]: ['abc1_category', 'abc2_category', 'abc3_category']

In [14]: df[abc_categories] = df[abc_columns].applymap(categories.get)
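
For anyone who wants to run the steps above end to end, here is a minimal sketch. It assumes the sample data from the question's table, and it falls back from DataFrame.map to applymap depending on which one the installed pandas provides; the `elementwise` name is only an illustrative helper, not part of pandas.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    'abc1': [1, 2, 2, 1, 1],
    'abc2': [2, 1, 2, 2, 1],
    'abc3': [2, 1, 1, 1, 2],
    'xyz1': [2, 2, 2, 1, 1],
    'xyz2': [1, 1, 2, 1, 2],
    'xyz3': [2, 1, 2, 1, 1],
})
categories = {1: 'Good', 2: 'Bad', 3: 'Ugly'}

abc_columns = ['abc1', 'abc2', 'abc3']
abc_categories = [col + '_category' for col in abc_columns]

# Assumption: newer pandas exposes the elementwise operation as DataFrame.map,
# older versions only as DataFrame.applymap. Pick whichever exists.
subset = df[abc_columns]
elementwise = subset.map if hasattr(pd.DataFrame, 'map') else subset.applymap

# Look up every cell in the dictionary and store the result in the new columns.
df[abc_categories] = elementwise(categories.get)
print(df)
```

Because categories.get returns None for keys that are not in the dictionary, any value outside 1, 2, 3 would show up as a missing value rather than raising an error.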

Note: rather than hard-coding abc_columns, you can build it from whatever columns the DataFrame has using a list comprehension:

abc_columns = [col for col in df.columns if str(col).startswith('abc')]
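
As a possible variation (not part of the original answer), the dynamically built column list can be combined with add_suffix so the category column names never need to be written out. This is a sketch on a small made-up frame; `mapped` and `result` are illustrative names.

```python
import pandas as pd

# Small illustrative frame; only the 'abc' columns should be categorized.
df = pd.DataFrame({'abc1': [1, 2], 'abc2': [2, 1], 'xyz1': [2, 2]})
categories = {1: 'Good', 2: 'Bad', 3: 'Ugly'}

# Discover the abc columns instead of listing them by hand.
abc_columns = [col for col in df.columns if str(col).startswith('abc')]

# Map each selected column with Series.map, then rename the results
# to '<name>_category' and attach them to the original frame by index.
mapped = df[abc_columns].apply(lambda col: col.map(categories))
result = df.join(mapped.add_suffix('_category'))
print(result)
```

This version leaves df untouched and returns a new frame, which may be convenient inside a function.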

edited May 15, 2013 at 22:45

answered May 15, 2013 at 22:26

Andy Hayden


4 Comments

yoshiserry Over a year ago

@AndyHayden, what is the difference between .applymap on a dataframe and .map on a pandas dataframe?

Andy Hayden Over a year ago

@yoshiserry applymap does it to each cell, rather than each row/col.

yoshiserry Over a year ago

@AndyHayden I'm not sure what you mean. So applymap applies the function to every cell (every intersection of row and column), basically across the entire DataFrame, whereas .map does it for just a single row or a single column?

Andy Hayden Over a year ago

@yoshiserry yup. (and .apply is basically the same as .map but you'll see it used more often.)
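
To make the distinction discussed in these comments concrete, here is a small sketch on made-up data. It assumes the usual pandas behavior: Series.map works value by value on one column, DataFrame.apply hands each whole column to the function, and the elementwise method (applymap, or DataFrame.map where available) visits every cell. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'abc1': [1, 2], 'abc2': [2, 1]})
categories = {1: 'Good', 2: 'Bad'}

# Series.map: a single column, translated value by value.
one_column = df['abc1'].map(categories)

# DataFrame.apply: the function receives each column as a whole Series,
# so it can compute something per column, such as its maximum.
column_max = df.apply(lambda col: col.max())

# Elementwise over the whole frame: the function sees one cell at a time.
# Assumption: use DataFrame.map if this pandas has it, else applymap.
elementwise = df.map if hasattr(pd.DataFrame, 'map') else df.applymap
every_cell = elementwise(categories.get)

print(one_column)
print(column_max)
print(every_cell)
```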
