Is there a multiple column map function for dataframes?

Question

In pandas, how can one column be derived from multiple other columns?

For example, let's say I want to annotate my dataset with the correct form of address for each subject, perhaps to label some plots so I can tell who the results are for.

Take a dataset:

import pandas as pd
data = [('male', 'Homer', 'Simpson'), ('female', 'Marge', 'Simpson'), ('male', 'Bart', 'Simpson'), ('female', 'Lisa', 'Simpson'), ('infant', 'Maggie', 'Simpson')]
people = pd.DataFrame(data, columns=["gender", "first_name", "last_name"])

So we have:

   gender first_name last_name
0    male      Homer   Simpson
1  female      Marge   Simpson
2    male       Bart   Simpson
3  female       Lisa   Simpson
4  infant     Maggie   Simpson

And a function, which I want to apply to each row, storing the result into a new column.

def get_address(gender, first, last):
    title=""
    if gender=='male':
        title='Mr'
    elif gender=='female':
        title='Ms'

    if title=='':
        return first + ' '+ last
    else:
        return title + ' ' + first[0] + '. ' + last

Currently my method is:

# .values replaces the removed get_values(); list() is needed because map is lazy in Python 3
people['address'] = list(map(lambda row: get_address(*row), people.values))



   gender first_name last_name         address
0    male      Homer   Simpson   Mr H. Simpson
1  female      Marge   Simpson   Ms M. Simpson
2    male       Bart   Simpson   Mr B. Simpson
3  female       Lisa   Simpson   Ms L. Simpson
4  infant     Maggie   Simpson  Maggie Simpson

This works, but it is not elegant. It also feels wrong to convert to an unindexed list and then assign back into an indexed column.

You can use apply with the axis=1 argument to apply by row.
– BrenBarn, Commented Aug 1, 2014 at 5:48
That seems to be the answer. Would you like to make it one so I can accept it?
– Frames Catherine White, Commented Aug 1, 2014 at 6:13

ZJS · Accepted Answer · 2014-08-02 20:43:34Z

What you are looking for is apply(func, axis=1). This applies a function row-wise across your DataFrame.

In your example, modify your function get_address to:

def get_address(row):  # row is a pandas Series with the column names as its index
    title=""
    gender = row['gender']     # extract gender from the row
    first = row['first_name']  # extract first name from the row
    last = row['last_name']    # extract last name from the row

    if gender=='male':
        title='Mr'
    elif gender=='female':
        title='Ms'

    if title=='':
        return first + ' '+ last
    else:
        return title + ' ' + first[0] + '. ' + last

Then call people.apply(get_address, axis=1), which returns a pandas Series with the same index as the DataFrame; that matching index is how the DataFrame knows how to add it as a column correctly. To add it to your DataFrame, use:

people['address'] = people.apply(get_address,axis=1)
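
If you would rather keep the three-argument get_address from the question, here is a minimal sketch of two ways to do that. It assumes the column names from the question; the zip-based variant is not part of the answer above, just an alternative that avoids building a Series for every row. The names cols, via_apply and via_zip are made up for illustration.

```python
import pandas as pd

data = [('male', 'Homer', 'Simpson'), ('female', 'Marge', 'Simpson'),
        ('male', 'Bart', 'Simpson'), ('female', 'Lisa', 'Simpson'),
        ('infant', 'Maggie', 'Simpson')]
people = pd.DataFrame(data, columns=["gender", "first_name", "last_name"])


def get_address(gender, first, last):
    # Same behavior as the three-argument function in the question.
    title = {'male': 'Mr', 'female': 'Ms'}.get(gender, '')
    if title == '':
        return first + ' ' + last
    return title + ' ' + first[0] + '. ' + last


cols = ["gender", "first_name", "last_name"]

# Option 1: row-wise apply, pulling the arguments out of each row by label.
via_apply = people.apply(
    lambda row: get_address(row["gender"], row["first_name"], row["last_name"]),
    axis=1,
)

# Option 2: iterate over the columns in parallel with zip.
# The index is passed explicitly so the result lines up with the DataFrame.
via_zip = pd.Series(
    [get_address(g, f, l) for g, f, l in zip(*(people[c] for c in cols))],
    index=people.index,
)

people["address"] = via_apply
```

Both variants should give the same column; which one reads better is a matter of taste.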

Phillip Cloud · Answer · 2014-08-01 17:57:38Z

You can do this without any explicit looping:

In [70]: df
Out[70]:
   gender first_name last_name
0    male      Homer   Simpson
1  female      Marge   Simpson
2    male       Bart   Simpson
3  female       Lisa   Simpson
4  infant     Maggie   Simpson

In [71]: title = df.gender.replace({'male': 'Mr', 'female': 'Ms', 'infant': ''})

In [72]: initial = np.where(df.gender != 'infant', df.first_name.str[0] + '. ', df.first_name + ' ')
In [73]: initial
Out[73]: array(['H. ', 'M. ', 'B. ', 'L. ', 'Maggie '], dtype=object)

In [74]: address = (title + ' ' + pd.Series(initial) + df.last_name).str.strip()

In [75]: address
Out[75]:
0     Mr H. Simpson
1     Ms M. Simpson
2     Mr B. Simpson
3     Ms L. Simpson
4    Maggie Simpson
dtype: object

Check out the documentation for the Series.str methods; they're pretty rad. Most of Python's str methods are implemented, in addition to goodies like extract.
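
As a rough illustration of the Series.str accessor mentioned above, here is a small sketch. The sample names and the regular expression are assumptions chosen for the example, not something from the original answer.

```python
import pandas as pd

names = pd.Series(["Homer Simpson", "Marge Simpson", "Maggie Simpson"])

# Indexing through .str works element-wise: first character of each string.
initials = names.str[0]

# Most plain str methods have an element-wise counterpart.
upper = names.str.upper()

# extract pulls regex groups into columns; named groups become column names.
parts = names.str.extract(r"^(?P<first>\w+)\s+(?P<last>\w+)$")

# The pieces can be recombined with ordinary + on Series.
short = parts["first"].str[0] + ". " + parts["last"]
```

Here parts is a DataFrame with one column per named group, so the result of extract can feed straight into further column arithmetic.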

edited Aug 1, 2014 at 17:57

answered Aug 1, 2014 at 17:51


6 Comments

Frames Catherine White Over a year ago

The string manipulation was just an example. While the string methods are good to know about, they don't help with my actual problem, which cannot be done with concatenation. (My actual problem involves parsing strings into lists, then checking for the presence of a 1 or 0 in one list and, if found, marking the corresponding element in the other list with an asterisk. I didn't want to put that in my example because it is long and harder to follow.) I suspect I could do something with the str methods, but I think it would be even harder to follow.

Phillip Cloud Over a year ago

The more general apply will be slower; it's better to find a way to vectorize the operations. With more data, a general apply will not scale very well, especially the row-by-row version: each row is converted to a Series with a single dtype, which is inefficient and, if your columns have mixed types, very annoying to use.
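
The point about rows being converted to a single type can be checked with a small sketch. The two DataFrames and their column names are invented for illustration.

```python
import pandas as pd

# One frame mixing strings and integers, one mixing integers and floats.
mixed = pd.DataFrame({"name": ["Homer", "Marge"], "age": [39, 36]})
numeric = pd.DataFrame({"count": [1, 2], "ratio": [0.5, 1.5]})

# Taken as a column, "age" keeps its integer dtype.
age_dtype = str(mixed["age"].dtype)

# Taken row by row, every value in a row has to share one dtype,
# so each row Series is upcast to a type that can hold all the columns.
mixed_row_dtypes = set(mixed.apply(lambda row: str(row.dtype), axis=1))
numeric_row_dtypes = set(numeric.apply(lambda row: str(row.dtype), axis=1))
```

In the mixed frame the integers end up boxed inside an object-dtype row, which is part of why row-wise apply tends to be slow.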

Phillip Cloud Over a year ago

You should post your original problem, which I think can be solved with isin and where.

Frames Catherine White Over a year ago

For your reference: string operations are not really very vectorisable (because they don't go through the BLAS libraries). The string functions you mention appear to be largely implemented with for-loops: github.com/pydata/pandas/blob/master/pandas/core/strings.py They are more readable, though.

Phillip Cloud Over a year ago

Actually most of those are implemented in Cython which speeds up loops considerably. By vectorization I simply meant applying operations on whole sequences rather than single elements at a time, which is unrelated to the use of BLAS. What I'm saying is that spending a bit of time trying to avoid apply will probably yield reusable and more performant code.
