In pandas, how can one column be derived from multiple other columns?
For example, let's say I want to annotate my dataset with the correct form of address for each subject, perhaps to label some plots so I can tell who the results are for.
Take a dataset:
    import pandas as pd

    data = [('male', 'Homer', 'Simpson'), ('female', 'Marge', 'Simpson'), ('male', 'Bart', 'Simpson'), ('female', 'Lisa', 'Simpson'), ('infant', 'Maggie', 'Simpson')]
    people = pd.DataFrame(data, columns=["gender", "first_name", "last_name"])
So we have:
       gender  first_name  last_name
    0    male       Homer    Simpson
    1  female       Marge    Simpson
    2    male        Bart    Simpson
    3  female        Lisa    Simpson
    4  infant      Maggie    Simpson
And a function that I want to apply to each row, storing the result in a new column:
    def get_address(gender, first, last):
        # Pick a title from the gender; anything else gets no title.
        title = ""
        if gender == 'male':
            title = 'Mr'
        elif gender == 'female':
            title = 'Ms'
        if title == '':
            return first + ' ' + last
        else:
            return title + ' ' + first[0] + '. ' + last
Currently my method is:
    people['address'] = list(map(lambda row: get_address(*row), people.values))
       gender  first_name  last_name         address
    0    male       Homer    Simpson   Mr H. Simpson
    1  female       Marge    Simpson   Ms M. Simpson
    2    male        Bart    Simpson   Mr B. Simpson
    3  female        Lisa    Simpson   Ms L. Simpson
    4  infant      Maggie    Simpson  Maggie Simpson
This works, but it is not elegant. It also feels wrong to convert to an unindexed list and then assign the result back into an indexed column.
Use apply with the axis=1 argument to apply the function by row.
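As a minimal sketch of that suggestion, reusing the people DataFrame and get_address function from the question: with axis=1, apply hands each row to the function as a Series, so the columns can be looked up by name. The lambda wrapper is just one way to unpack the row and is an assumption of this example.

```python
import pandas as pd


def get_address(gender, first, last):
    # Same logic as the function in the question.
    title = ""
    if gender == 'male':
        title = 'Mr'
    elif gender == 'female':
        title = 'Ms'
    if title == '':
        return first + ' ' + last
    else:
        return title + ' ' + first[0] + '. ' + last


data = [('male', 'Homer', 'Simpson'), ('female', 'Marge', 'Simpson'),
        ('male', 'Bart', 'Simpson'), ('female', 'Lisa', 'Simpson'),
        ('infant', 'Maggie', 'Simpson')]
people = pd.DataFrame(data, columns=["gender", "first_name", "last_name"])

# axis=1 means the function is called once per row; each row is a Series
# indexed by column name.
people['address'] = people.apply(
    lambda row: get_address(row['gender'], row['first_name'], row['last_name']),
    axis=1,
)
```

The result of apply is a Series that shares the DataFrame's index, so the assignment never goes through an unindexed list. Row-wise apply still calls the Python function once per row, so this is mainly a readability improvement rather than a speed-up.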