How to define user defined function in pandas

Question

I have a CSV file that contains information like this:

name    salary  department
a        2500      x
b        5000      y
c        10000      y
d        20000      x

I need to use pandas to convert it into this form:

dept    name    position
x        a       Normal Employee
x        b       Normal Employee
y        c       Experienced Employee
y        d       Experienced Employee

If the salary is <= 8000, the position is Normal Employee.

If the salary is > 8000 and <= 25000, the position is Experienced Employee.

My current groupby code:

import pandas

pandas.set_option('display.max_rows', 999)
data_df = pandas.read_csv('employeedetails.csv')
# print(data_df.columns)
# The column in the file is 'department', not 'dept'
t = data_df.groupby(['department'])
print(t)

What changes do I need to make to this code to get the output shown above?
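For reference, the requested layout does not need groupby at all; printing a GroupBy object only shows its repr, not the groups. A minimal sketch, assuming the file is whitespace-separated as shown (hence `sep=r"\s+"`) and using the question's sample rows inline via `io.StringIO` instead of reading `employeedetails.csv`: it adds a position column, then renames, selects and sorts the columns. The helper name `build_positions` and the "Unknown" fallback are hypothetical.

```python
import io

import numpy as np
import pandas as pd

# The question's sample rows; assumed to be whitespace-separated.
SAMPLE = """name    salary  department
a        2500      x
b        5000      y
c        10000      y
d        20000      x
"""


def build_positions(text):
    """Hypothetical helper: classify salaries and reshape to dept/name/position."""
    df = pd.read_csv(io.StringIO(text), sep=r"\s+")
    # Rules from the question: <= 8000 is normal, (8000, 25000] is experienced.
    # The inner np.where only matters for rows where salary > 8000.
    df["position"] = np.where(
        df["salary"] <= 8000,
        "Normal Employee",
        np.where(df["salary"] <= 25000, "Experienced Employee", "Unknown"),
    )
    out = df.rename(columns={"department": "dept"})[["dept", "name", "position"]]
    # A stable sort keeps the original row order within each department.
    return out.sort_values("dept", kind="stable").reset_index(drop=True)


result = build_positions(SAMPLE)
print(result)
```

With the sample data as listed, b is in department y, so the x rows come out as a and d.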

Fabio Lamanna · Accepted Answer · 2016-02-15 16:52:06Z

8

I would use a simple function like:

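def f(x):
    # Map a single salary value to a position label
    if x <= 8000:
        return 'Normal Employee'
    elif x <= 25000:  # here 8000 < x <= 25000
        return 'Experienced Employee'
    # Salaries above 25000 are not covered by the question's rules
    return 'Unknown'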

and then apply it to the df:

df['position'] = df['salary'].apply(f)
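A self-contained run of this approach on the question's sample rows; the DataFrame is built inline rather than read from the CSV, and the function uses an explicit "Unknown" fallback, which is an assumption since the question does not say how salaries above 25000 should be labelled.

```python
import pandas as pd

# Sample rows from the question, built inline instead of read from the CSV.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "salary": [2500, 5000, 10000, 20000],
    "department": ["x", "y", "y", "x"],
})


def f(salary):
    # Series.apply calls this once per element of the salary column.
    if salary <= 8000:
        return "Normal Employee"
    if salary <= 25000:
        return "Experienced Employee"
    return "Unknown"  # assumed label for salaries outside both bands


df["position"] = df["salary"].apply(f)
print(df)
```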


1 Comment

EdChum Over a year ago

apply will be slow for a large df
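One vectorized alternative to apply (not from the original thread) is `pd.cut`, which bins the whole column in one call instead of calling a Python function per row. A minimal sketch; the bin edges follow the question's rules, and the 30000 value is a hypothetical out-of-range salary that comes out as NaN.

```python
import numpy as np
import pandas as pd

salaries = pd.Series([2500, 5000, 10000, 20000, 30000], name="salary")

# right=True (the default) makes the bins (-inf, 8000] and (8000, 25000].
positions = pd.cut(
    salaries,
    bins=[-np.inf, 8000, 25000],
    labels=["Normal Employee", "Experienced Employee"],
)
print(positions)
```

The result is a categorical Series; values outside every bin are NaN rather than an "unknown" label.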

EdChum · Accepted Answer · 2016-02-15 16:49:38Z

7

You could define two boolean masks and pass them to np.where:

In [91]:
import numpy as np

# Boolean masks for each salary band
normal = df['salary'] <= 8000
experienced = (df['salary'] > 8000) & (df['salary'] <= 25000)
# Nested np.where: rows matching neither mask get 'unknown'
df['position'] = np.where(normal, 'normal employee',
                          np.where(experienced, 'experienced employee', 'unknown'))
df

Out[91]:
  name  salary department              position
0    a    2500          x       normal employee
1    b    5000          y       normal employee
2    c   10000          y  experienced employee
3    d   20000          x  experienced employee
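With more bands, nested np.where calls get hard to read. `np.select` (not used in the original answer) takes a list of conditions and a matching list of choices and picks the first condition that holds. A sketch on the question's sample rows, keeping this answer's "unknown" default:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "salary": [2500, 5000, 10000, 20000],
    "department": ["x", "y", "y", "x"],
})

conditions = [
    df["salary"] <= 8000,
    (df["salary"] > 8000) & (df["salary"] <= 25000),
]
choices = ["normal employee", "experienced employee"]

# The first matching condition wins; rows matching none get the default.
df["position"] = np.select(conditions, choices, default="unknown")
print(df)
```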

Or, slightly more readable, pass the masks to loc:

In [92]:
df.loc[normal, 'position'] = 'normal employee'
df.loc[experienced, 'position'] = 'experienced employee'
df

Out[92]:
  name  salary department              position
0    a    2500          x       normal employee
1    b    5000          y       normal employee
2    c   10000          y  experienced employee
3    d   20000          x  experienced employee
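One difference from np.where: with loc, rows that match neither mask are never assigned, so the new column is NaN there. A sketch with a hypothetical extra row (salary 30000) to show this; `fillna` supplies a fallback label if one is wanted.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e"],  # "e" is a hypothetical extra row
    "salary": [2500, 5000, 10000, 20000, 30000],
})

normal = df["salary"] <= 8000
experienced = (df["salary"] > 8000) & (df["salary"] <= 25000)

# The first assignment creates the column; unmatched rows start as NaN.
df.loc[normal, "position"] = "normal employee"
df.loc[experienced, "position"] = "experienced employee"
print(df["position"].isna().tolist())  # [False, False, False, False, True]

df["position"] = df["position"].fillna("unknown")
```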


2 Comments

Edwin Baby Over a year ago

How do I get the count of normal employees when using groupby?

EdChum Over a year ago

Can you explain what you mean? Are you after df.groupby('position').count()?
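Expanding on that reply, and assuming a position column built as in the answer: `size()` counts rows per position, and grouping by department as well gives per-department counts, such as normal employees per department. A sketch on the question's rows:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "department": ["x", "y", "y", "x"],
    "position": ["normal employee", "normal employee",
                 "experienced employee", "experienced employee"],
})

# Rows per position; count() gives the same numbers per column when nothing is missing.
per_position = df.groupby("position").size()

# Rows per (department, position) pair, reshaped into a department-by-position table.
per_dept = df.groupby(["department", "position"]).size().unstack(fill_value=0)

print(per_position)
print(per_dept)
```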

IanS · Accepted Answer · 2016-02-15 16:50:52Z

2

A useful function is apply:

# Series.apply works element by element, so it takes no axis argument
data_df['position'] = data_df['salary'].apply(lambda salary: 'Normal Employee' if salary <= 8000 else 'Experienced Employee')

This applies the lambda function to every element in the salary column.
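The one-line lambda has only two branches, so every salary above 8000, including any above 25000, is labelled "Experienced Employee". If the question's upper bound matters, a nested conditional expression keeps it in a single apply call; a sketch with a hypothetical 30000 salary and an assumed "Unknown" label:

```python
import pandas as pd

salaries = pd.Series([2500, 10000, 30000])  # 30000 is a hypothetical out-of-range value

positions = salaries.apply(
    lambda s: "Normal Employee" if s <= 8000
    else ("Experienced Employee" if s <= 25000 else "Unknown")
)
print(positions.tolist())  # ['Normal Employee', 'Experienced Employee', 'Unknown']
```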


1 Comment

EdChum Over a year ago

apply will be slow for a large df
