7

I have a CSV file that contains information like this:

name    salary  department
a        2500      x
b        5000      y
c        10000      y
d        20000      x 

I need to convert this with pandas into the following form:

dept    name    position
x        a       Normal Employee
x        b       Normal Employee
y        c       Experienced Employee
y        d       Experienced Employee

If the salary is <= 8000, the position is Normal Employee.

If the salary is > 8000 and <= 25000, the position is Experienced Employee.

My current code for the group by:

import pandas
pandas.set_option('display.max_rows', 999)
data_df = pandas.read_csv('employeedetails.csv')
#print(data_df.columns)
# The column in the CSV is named 'department', not 'dept'
t = data_df.groupby(['department'])
print(t)

What changes do I need to make to this code to get the output shown above?

3 Answers

8

I would use a simple function like:

def f(x):
    if x <= 8000:
        x = 'Normal Employee'
    elif 8000 < x <= 25000:
        x = 'Experienced Employee'
    return x

and then apply it to the df:

df['position'] = df['salary'].apply(f)

1 Comment

apply will be slow for a large df
7

You could define 2 masks and pass these to np.where:

In [91]:
import numpy as np
normal = df['salary'] <= 8000
experienced = (df['salary'] > 8000) & (df['salary'] <= 25000)
df['position'] = np.where(normal, 'normal employee', np.where(experienced, 'experienced employee', 'unknown'))
df

Out[91]:
  name  salary department              position
0    a    2500          x       normal employee
1    b    5000          y       normal employee
2    c   10000          y  experienced employee
3    d   20000          x  experienced employee

Or, slightly more readably, pass the masks to loc:

In [92]:
df.loc[normal, 'position'] = 'normal employee'
df.loc[experienced,'position'] = 'experienced employee'
df

Out[92]:
  name  salary department              position
0    a    2500          x       normal employee
1    b    5000          y       normal employee
2    c   10000          y  experienced employee
3    d   20000          x  experienced employee
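
To get from here to the dept / name / position layout the question asks for, something like the following might work. This is only a sketch: the sample frame is rebuilt from the question's data so the snippet runs on its own, the labels use the question's capitalisation, and `result` is just an illustrative name.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (normally loaded with pd.read_csv).
df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "salary": [2500, 5000, 10000, 20000],
    "department": ["x", "y", "y", "x"],
})

# Same two masks as above.
normal = df["salary"] <= 8000
experienced = (df["salary"] > 8000) & (df["salary"] <= 25000)
df["position"] = np.where(
    normal, "Normal Employee",
    np.where(experienced, "Experienced Employee", "unknown"),
)

# Rename the column, keep only the three wanted columns, and sort by department.
result = (
    df.rename(columns={"department": "dept"})[["dept", "name", "position"]]
    .sort_values(["dept", "name"])
    .reset_index(drop=True)
)
print(result)
```

Note that no groupby is needed for this layout; sorting by department is enough to list each department's rows together.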

2 Comments

How do I get the count of normal employees when using groupby?
Can you explain what you mean? Are you after df.groupby('position').count()?
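
If the count is wanted per department, one possible reading of the comment, a sketch along these lines may help. It assumes the position column has already been added as in the answer; the frame is rebuilt here so the snippet is self-contained, and `counts` and `normal_per_dept` are illustrative names.

```python
import pandas as pd

# Frame as it looks after the position column has been added.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "salary": [2500, 5000, 10000, 20000],
    "department": ["x", "y", "y", "x"],
    "position": [
        "normal employee", "normal employee",
        "experienced employee", "experienced employee",
    ],
})

# Number of rows for every (department, position) pair.
counts = df.groupby(["department", "position"]).size()
print(counts)

# Only the normal employees, counted per department.
normal_per_dept = (
    df[df["position"] == "normal employee"].groupby("department").size()
)
print(normal_per_dept)
```

`size()` counts rows per group, whereas `count()` counts non-null values in each remaining column, so `size()` is usually the shorter way to get a single number per group.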
2

A useful function is apply:

data_df['position'] = data_df['salary'].apply(lambda salary: 'Normal Employee' if salary <= 8000 else 'Experienced Employee')

This applies the lambda function to every element in the salary column.

1 Comment

apply will be slow for a large df
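
A vectorised alternative that avoids apply is pd.cut, which bins the salary column in one call. This is a different technique from the lambda above, shown only as a sketch: the bin edges come from the question's rules, and the sample frame is rebuilt so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
data_df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "salary": [2500, 5000, 10000, 20000],
    "department": ["x", "y", "y", "x"],
})

# Bins are right-inclusive by default: (-inf, 8000] and (8000, 25000].
# Salaries above 25000 fall outside both bins and become NaN.
data_df["position"] = pd.cut(
    data_df["salary"],
    bins=[-np.inf, 8000, 25000],
    labels=["Normal Employee", "Experienced Employee"],
)
print(data_df)
```

The resulting column has a categorical dtype; `data_df["position"].astype(str)` converts it to plain strings if that is preferred.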
