I have the following pandas dataframe.

df = pd.DataFrame({'Neighborhood': ['Marble Hill', 'Chelsea', 'Sutton Place'],
                   'Venue Category': ['Hospital', 'Bridge', 'School']})

When I execute it, I get the following table.

   Neighborhood Venue Category
0   Marble Hill       Hospital
1       Chelsea         Bridge
2  Sutton Place         School

Now, I want to assign a numerical value to each Venue Category.

Hospital - 5 marks
School - 4 marks
Bridge - 2 marks

So I tried to assign marks using this code. I want to display the marks in a separate column.

def df2(df):

    if (df['Venue Category'] == 'Hospital'):
        return 5
    elif (df['Venue Category'] == 'School'):
        return 4
    elif (df['Venue Category'] != 'Hospital' or df['Venue Category'] != 'School'):
        return np.nan
df['Value'] = df.apply(df2, axis = 1)

Once executed, it gives me the following warning. May I know how to fix this please?

/home/jupyterlab/conda/envs/python/lib/python3.6/site-packages/ipykernel_launcher.py:9: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  if __name__ == '__main__':

1 Answer

Create a dictionary covering all possible Venue Category values and then use Series.map. Any value in the column that is not a key of the dictionary is mapped to NaN:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Neighborhood': ['Marble Hill', 'Chelsea', 'Sutton Place', 'aaa'],
                   'Venue Category': ['Hospital', 'Bridge', 'School', 'a']})

print (df)
   Neighborhood Venue Category
0   Marble Hill       Hospital
1       Chelsea         Bridge
2  Sutton Place         School
3           aaa              a

d = {'Hospital':5, 'School':4, 'Bridge':2}
df['Value'] = df['Venue Category'].map(d)
print (df)
   Neighborhood Venue Category  Value
0   Marble Hill       Hospital    5.0
1       Chelsea         Bridge    2.0
2  Sutton Place         School    4.0
3           aaa              a    NaN
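
The Value column comes back as float because NaN cannot be stored in a plain integer column. If whole-number marks are preferred, one option is the nullable Int64 dtype. A minimal sketch, where the name marks is only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'Neighborhood': ['Marble Hill', 'Chelsea', 'Sutton Place', 'aaa'],
                   'Venue Category': ['Hospital', 'Bridge', 'School', 'a']})

# Illustrative name; same mapping as the dictionary d above
marks = {'Hospital': 5, 'School': 4, 'Bridge': 2}

# map() yields floats with NaN for unknown categories;
# the nullable Int64 dtype keeps integers and stores the gap as <NA>
df['Value'] = df['Venue Category'].map(marks).astype('Int64')
print(df)
```

Alternatively, chaining .fillna(0) after map replaces the missing marks with a default, if a default makes sense for the data.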

A solution with np.select is also possible, but in my opinion it is overcomplicated:

conditions = [df['Venue Category'] == 'Hospital',
              df['Venue Category'] == 'School',
              df['Venue Category'] == 'Bridge']
choices = [5, 4, 2]
df['Value'] = np.select(conditions, choices, default=np.nan)

print (df)
   Neighborhood Venue Category  Value
0   Marble Hill       Hospital    5.0
1       Chelsea         Bridge    2.0
2  Sutton Place         School    4.0
3           aaa              a    NaN
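
About the warning itself: SettingWithCopyWarning typically appears when df was produced by slicing or filtering another DataFrame and a column is then assigned to it. The question does not show how df was built, so the following is only a sketch under that assumption; all_venues, marks and the filter are hypothetical. Taking an explicit copy before adding the column usually avoids the warning:

```python
import pandas as pd

# Hypothetical larger DataFrame that df was filtered from (an assumption)
all_venues = pd.DataFrame({'Neighborhood': ['Marble Hill', 'Chelsea', 'Sutton Place', 'aaa'],
                           'Venue Category': ['Hospital', 'Bridge', 'School', 'a']})

# .copy() makes df an independent DataFrame rather than a slice of all_venues
df = all_venues[all_venues['Neighborhood'] != 'aaa'].copy()

marks = {'Hospital': 5, 'School': 4, 'Bridge': 2}

# Assigning a new column to the copy leaves all_venues untouched
df['Value'] = df['Venue Category'].map(marks)
print(df)
```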