Replace Number that falls Between Two Values (Pandas,Python3)

Question

Simple Question Here:

b = 8143.1795845088482
d = 14723.523658084257

My DataFrame, called final:

Words       score
This      90374.98788
is        80559.4495
a         43269.67002
sample    34535.01172
output    Very Low

I want to replace all the scores with either 'very low', 'low', 'medium', or 'high' based on whether they fall between quartile ranges.

something like this works:

final['score'][final['score'] <= b] = 'Very Low' #This is shown in the example above

but when I try to run this immediately afterwards, it doesn't work:

final['score'][final['score'] >= b] and final['score'][final['score'] <= d] = 'Low'

This gives me the error: cannot assign operator. Anyone know what I am missing?

EdChum · Accepted Answer · 2014-09-30 22:52:48Z

First, you must use the bitwise operators (& and | instead of and and or) because you are comparing arrays element by element rather than comparing single values. Comparing arrays with and is ambiguous, and you cannot override Python's and operator to behave the way you want. Second, you must wrap each condition in parentheses because of operator precedence.
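To make the two points above concrete, here is a minimal sketch. The Series values and the thresholds are made up for illustration and are not the asker's data.

```python
import pandas as pd

# Hypothetical scores and thresholds, chosen only for illustration.
scores = pd.Series([10.0, 30.0, 45.0, 704.0])
b, d = 25, 50

# Each comparison yields a boolean Series; & combines them element by element.
# The parentheses are needed because & binds more tightly than >= and <=.
mask = (scores >= b) & (scores <= d)
print(mask.tolist())  # [False, True, True, False]

# Python's `and` needs a single truth value for a whole Series,
# which pandas refuses to provide.
try:
    (scores >= b) and (scores <= d)
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```

The resulting mask can then be passed to loc to select or assign the matching rows.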

Finally, you are performing chained indexing, which may or may not work and will raise a warning. To set your column value, use loc like this:

In [4]:

b = 25 
d = 50
final.loc[(final['score'] >= b) & (final['score'] <= d), 'score'] = 'Low'
final
Out[4]:
  Words score
0  This    10
1    is   Low
2   for   Low
3   You   704

edited Sep 30, 2014 at 22:52

answered Sep 30, 2014 at 22:45


5 Comments

user3682157 Over a year ago

Hi Ed, this throws the following error: ValueError: Arrays were different lengths: 58 vs 1

EdChum Over a year ago

You'll have to edit your question to include valid input data and your code so that I can reproduce the error. On the data you posted, my code works fine, as you can see.

user3682157 Over a year ago

Updated the OP. I haven't the slightest idea why this error is being thrown.

user3682157 Over a year ago

it's being caused by the first replace line and the inclusion of the values 'Very Low' in the final df...just not sure how to get around this

EdChum Over a year ago

@user3682157 your problem here is that once you overwrite the values for the first quartile, you effectively change the dtype to a mixture of strings and ints/floats, so the comparison no longer works. It would be better to assign the string representations to a new column, like unutbu suggests.
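One way to follow this advice is sketched below, under assumptions: the data and thresholds are invented, and the column name "label" is hypothetical. Writing the labels to a separate column leaves score numeric, so every later comparison still works.

```python
import pandas as pd

# Hypothetical data and thresholds; only the column names follow the question.
final = pd.DataFrame({"Words": ["This", "is", "for", "You"],
                      "score": [10.0, 30.0, 45.0, 704.0]})
b, d = 25, 50

# Start with an empty object column (the name "label" is an assumption).
final["label"] = None

# "score" is never overwritten, so it stays float and can be compared repeatedly.
final.loc[final["score"] <= b, "label"] = "Very Low"
final.loc[(final["score"] > b) & (final["score"] <= d), "label"] = "Low"

print(final)
```

Rows that match neither condition keep an empty label until further ranges are assigned.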

unutbu · Accepted Answer · 2014-10-01 00:00:58Z

If your DataFrame's scores were all floats,

In [234]: df
Out[234]: 
    Words        score
0    This  90374.98788
1      is  80559.44950
2       a  43269.67002
3  sample  34535.01172

then you could use pd.qcut to categorize each value by its quartile:

In [236]: df['quartile'] = pd.qcut(df['score'], q=4, labels=['very low', 'low', 'medium', 'high'])

In [237]: df
Out[237]: 
    Words        score  quartile
0    This  90374.98788      high
1      is  80559.44950    medium
2       a  43269.67002       low
3  sample  34535.01172  very low
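If the thresholds are already known rather than computed from the data, as with b and d in the question, pd.cut with explicit bin edges may be an alternative. This is only a sketch: the third cut point (upper) and the column name "level" are assumptions that do not appear in the original post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Words": ["This", "is", "a", "sample"],
                   "score": [90374.98788, 80559.4495, 43269.67002, 34535.01172]})

# b and d are the values from the question; upper is a made-up third cut point.
b = 8143.1795845088482
d = 14723.523658084257
upper = 50000.0  # assumption, not from the original post

# Four intervals, each closed on the right: (-inf, b], (b, d], (d, upper], (upper, inf)
bins = [-np.inf, b, d, upper, np.inf]
df["level"] = pd.cut(df["score"], bins=bins,
                     labels=["very low", "low", "medium", "high"])

print(df)
```

Unlike pd.qcut, which derives the edges from the data's own quantiles, pd.cut uses exactly the edges it is given.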

DataFrame columns have a dtype. When the values are all floats, the column has a float dtype, which can be very fast for numerical calculations. When the values are a mixture of floats and strings, the dtype is object, which means each value is a Python object. While this gives the values a lot of flexibility, it is also very slow, since every operation ultimately falls back to calling a Python function instead of a NumPy/pandas C/Fortran/Cython function. Thus you should try to avoid mixing floats and strings in a single column.
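The dtype difference is easy to inspect. A small sketch with illustrative values follows; the exact exception raised by the failed comparison has varied across pandas versions, so both likely types are caught.

```python
import pandas as pd

floats = pd.Series([90374.98788, 80559.4495, 43269.67002])
mixed = pd.Series([90374.98788, 80559.4495, "Very Low"])

print(floats.dtype)  # float64
print(mixed.dtype)   # object

# Once a string sits in the column, a numeric comparison can no longer
# be evaluated for every element.
try:
    mixed <= 8143.18
    comparison_failed = False
except (TypeError, ValueError):
    comparison_failed = True
print(comparison_failed)
```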
