Python: Get array indexes of quartiles

Question

I am using the following code to calculate the quartiles of a given data set:

#!/usr/bin/python

import numpy as np

series = [1,2,2,2,2,2,2,2,2,2,2,5,5,6,7,8]

p1 = 25
p2 = 50
p3 = 75

q1 = np.percentile(series,  p1)
q2 = np.percentile(series,  p2)
q3 = np.percentile(series,  p3)

print('percentile(' + str(p1) + '): ' + str(q1))
print('percentile(' + str(p2) + '): ' + str(q2))
print('percentile(' + str(p3) + '): ' + str(q3))

The percentile function returns the quartiles. However, I would also like to get the indexes that it used to mark the boundaries of the quartiles. Is there any way to do this?

Is the data always sorted? Or else, this question wouldn't make sense, unless I'm missing something. But if it is sorted, then you can directly calculate the index.
– juanpa.arrivillaga, Commented Mar 22, 2017 at 17:35
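
The direct calculation mentioned in this comment could be sketched roughly as follows. It assumes the data is sorted and that np.percentile is left at its default linear interpolation, in which case the p-th percentile sits at the fractional position p/100 * (n - 1). The helper name percentile_positions is made up for this illustration.

```python
import math
import numpy as np

series = [1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 5, 6, 7, 8]

def percentile_positions(data, p):
    """Return (lower index, upper index, fractional position) for percentile p.

    Hypothetical helper: assumes `data` is sorted and that the default
    linear interpolation of np.percentile is in use.
    """
    pos = p / 100 * (len(data) - 1)
    return math.floor(pos), math.ceil(pos), pos

for p in (25, 50, 75):
    lo, hi, pos = percentile_positions(series, p)
    # Interpolate by hand between the two neighbours to compare with NumPy.
    manual = series[lo] + (pos - lo) * (series[hi] - series[lo])
    print(p, lo, hi, manual, np.percentile(series, p))
```

For this series the index pairs come out as (3, 4), (7, 8) and (11, 12). When the position is a whole number, the lower and upper index coincide.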

eqzx · Accepted Answer · 2017-03-22 17:58:57Z

1

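Since the data is sorted, you could just use numpy.searchsorted, which returns the indices at which the values would have to be inserted to maintain sorted order. The side argument ('left' or 'right') controls whether the insertion point falls before or after any elements equal to the value.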

>>> np.searchsorted(series,q1)
1
>>> np.searchsorted(series,q1,side='right')
11
>>> np.searchsorted(series,q2)
1
>>> np.searchsorted(series,q3)
11
>>> np.searchsorted(series,q3,side='right')
13
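
For reference, a self-contained version of the session above might look like the sketch below. The variable names left and right are only for illustration, and the input is assumed to be sorted already.

```python
import numpy as np

# searchsorted only gives meaningful results on sorted input.
series = [1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 5, 6, 7, 8]

quartiles = np.percentile(series, [25, 50, 75])

# side='left' gives the first position where each value could be inserted,
# side='right' gives the position just past the last element equal to it.
left = np.searchsorted(series, quartiles, side='left')
right = np.searchsorted(series, quartiles, side='right')

for p, q, l, r in zip((25, 50, 75), quartiles, left, right):
    print(f"percentile({p}): {q}, indices [{l}, {r})")
```

If a quartile is an interpolated value that does not occur in the data, both calls return the same insertion point, so the half-open range is empty.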


Schmuddi · Accepted Answer · 2017-03-22 18:08:04Z

Assuming that the data is always sorted (thanks @juanpa.arrivillaga), you can use the rank method from the Pandas Series class. rank() takes several arguments. One of them is pct:

pct : boolean, default False

Computes percentage rank of data

There are different ways of calculating the percentage rank. These methods are controlled by the argument method:

method : {‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}

You need the method "max":

max: highest rank in group

Let's look at the output of the rank() method with these parameters:

import numpy as np
import pandas as pd

series = [1,2,2,2,2,2,2,2,2,2,2,5,5,6,7,8]

S = pd.Series(series)
percentage_rank = S.rank(method="max", pct=True)
print(percentage_rank)

This gives you essentially the percentile rank of every entry in the Series:

0     0.0625
1     0.6875
2     0.6875
3     0.6875
4     0.6875
5     0.6875
6     0.6875
7     0.6875
8     0.6875
9     0.6875
10    0.6875
11    0.8125
12    0.8125
13    0.8750
14    0.9375
15    1.0000
dtype: float64

In order to retrieve the index for the three percentiles, you look up the first element in the Series that has an equal or higher percentage rank than the percentile you're interested in. The index of that element is the index that you need.

index25 = S.index[percentage_rank >= 0.25][0]
index50 = S.index[percentage_rank >= 0.50][0]
index75 = S.index[percentage_rank >= 0.75][0]

print("25 percentile: index {}, value {}".format(index25, S[index25]))
print("50 percentile: index {}, value {}".format(index50, S[index50]))
print("75 percentile: index {}, value {}".format(index75, S[index75]))

This gives you the output:

25 percentile: index 1, value 2
50 percentile: index 1, value 2
75 percentile: index 11, value 5
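
The lookup above can be wrapped in a small helper. This is only a sketch: the function name first_index_at_percentile is invented here, and it assumes a sorted Series, since otherwise the first match in positional order need not be the smallest qualifying value.

```python
import pandas as pd

def first_index_at_percentile(s, p):
    """Index label of the first element whose percentage rank is >= p.

    Hypothetical helper: `s` is assumed to be a sorted pandas Series
    and `p` a fraction in (0, 1].
    """
    pct_rank = s.rank(method="max", pct=True)
    return pct_rank[pct_rank >= p].index[0]

S = pd.Series([1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 5, 6, 7, 8])
indexes = {p: first_index_at_percentile(S, p) for p in (0.25, 0.50, 0.75)}
print(indexes)
```

For the example data this yields the same indexes as the output above: 1, 1 and 11.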

Rose · Accepted Answer · 2017-03-22 17:35:18Z

-1

Try this:

import numpy as np
import pandas as pd
series = [1,2,2,2,2,2,2,2,2,2,2,5,5,6,7,8]
thresholds = [25,50,75]
output = pd.DataFrame([np.percentile(series,x) for x in thresholds], index = thresholds, columns = ['quartiles'])
output

By making it a dataframe, you can assign the index pretty easily.


2 Comments

juanpa.arrivillaga Over a year ago

I'm not sure how this answers the question... I'm not sure I understand the question though...

Rose Over a year ago

@juanpa.arrivillaga I assumed that the question was about structuring the output...
