I would like to bin the values of a NumPy array into 5 classes (very low, low, average, high, very high) according to how many standard deviations each value lies from the mean of the array: more than 2 std. dev. below the mean (very low); between 2 and 1 std. dev. below the mean (low); within 1 std. dev. of the mean (average); between 1 and 2 std. dev. above the mean (high); and more than 2 std. dev. above the mean (very high).

I tried using stats.percentileofscore, but that does not give me what I want:

import numpy as np
from scipy import stats
arr = np.random.rand(100)
[stats.percentileofscore(arr, a, 'rank') for a in arr]
1 Answer

You can use pd.cut from pandas, with bin edges placed 1 and 2 standard deviations either side of the mean.

sd = arr.std()
m = arr.mean()
# The outermost edges (m -/+ sd * 10000) are just far enough out to catch every value.
>>> pd.cut(arr, [m - sd * 10000, m - sd * 2, m - sd, m + sd, m + sd * 2, m + sd * 10000])
[(0.204, 0.785], (0.204, 0.785], (0.785, 1.0764], (0.785, 1.0764], (0.204, 0.785], ..., (0.204, 0.785], (0.204, 0.785], (-0.0875, 0.204], (0.204, 0.785], (0.785, 1.0764]]
Length: 100
Categories (5, object): [(-2909.105, -0.0875] < (-0.0875, 0.204] < (0.204, 0.785] < (0.785, 1.0764] < (1.0764, 2910.0944]]

To rename your categories:

buckets = (pd.Categorical(pd.cut(arr, 
               [m - sd * 10000, m - sd * 2, m - sd, m + sd, m + sd * 2, m + sd * 10000]))
           .rename_categories(['very low', 'low', 'average', 'high', 'very high']))

>>> buckets
[average, average, high, high, average, ..., average, average, low, average, high]
Length: 100
Categories (5, object): [very low, low, average, high, very high]
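
As a possible variation on the above, pd.cut also accepts a labels argument, so the categories can be named in the same call, and the outer bin edges can be written as -np.inf and np.inf instead of a large multiple of the standard deviation. The sketch below assumes a seeded random array purely for reproducibility; the names rng, edges and labels are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Assumption: a seeded generator, only so the example is reproducible.
rng = np.random.default_rng(0)
arr = rng.random(100)

m, sd = arr.mean(), arr.std()

# Infinite outer edges guarantee that every value falls into some bin.
edges = [-np.inf, m - 2 * sd, m - sd, m + sd, m + 2 * sd, np.inf]
labels = ['very low', 'low', 'average', 'high', 'very high']

# With the default right=True, each bin is open on the left and closed on the right.
buckets = pd.cut(arr, bins=edges, labels=labels)

# Count how many values landed in each class.
print(buckets.value_counts())
```

Since the input is a plain NumPy array, buckets should come back as a pandas Categorical, just like in the rename_categories version, so it can be used the same way afterwards.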
1 Comment

Thanks @Alexander, can I assign names to these categories?