I want to check whether each value in a 2D NumPy float array falls within the (min, max) boundaries of a numerical class, and then replace that value with the 'score' associated with that class.

E.g. the class boundaries could be:

>>> class1 = (0, 1.5)
>>> class2 = (1.5, 2.5)
>>> class3 = (2.5, 3.5)

The class scores are:

>>> score1 = 0.75
>>> score2 = 0.50
>>> score3 = 0.25

Values outside any of the classes should default to e.g. 99.

I've tried the following, but run into a ValueError due to broadcasting.

>>> import numpy as np

>>> arr_f = (6-0)*np.random.random_sample((4,4)) + 0  # array of random floats


>>> def reclasser(x, classes, news):
>>>     compare = [x >= min and x < max for (min, max) in classes]
>>>     try:
>>>         return news[compare.index(True)
>>>     except Value Error:
>>>         return 99.0


>>> v_func = np.vectorize(reclasser)
>>> out = v_func(arr_f, [class1, class2, class3], [score1, score2, score3])

ValueError: operands could not be broadcast together with shapes (4,4) (4,2) (4,) 

Any suggestions on why this error occurs and how to remediate would be most appreciated. Also, if I'm entirely on the wrong path using vectorized functions, I'd also be happy to hear that.

1 Answer

First try to make the code work without np.vectorize. The code above won't work even with a single float as the first argument: ValueError is misspelled, and the return line is missing its closing bracket. It's also not a good idea to use min and max as variable names, since that shadows Python's built-in functions. A fixed version of reclasser would be:

def reclasser(x, classes, news):
    # half-open ranges, lo <= x < hi, matching the comparison in the question
    compare = [lo <= x < hi for (lo, hi) in classes]
    try:
        return news[compare.index(True)]
    except ValueError:
        return 99.0
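
As for why the broadcasting error occurs: np.vectorize treats every argument as array-like and tries to broadcast them all against each other, so the list of class tuples and the list of scores get broadcast against arr_f. If you want to keep np.vectorize, one option is to exclude those two arguments from vectorization. A minimal sketch, assuming half-open ranges as in the question; v_reclass, the default parameter and the sample array are illustrative names, not part of the original code:

```python
import numpy as np

def reclasser(x, classes, news, default=99.0):
    # Return the score of the first class whose [lo, hi) range contains x.
    for (lo, hi), score in zip(classes, news):
        if lo <= x < hi:
            return score
    return default

classes = [(0, 1.5), (1.5, 2.5), (2.5, 3.5)]
news = [0.75, 0.50, 0.25]

# excluded={1, 2}: positional arguments 1 and 2 are handed to reclasser as-is
# instead of being broadcast against the first argument.
v_reclass = np.vectorize(reclasser, excluded={1, 2}, otypes=[float])

sample = np.array([[0.5, 2.0], [3.0, 5.0]])
out = v_reclass(sample, classes, news)
```

Note that np.vectorize is still a Python-level loop under the hood, so this is a convenience rather than a speed-up.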

That said, I think using the reclasser and np.vectorize is unnecessarily complex. Instead, you could do something like:

# class -> score mapping as a dict (tuples are hashable, so they can be keys)
class_scores = {class1: score1, class2: score2, class3: score3}
# array of default scores, same shape as the data
scores = np.full(arr_f.shape, 99.0)

for (lo, hi), score in class_scores.items():
    # boolean mask of the values that fall into the current class, lo <= value < hi
    in_cls = (lo <= arr_f) & (arr_f < hi)
    # assign the class score to those positions
    scores[in_cls] = score

scores will then be an array of scores corresponding to the original data array.
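
If you'd rather avoid the explicit loop over the array, np.select does the same kind of assignment in one call: it takes a list of boolean conditions, a list of values to pick, and a default. A sketch under the same assumptions (half-open ranges, 99 as the fallback); the sample array and variable names here are only for illustration:

```python
import numpy as np

classes = [(0, 1.5), (1.5, 2.5), (2.5, 3.5)]
class_scores = [0.75, 0.50, 0.25]

sample = np.array([[0.5, 1.5], [3.0, 5.0]])

# One boolean mask per class, True where lo <= value < hi.
conditions = [(lo <= sample) & (sample < hi) for lo, hi in classes]

# Pick the score of the first matching class; fall back to 99.0 otherwise.
scores = np.select(conditions, class_scores, default=99.0)
```

Where several conditions are True for the same element, np.select uses the first one in the list, which doesn't matter here because the ranges don't overlap.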

2 Comments

Thanks, that works flawlessly! I had in mind using a dictionary to pair the ranges and scores, but had somehow thought that tuples, like lists, can't be a dictionary key.
The logic is that tuples are immutable, therefore hashing them makes more sense than for lists. Since they are hashable, they can be used as dict keys.
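
A quick way to see the difference for yourself (the variable names are just for illustration):

```python
# A tuple of numbers is hashable, so it works as a dict key.
class_scores = {(0, 1.5): 0.75}
looked_up = class_scores[(0, 1.5)]

# A list is mutable and unhashable, so using it as a key raises TypeError.
try:
    bad = {[0, 1.5]: 0.75}
    list_key_failed = False
except TypeError:
    list_key_failed = True
```

Keep in mind a tuple is only hashable if everything inside it is hashable too.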
