Broadcasting error when reclassifying numpy float array using vectorized function

Question

I want to check whether each value in a 2D numpy float array falls within the (min, max) boundaries of a numerical class, and then replace that value with the 'score' associated with that class.

E.g. the class boundaries could be:

>>> class1 = (0, 1.5)
>>> class2 = (1.5, 2.5)
>>> class3 = (2.5, 3.5)

The class scores are:

>>> score1 = 0.75
>>> score2 = 0.50
>>> score3 = 0.25

Values outside any of the classes should default to e.g. 99.

I've tried the following, but run into a ValueError due to broadcasting.

>>> import numpy as np

>>> arr_f = (6-0)*np.random.random_sample((4,4)) + 0  # array of random floats


>>> def reclasser(x, classes, news):
...     compare = [x >= min and x < max for (min, max) in classes]
...     try:
...         return news[compare.index(True)
...     except Value Error:
...         return 99.0


>>> v_func = np.vectorize(reclasser)
>>> out = v_func(arr_f, [class1, class2, class3], [score1, score2, score3])

ValueError: operands could not be broadcast together with shapes (4,4) (4,2) (4,)

Any suggestions on why this error occurs and how to fix it would be most appreciated. Also, if I'm entirely on the wrong path using vectorized functions, I'd be happy to hear that too.

Jussi Nurminen · Accepted Answer · 2019-04-18 09:21:18Z


Try first to make the code work without np.vectorize. The code above won't work even with a single float as the first argument: the closing bracket in news[compare.index(True)] is missing, and ValueError is misspelled as "Value Error". Also, it's not a good idea to use min and max as variable names, because they shadow Python's built-in functions of the same name. A fixed version of reclasser would be:

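def reclasser(x, classes, news):
    # True for each class whose half-open range [lower, upper) contains x
    compare = [lower <= x < upper for (lower, upper) in classes]
    try:
        # score of the first matching class
        return news[compare.index(True)]
    except ValueError:
        # list.index raises ValueError when no class matched
        return 99.0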
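As for the broadcasting error itself: np.vectorize broadcasts every argument it receives against the others, so the list of class tuples becomes an array of shape (number of classes, 2) and the score list an array of shape (number of classes,), and numpy cannot broadcast those together with the (4, 4) data array. A minimal sketch of one way around it, assuming you want to keep np.vectorize, is its excluded parameter, which passes the listed arguments to the function unchanged. The loop-based reclasser below is an illustrative rewrite, not the version above, and the small arr is made-up sample data.

```python
import numpy as np

def reclasser(x, classes, news):
    # return the score of the first half-open class [lo, hi) containing x
    for (lo, hi), new in zip(classes, news):
        if lo <= x < hi:
            return new
    return 99.0  # default for values outside every class

classes = [(0, 1.5), (1.5, 2.5), (2.5, 3.5)]
news = [0.75, 0.50, 0.25]

# excluded={1, 2} tells np.vectorize to pass the 2nd and 3rd positional
# arguments through as-is instead of broadcasting them against x;
# otypes fixes the output dtype instead of inferring it from the first call
v_func = np.vectorize(reclasser, excluded={1, 2}, otypes=[float])

arr = np.array([[0.2, 1.5], [3.0, 5.0]])
out = v_func(arr, classes, news)
print(out)
```

np.vectorize is still a Python-level loop under the hood, so this mainly fixes the error rather than speeding anything up.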

That said, I think using the reclasser and np.vectorize is unnecessarily complex. Instead, you could do something like:

import numpy as np

# class -> score mapping as a dict (tuples are hashable, so they can be keys)
class_scores = {class1: score1, class2: score2, class3: score3}
# matrix of default scores
scores = np.full(arr_f.shape, 99.0)

for (lo, hi), score in class_scores.items():
    # boolean mask of array values that belong to the current class [lo, hi)
    in_cls = (lo <= arr_f) & (arr_f < hi)
    # update scores for the current class
    scores[in_cls] = score

scores will then be an array of scores corresponding to the original data array.
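The same loop can also be collapsed into a single call to np.select, which picks, element by element, the value for the first condition that is true and falls back to a default otherwise. This is a sketch of that alternative (it is not part of the answer above); the random sample data and the names classes and class_values are illustrative assumptions.

```python
import numpy as np

# sample data shaped like the question's 4x4 array of floats in [0, 6)
rng = np.random.default_rng(0)
arr_f = 6 * rng.random((4, 4))

classes = [(0, 1.5), (1.5, 2.5), (2.5, 3.5)]
class_values = [0.75, 0.50, 0.25]

# one boolean mask per half-open class [lo, hi)
condlist = [(lo <= arr_f) & (arr_f < hi) for lo, hi in classes]
# np.select takes the value of the first matching condition, else the default
out = np.select(condlist, class_values, default=99.0)
print(out)
```

Because np.select uses the first true condition, it behaves like compare.index(True) in the reclasser. When the classes are contiguous, np.digitize on the bin edges is another option.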


2 Comments

Hans Roelofsen Over a year ago

Thanks, that works flawlessly! I had in mind using a dictionary to pair the ranges and scores, but had somehow thought that tuples, like lists, can't be dictionary keys.

Jussi Nurminen Over a year ago

The logic is that tuples are immutable, so their hash value can't change after they are created, which is not true of lists. Since they are hashable (as long as all of their elements are hashable too), they can be used as dict keys.
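A quick way to see this is to try hashing each kind of object. In this sketch, is_hashable is a hypothetical helper written for illustration; it relies only on the built-in hash() raising TypeError for unhashable objects.

```python
def is_hashable(obj):
    # hypothetical helper: hash() raises TypeError for unhashable objects
    try:
        hash(obj)
        return True
    except TypeError:
        return False

# tuples of numbers are hashable, so they work as dict keys
class_scores = {(0, 1.5): 0.75, (1.5, 2.5): 0.50}
lookup = class_scores[(1.5, 2.5)]

list_ok = is_hashable([0, 1.5])      # lists are mutable, so unhashable
tuple_ok = is_hashable((0, 1.5))     # tuple of hashable items
nested_ok = is_hashable((0, [1.5]))  # a tuple containing a list is unhashable
print(lookup, list_ok, tuple_ok, nested_ok)
```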
