Let's say I have a numpy array of some integer type (say np.int64) and want to cast it to another type (say np.int8). How can I most effectively check if the operation is safe (preserving all values)?

There are two approaches I've come up with:

Approach 1: Use the type information

import numpy as np

def is_safe(data, new_type):
    if np.can_cast(data, new_type):
        return True    # Handle the trivial allowed cases
    type_info = np.iinfo(new_type)
    return np.all((data >= type_info.min) & (data <= type_info.max))

Approach 2: Use np.can_cast on all items

def is_safe(data, new_type):
    if np.can_cast(data, new_type):
        return True    # Handle the trivial allowed cases
    return all(np.can_cast(item, new_type) for item in np.nditer(data))

Both of these approaches seem to be valid (and work for trivial cases) but are they correct and efficient? Is there another, better approach?

P.S. To complicate things further, np.can_cast(np.int8, np.uint64) returns False (naturally) so changing between signed and unsigned integers has to be checked somewhat separately.

    I don't think the P.S. adds anything new or requires a special check; the fact that np.uint64 requires values >= 0 is not fundamentally different from np.int8 requiring values >= -128. Commented Jan 12, 2017 at 22:26

1 Answer

If you already know that the array has a NumPy integer dtype, the only check needed is that its values lie within the min/max range of the target integer type. This is a much simpler check than the generic can_cast, which has no a priori knowledge of what it is fed, so can_cast takes longer. I tested this by casting the integers 0-99 from np.int64 to np.int8.

So, while both approaches are correct, the first one is preferable if you know that data is a NumPy integer array.

>>> timeit.timeit("np.all((data >= type_info.min) & (data <= type_info.max))", setup="import numpy as np\ndata = np.array(range(100), dtype=np.int64)\ntype_info = np.iinfo(np.int8)")
6.745509549000417
>>> timeit.timeit("all(np.can_cast(item, np.uint8) for item in np.nditer(data))", setup="import numpy as np\ndata = np.array(range(100), dtype=np.int64)")
51.0065170609887

It is slightly faster (20% or so) to assign the min and max values to new variables:

def is_safe(data, new_type):
    type_info = np.iinfo(new_type)
    a = type_info.min    # lower bound of the target type
    b = type_info.max    # upper bound of the target type
    return np.all((data >= a) & (data <= b))
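
As a variation on the range check, here is a minimal sketch that compares only the smallest and largest element against the target bounds, which avoids building the intermediate boolean arrays. The name fits_in_dtype is hypothetical (it is not a NumPy function), and the sketch assumes that both data and new_type are integer types. The extremes are converted to Python ints so that the comparison stays exact when signed and unsigned 64-bit values are mixed. I have not timed this variant.

```python
import numpy as np


def fits_in_dtype(data, new_type):
    """Return True if every value of the integer array `data` fits in `new_type`.

    Hypothetical helper, not part of NumPy. Assumes `data` holds integers
    and `new_type` is an integer dtype.
    """
    data = np.asarray(data)
    if data.size == 0:
        # Nothing can overflow, and min()/max() would raise on empty input.
        return True
    info = np.iinfo(new_type)
    # Python ints have arbitrary precision, so these comparisons are exact
    # even for int64 versus uint64 bounds.
    lo, hi = int(data.min()), int(data.max())
    return info.min <= lo and hi <= info.max


data = np.arange(100, dtype=np.int64)
print(fits_in_dtype(data, np.int8))                                    # True
print(fits_in_dtype(np.array([-1, 5], dtype=np.int64), np.uint64))     # False
print(fits_in_dtype(np.array([2**63], dtype=np.uint64), np.int64))     # False
```

The signed/unsigned case from the question's P.S. needs no special branch here, because a negative minimum simply fails the lower-bound comparison against 0.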