Unexpected 32-bit integer overflow in pandas/numpy int64 (python 3.6)

Question

Let me start with the example code:

import numpy
from pandas import DataFrame

a = DataFrame({"nums": [2233, -23160, -43608]})

a.nums = numpy.int64(a.nums)

print(a.nums ** 2)
print((a.nums ** 2).sum())

On my local machine, and other devs' machines, this works as expected and prints out:

0       4986289
1     536385600
2    1901657664
Name: nums, dtype: int64
2443029553

However, on our production server, we get:

0       4986289
1     536385600
2    1901657664
Name: nums, dtype: int64
-1851937743

That looks like 32-bit integer overflow, even though the dtype is int64.
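The negative total is exactly what the correct sum becomes when it wraps around at 32 bits. Here is a minimal sketch in plain Python that checks the arithmetic; `wrap_int32` is a hypothetical helper written for this illustration, not a numpy or pandas function:

```python
def wrap_int32(value):
    """Reduce an integer to the range of a signed 32-bit two's-complement int."""
    value &= 0xFFFFFFFF        # keep only the low 32 bits
    if value >= 0x80000000:    # high bit set means the value is negative
        value -= 0x100000000
    return value


nums = [2233, -23160, -43608]

# Python integers are arbitrary precision, so this is the true sum of squares.
true_sum = sum(n ** 2 for n in nums)

# What the same value looks like when stored in a signed 32-bit integer.
wrapped_sum = wrap_int32(true_sum)

print(true_sum)     # 2443029553
print(wrapped_sum)  # -1851937743
```

Each individual square fits in 32 bits, which is why only the sum differs between the two machines.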

The production server is using the same versions of python, numpy, pandas, etc. It's a 64-bit Windows Server 2012 OS and everything reports 64-bit (e.g. python --version, sys.maxsize, platform.architecture).

What could possibly be causing this?

Why don't you use regular Python integers that are capable of representing arbitrarily large numbers?
– ForceBru, Commented Apr 20, 2017 at 16:55
@ForceBru: They're slow, bulky, and cause weird breakages if you try to use object arrays full of integer objects.
– user2357112, Commented Apr 20, 2017 at 17:03
What is the output of print((a.nums.values**2).sum(dtype=numpy.int64))?
– Warren Weckesser, Commented Apr 20, 2017 at 17:22
@SeanKramer: I just started digging through the code and wound up in bottleneck. I think bottleneck is mishandling numpy.int64 on platforms where a C long is 32-bit, and Pandas is getting a check wrong in its attempts to compensate for bottleneck's error.
– user2357112, Commented Apr 20, 2017 at 17:32

user2357112 · Accepted Answer · 2017-04-20 17:48:27Z

This is a bug in the bottleneck library, which Pandas uses if it's installed. In some circumstances, bottleneck.nansum incorrectly has 32-bit overflow behavior when called on 64-bit input.

I believe this is due to bottleneck using PyInt_FromLong even when a C long is 32-bit, which is the case on Windows, including 64-bit builds. I'm not sure why that even compiles, actually. There's an issue report on the bottleneck issue tracker, not yet fixed at the time of writing, as well as an issue report on the Pandas issue tracker, where they tried to compensate for bottleneck's bug (but I think they turned off bottleneck in the cases where it works instead of the cases where it doesn't).
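To see whether a given machine is one of the platforms where a C long is 32-bit even though the interpreter is 64-bit, something like the following standard-library check should work. It is only a diagnostic sketch; it does not touch bottleneck or Pandas, and the variable names are my own:

```python
import ctypes
import struct
import sys

# Width of a C long as seen by this Python build.
c_long_bits = ctypes.sizeof(ctypes.c_long) * 8

# Width of a pointer, i.e. whether the interpreter itself is 32- or 64-bit.
pointer_bits = struct.calcsize("P") * 8

print("C long:", c_long_bits, "bits")
print("pointer:", pointer_bits, "bits")
print("sys.maxsize > 2**32:", sys.maxsize > 2**32)

if pointer_bits == 64 and c_long_bits == 32:
    # Typical of 64-bit Windows, where long stays 32 bits wide.
    print("64-bit interpreter with a 32-bit C long")
```

On a 64-bit Windows build I would expect the last message to be printed, while 64-bit Linux and macOS builds normally report a 64-bit long. That would explain why sys.maxsize and platform.architecture all say 64-bit while the overflow still happens.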
