Python3 Pandas - handle overflow when casting to number greater than data type int64

Question

I am writing a standard script that fetches data from a database, does some manipulation, and inserts the data back into another table.

I am facing an overflow issue while converting a column's type in a DataFrame. Here's an example:

import numpy as np
import pandas as pd

d = {'col1': ['66666666666666666666666666666']}
df = pd.DataFrame(data=d)
df['col1'] = df['col1'].astype('int64')

print(df)

Error:

Traceback (most recent call last):
  File "HelloWorld.py", line 6, in <module>
    df['col1'] = df['col1'].astype('int64')
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py", line 5548, in astype
    new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors,)
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/internals/managers.py", line 604, in astype
    return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/internals/managers.py", line 409, in apply
    applied = getattr(b, f)(**kwargs)
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/internals/blocks.py", line 595, in astype
    values = astype_nansafe(vals1d, dtype, copy=True)
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/dtypes/cast.py", line 974, in astype_nansafe
    return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
  File "pandas/_libs/lib.pyx", line 615, in pandas._libs.lib.astype_intsafe
OverflowError: Python int too large to convert to C long

I cannot control the values inside d['col1'] because in the actual code they are generated by another function. How can I solve this problem if I want to keep the final data type as 'int64'?

I was thinking of catching the exception and then assigning the largest int64 value to the whole column, but that would also overwrite the rows that do not overflow and give inconsistent results.

Can you advise me on some elegant solutions here?

What about using float instead of int? That way you get numbers expressed as powers of 10.
– Andrea Ierardi, Commented May 5, 2022 at 16:16
@AndreaIerardi Good idea, but the requirement is int64. I know it's not very flexible.
– Divyanshu Jimmy, Commented May 5, 2022 at 16:28

Corralien · Accepted Answer · 2022-05-05 16:36:24Z

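Building on your idea, you can use np.iinfo to get the int64 bounds and clip each value to them, so only the overflowing rows are replaced: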

ii64 = np.iinfo(np.int64)

df['col1'] = df['col1'].astype('float128').clip(ii64.min, ii64.max).astype('int64')
print(df)

# Output
                  col1
0  9223372036854775807
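
As a side note on why the snippet goes through float128 rather than the usual float64, here is a minimal sketch (not part of the original answer). float64 has a 53-bit significand, so the int64 maximum cannot be stored exactly and rounds up to 2**63, which is already outside the int64 range. On builds where np.float128 is the x86 extended-precision type, the wider significand should avoid that rounding, but that is an assumption about the platform.

```python
import numpy as np

ii64 = np.iinfo(np.int64)

# float64 cannot represent 2**63 - 1 exactly; the nearest float64 is 2**63.
as_float64 = np.float64(ii64.max)

print(as_float64 == 2.0 ** 63)      # True
print(int(as_float64) > ii64.max)   # True: one past the largest int64
```

So clipping in float64 and then casting to int64 would hit the upper bound with a value that no longer fits.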

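Keep the limits of float128 in mind too :-D Also note that np.float128 is platform-dependent and is not provided by every NumPy build (Windows builds, for example, do not have it).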

>>> np.finfo(np.float128)
finfo(resolution=1e-18, min=-1.189731495357231765e+4932, max=1.189731495357231765e+4932, dtype=float128)

>>> np.iinfo('int64')
iinfo(min=-9223372036854775808, max=9223372036854775807, dtype=int64)
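
If float128 is not available, one possible alternative is to do the clipping with Python's arbitrary-precision int before the column ever reaches a fixed-width dtype. This is a rough sketch, not the approach from the answer above: the helper name clip_to_int64 and the extra sample rows are made up for illustration, and it assumes every value is a string (or number) that int() can parse.

```python
import numpy as np
import pandas as pd

ii64 = np.iinfo(np.int64)
INT64_MIN, INT64_MAX = int(ii64.min), int(ii64.max)


def clip_to_int64(value):
    """Parse value as a Python int (unbounded) and clamp it to the int64 range."""
    # Hypothetical helper: int() raises ValueError for non-numeric strings.
    return max(INT64_MIN, min(INT64_MAX, int(value)))


# Sample data: one value too large, one in range, one too small.
df = pd.DataFrame({'col1': ['66666666666666666666666666666',
                            '42',
                            '-66666666666666666666666666666']})

# Clamp row by row, then cast; every value now fits in int64.
df['col1'] = df['col1'].map(clip_to_int64).astype('int64')
print(df)
```

Only the out-of-range rows are replaced by the bounds; in-range values such as 42 are left unchanged. The row-wise map is slower than a vectorized clip, which may matter for very large columns.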
