I am writing a script that fetches data from a database, manipulates it, and inserts the result into another table.

I am running into an overflow error when converting a column's type in a DataFrame. Here's an example:

import numpy as np
import pandas as pd

d = {'col1': ['66666666666666666666666666666']}
df = pd.DataFrame(data=d)
df['col1'] = df['col1'].astype('int64')

print(df)

Error:

Traceback (most recent call last):
  File "HelloWorld.py", line 6, in <module>
    df['col1'] = df['col1'].astype('int64')
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py", line 5548, in astype
    new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors,)
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/internals/managers.py", line 604, in astype
    return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/internals/managers.py", line 409, in apply
    applied = getattr(b, f)(**kwargs)
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/internals/blocks.py", line 595, in astype
    values = astype_nansafe(vals1d, dtype, copy=True)
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/dtypes/cast.py", line 974, in astype_nansafe
    return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
  File "pandas/_libs/lib.pyx", line 615, in pandas._libs.lib.astype_intsafe
OverflowError: Python int too large to convert to C long

I cannot control the values in d['col1'] because in the actual code they are generated by another function. How can I solve this problem while keeping the final data type as 'int64'?

I was thinking of catching the exception and assigning the largest int64 value to the whole column, but that would also overwrite the rows that do not overflow and lead to inconsistent results.

Can you advise me on some elegant solutions here?

  • What about using float instead of int? That way you get numbers in power-of-10 notation. Commented May 5, 2022 at 16:16
  • @AndreaIerardi Good idea, but the requirement is int64. I know it's not very flexible. Commented May 5, 2022 at 16:28

1 Answer

Building on your idea, you can use np.iinfo to get the int64 bounds and clip the column to them before casting:

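# Smallest and largest values that int64 can hold
ii64 = np.iinfo(np.int64)

# Parse as float128, clamp to the int64 range, then cast to int64
df['col1'] = df['col1'].astype('float128').clip(ii64.min, ii64.max).astype('int64')
print(df)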

# Output
                  col1
0  9223372036854775807
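
As for why float128 rather than float64, here is a small sketch (an illustrative addition, not part of the conversion above). float64 has only a 53-bit significand, so the int64 maximum gets rounded up to 2**63, which no longer fits in int64. Clipping in float64 would therefore not protect the final cast. The variable name `rounded` is just for illustration.

```python
import numpy as np

ii64 = np.iinfo(np.int64)

# float64 cannot represent 2**63 - 1 exactly; the nearest float64 is 2**63
rounded = float(ii64.max)
print(rounded == 2.0 ** 63)

# Converting that float back to a Python int gives a value one past the int64 maximum
print(int(rounded) - ii64.max)
```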

Mind the limits of float128 too :-D It is also platform-dependent: np.float128 is not defined in NumPy builds on Windows, for example.

>>> np.finfo(np.float128)
finfo(resolution=1e-18, min=-1.189731495357231765e+4932, max=1.189731495357231765e+4932, dtype=float128)

>>> np.iinfo('int64')
iinfo(min=-9223372036854775808, max=9223372036854775807, dtype=int64)
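
If float128 is not available on your platform, one possible alternative is to do the clipping with Python's arbitrary-precision integers before handing the values to pandas. This is a minimal sketch, assuming every value in the column parses as an integer (no NaN or empty strings). `to_clipped_int64` is a hypothetical helper name, not a pandas function, and the sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

ii64 = np.iinfo(np.int64)


def to_clipped_int64(series):
    """Parse each value as a Python int, clamp it to the int64 range, return an int64 Series."""
    # Python ints never overflow, so the comparison is exact even for huge values
    clipped = [min(max(int(v), ii64.min), ii64.max) for v in series]
    return pd.Series(clipped, index=series.index, name=series.name, dtype="int64")


# Hypothetical sample data: one overflow, one normal value, one underflow
df = pd.DataFrame({"col1": ["66666666666666666666666666666", "42",
                            "-66666666666666666666666666666"]})
df["col1"] = to_clipped_int64(df["col1"])
print(df)
```

Only the out-of-range rows are replaced by the int64 bounds; in-range rows such as 42 keep their exact value. The list comprehension is slower than a vectorized cast, so this trades speed for portability.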