I am writing a standard script where I will fetch the data from database, do some manipulation and insert data back into another table.
I am facing an overflow issue while converting a column's type in Dataframe. Here's an example :
import numpy as np
import pandas as pd
d = {'col1': ['66666666666666666666666666666']}
df = pd.DataFrame(data=d)
df['col1'] = df['col1'].astype('int64')
print(df)
Error :
Traceback (most recent call last):
File "HelloWorld.py", line 6, in <module>
df['col1'] = df['col1'].astype('int64')
File "/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py", line 5548, in astype
new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors,)
File "/usr/local/lib/python3.6/dist-packages/pandas/core/internals/managers.py", line 604, in astype
return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
File "/usr/local/lib/python3.6/dist-packages/pandas/core/internals/managers.py", line 409, in apply
applied = getattr(b, f)(**kwargs)
File "/usr/local/lib/python3.6/dist-packages/pandas/core/internals/blocks.py", line 595, in astype
values = astype_nansafe(vals1d, dtype, copy=True)
File "/usr/local/lib/python3.6/dist-packages/pandas/core/dtypes/cast.py", line 974, in astype_nansafe
return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
File "pandas/_libs/lib.pyx", line 615, in pandas._libs.lib.astype_intsafe
OverflowError: Python int too large to convert to C long
I cannot control the values inside d['col1'] because in the actual code it is being generated by another function. How can I solve this problem if I want to keep the final data type as 'int64'.
I was thinking to catch the exception and then assign the largest int64 value to the whole column but then the rows of the column which are not overflowing might also lead to inconsistent results.
Can you advise me on some elegant solutions here?