
I have already read this post, this post, and this post, so please don't mark this as a duplicate.

I have raw data in a pandas DataFrame with a column called temp_id, shown below.

The column has dtype float64 because of the NAs, and in Jupyter Notebook it looks like this:

temp_id
55608.0
55609.0
NaN        
55610.0
NaN        
55611.0

In the CSV file, the same column looks like this:

temp_id
55608
55609
        #empty row indicating NA        
55610
        #empty row indicating NA
55611

I am trying to copy this data into a PostgreSQL table with the table definition below. Note that temp_id is not a primary key and may contain empty (NULL) values.

CREATE TABLE temp(
  temp_id integer
);

When I try to copy the data, I get the following error:

ERROR:  invalid input syntax for integer: "55608.0"
CONTEXT:  COPY temp, line 2, column temp_id: "55608.0"

How can I avoid this error and insert this data into an integer column in a PostgreSQL table? The screenshot below shows the miscellaneous options I set in pgAdmin when importing the CSV.

[Screenshot: pgAdmin CSV import settings]
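A minimal sketch of where the "55608.0" in the error comes from, assuming the CSV was produced with pandas' DataFrame.to_csv (the question doesn't say how it was exported): a float64 column is written with its trailing .0, and that text is exactly what COPY rejects for an integer column, even if a spreadsheet viewer displays 55608.

```python
import io

import numpy as np
import pandas as pd

# Values from the question; NaN forces the column to float64.
# Assumption: the CSV is written from pandas with DataFrame.to_csv.
df = pd.DataFrame({"temp_id": [55608.0, 55609.0, np.nan, 55610.0, np.nan, 55611.0]})
print(df["temp_id"].dtype)  # float64

# Write to an in-memory buffer to see the exact text PostgreSQL would receive.
buf = io.StringIO()
df.to_csv(buf, index=False)
lines = buf.getvalue().splitlines()
print(lines[:3])  # ['temp_id', '55608.0', '55609.0']
```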

  • Have you tried casting the column to integer via .astype(int) before inserting? The error seems to indicate an issue with the decimal point in the floats. Commented Jul 15, 2021 at 11:11
  • If I try to cast it in pandas, it throws an error because the column has NA values, so it cannot be converted to int. Commented Jul 15, 2021 at 11:13
  • DataFrame columns can mix types, but the dtype will then be "object". Replace None values with the string "NULL", which SQL databases use in place of NaN. Then filter for your numerical values and cast them to int using astype. Commented Jul 15, 2021 at 11:15
  • @LukasThaler - I get this error ValueError: Cannot convert non-finite values (NA or inf) to integer Commented Jul 15, 2021 at 11:15
  • This happens because NumPy/pandas are involved: NaN is a floating-point value, and integer types have no NaN. Said another way, if you build a column from 1, 2, 3.1, the whole column is treated as float. Since NaN only exists for floats, a column containing NaN becomes float as well. Commented Jul 15, 2021 at 11:20
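The point made in these comments can be checked directly. A small sketch using values from the question: NaN forces the Series to float64, and a plain astype(int) raises a ValueError because NumPy integers cannot represent NaN.

```python
import numpy as np
import pandas as pd

s = pd.Series([55608.0, np.nan, 55610.0])
# NaN is itself a float, so the Series is stored as float64.
print(s.dtype)

try:
    s.astype(int)  # NumPy int64 has no representation for NaN
    raised = False
except ValueError as exc:
    raised = True
    print(exc)  # the same kind of error reported in the comments above
```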

2 Answers


The column you are trying to insert contains NaN (or None). The floating-point type has a special NaN value, but integers do not, so when pandas reads the CSV file it treats the whole column as floating-point numbers.

import pandas as pd

a = pd.Series([1, 2, 3.01])  # float64: one float makes the whole column float
b = pd.Series([1, 2, None])  # float64: None becomes NaN, which only floats can hold

Solutions

  1. Remove rows with NaN and cast to int
import pandas as pd

df = pd.DataFrame(dict(col=[1, 2, 3, 4, None]))
df = df.dropna()
df = df.astype(int)
  2. Some SQL databases use "NULL" to represent missing values, but it must be sent as a string. In the database, the column is still an integer, but it must be nullable.
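import pandas as pd

df = pd.DataFrame(dict(col=[1, 2, 3, 4, None]))
# pandas accepts mixed-type columns; after fillna the dtype becomes "object".
df = df.fillna('NULL')
df = df.astype(str)  # 1.0 -> '1.0'; 'NULL' stays 'NULL'
# Strip only a trailing '.0' so that e.g. '55608.0' becomes '55608'.
df['col'] = df['col'].apply(lambda x: x[:-2] if x.endswith('.0') else x)
# PostgreSQL's CSV COPY treats only unquoted empty fields as NULL by default,
# so the import must declare NULL 'NULL' for this string to be read as NULL.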
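Before importing, it can help to look at the exact text COPY will receive. A sketch, assuming the data is exported with DataFrame.to_csv (csv_lines is a hypothetical helper defined here, not part of pandas): it writes the result of each solution to an in-memory buffer and prints the lines.

```python
import io

import pandas as pd


def csv_lines(frame):
    """Return the lines DataFrame.to_csv would write (index omitted)."""
    buf = io.StringIO()
    frame.to_csv(buf, index=False)
    return buf.getvalue().splitlines()


# Solution 1: drop the NaN rows, then cast to int.
df1 = pd.DataFrame(dict(col=[1, 2, 3, 4, None])).dropna().astype(int)
print(csv_lines(df1))  # ['col', '1', '2', '3', '4']

# Solution 2: replace NaN with the string 'NULL' and strip the trailing '.0'.
df2 = pd.DataFrame(dict(col=[1, 2, 3, 4, None])).fillna("NULL").astype(str)
df2["col"] = df2["col"].apply(lambda x: x[:-2] if x.endswith(".0") else x)
print(csv_lines(df2))  # ['col', '1', '2', '3', '4', 'NULL']
```

In both cases no ".0" remains. The trade-off is that solution 1 loses the rows with missing IDs, while solution 2 keeps them.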

The answer is similar to what @Lukas Thaler suggested, but instead of astype(int), I had to use astype('Int64'):

df['temp_id'] = df['temp_id'].astype('Int64')

This worked well to convert columns with NA to Int64 type and I was able to upload successfully.

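So the fix is not int (or NumPy's int64, which fails the same way on NaN) but pandas' nullable 'Int64' dtype, with a capital I.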
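A sketch of this approach with the question's values, assuming the CSV is again written with DataFrame.to_csv: after astype('Int64') the missing entries become <NA>, and the non-missing IDs are written without a trailing .0.

```python
import io

import numpy as np
import pandas as pd

df = pd.DataFrame({"temp_id": [55608.0, 55609.0, np.nan, 55610.0, np.nan, 55611.0]})

# 'Int64' (capital I) is pandas' nullable integer dtype; NaN becomes <NA>.
df["temp_id"] = df["temp_id"].astype("Int64")
print(df["temp_id"].dtype)  # Int64

buf = io.StringIO()
df.to_csv(buf, index=False)
lines = buf.getvalue().splitlines()
print(lines[:3])  # ['temp_id', '55608', '55609']
```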
