
I have a dataframe that holds datetime values and arrays, and possibly other datatypes in the future. I wish to write it to PostgreSQL with to_sql, where the datetime column becomes (timestamp without time zone) and the array columns become (bytea), but I have no idea what to put for the dtype argument.

Is there a way to dynamically do the dtype based on the dataframe column's datatype?

How the table looks:

CREATE TABLE IF NOT EXISTS data2_mcw_tmp (
      time timestamp without time zone,
      u bytea,
      v bytea,
      w bytea,
      spd bytea,
      dir bytea,
      temp bytea
);

My code so far (after help from user rftr):

import datetime
import pickle

import numpy as np
from sqlalchemy.dialects.postgresql import BYTEA, TIMESTAMP

dtypedict = {}
Data2_mcw_conv = Data2_mcw.copy()

for row in Data2_mcw_conv.index:
    for col in Data2_mcw_conv.columns:

        value = Data2_mcw_conv.loc[row, col]

        try:
            # numpy values (the arrays) are pickled and stored as BYTEA
            if type(Data2_mcw_conv[col].iloc[0]).__module__ == np.__name__:

                dtypedict.update({col: BYTEA})

                print('before: ')
                print(value.flags)
                print('---------------------')

                # force a C-contiguous copy before pickling
                value = value.copy(order='C')

                print('after: ')
                print(value.flags)
                print('=====================')

                value = pickle.dumps(value)

        except Exception:
            # datetime values are stored as TIMESTAMP
            if isinstance(Data2_mcw_conv[col].iloc[0], datetime.date):
                dtypedict.update({col: TIMESTAMP})

        Data2_mcw_conv.loc[row, col] = value

Data2_mcw_conv.to_sql(name='data2_mcw_tmp', con=conn,
                      if_exists='replace',
                      dtype=dtypedict)

However, I get this error:

Traceback (most recent call last):
  File "C:\Users\myname\Desktop\database\pickletopdb2.py", line 145, in <module>
    postgres_conv()
  File "C:\Users\myname\Desktop\database\pickletopdb2.py", line 124, in postgres_conv
    Data2_mcw_conv.to_sql(name='data2_mcw_tmp',con=conn,
  File "C:\Python38\lib\site-packages\pandas\core\generic.py", line 2778, in to_sql
    sql.to_sql(
  File "C:\Python38\lib\site-packages\pandas\io\sql.py", line 590, in to_sql
    pandas_sql.to_sql(
  File "C:\Python38\lib\site-packages\pandas\io\sql.py", line 1397, in to_sql
    table.insert(chunksize, method=method)
  File "C:\Python38\lib\site-packages\pandas\io\sql.py", line 831, in insert
    exec_insert(conn, keys, chunk_iter)
  File "C:\Python38\lib\site-packages\pandas\io\sql.py", line 748, in _execute_insert
    conn.execute(self.table.insert(), data)
  File "C:\Python38\lib\site-packages\sqlalchemy\engine\base.py", line 1286, in execute
    return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS)
  File "C:\Python38\lib\site-packages\sqlalchemy\sql\elements.py", line 325, in _execute_on_connection
    return connection._execute_clauseelement(
  File "C:\Python38\lib\site-packages\sqlalchemy\engine\base.py", line 1478, in _execute_clauseelement
    ret = self._execute_context(
  File "C:\Python38\lib\site-packages\sqlalchemy\engine\base.py", line 1842, in _execute_context
    self._handle_dbapi_exception(
  File "C:\Python38\lib\site-packages\sqlalchemy\engine\base.py", line 2027, in _handle_dbapi_exception
    util.raise_(exc_info[1], with_traceback=exc_info[2])
  File "C:\Python38\lib\site-packages\sqlalchemy\util\compat.py", line 207, in raise_
    raise exception
  File "C:\Python38\lib\site-packages\sqlalchemy\engine\base.py", line 1779, in _execute_context
    self.dialect.do_executemany(
  File "C:\Python38\lib\site-packages\sqlalchemy\dialects\postgresql\psycopg2.py", line 951, in do_executemany
    context._psycopg2_fetched_rows = xtras.execute_values(
  File "C:\Python38\lib\site-packages\psycopg2\extras.py", line 1267, in execute_values
    parts.append(cur.mogrify(template, args))
ValueError: ndarray is not C-contiguous

Output of print(value.flags) before/after value = value.copy(order='C'):

before: 
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

---------------------
after: 
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

=====================
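For reference, the flags behaviour above can be reproduced in isolation. This is a minimal sketch with a small Fortran-ordered array, not the actual data:

```python
import numpy as np

# A Fortran-ordered 2-D array is F-contiguous but not C-contiguous,
# matching the "before" flags shown above.
arr = np.asfortranarray(np.arange(6).reshape(2, 3))
print(arr.flags['C_CONTIGUOUS'])  # False
print(arr.flags['F_CONTIGUOUS'])  # True

# .copy(order='C') returns a C-contiguous copy, matching "after".
fixed = arr.copy(order='C')
print(fixed.flags['C_CONTIGUOUS'])  # True
```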

Why is this error occurring and any idea how to solve it?

1 Answer

Follow the second code snippet of this answer and customize the dtypes to their PostgreSQL equivalents from here. So in your case, e.g.:

from sqlalchemy.dialects.postgresql import BYTEA, TIMESTAMP

def sqlcol(dfparam):
    # ...
    if "datetime" in str(j):
        dtypedict.update({i: TIMESTAMP})
    if "object" in str(j):  # depending on what your other columns' datatypes are
        dtypedict.update({i: BYTEA})
    # ...

Notes:

  • According to the docs, TIMESTAMP is without timezone by default.
  • Your bytea columns are usually represented with datatype object in pandas. You should take this into consideration if you add further data in the future.
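
Putting the notes together, a complete version of the mapping function might look like this. This is a sketch: BYTEA and TIMESTAMP are stand-in strings here so the example runs without SQLAlchemy; in real code import them with from sqlalchemy.dialects.postgresql import BYTEA, TIMESTAMP.

```python
import numpy as np
import pandas as pd

# Stand-ins for the SQLAlchemy types, so this sketch is self-contained.
BYTEA = "BYTEA"
TIMESTAMP = "TIMESTAMP"

def sqlcol(dfparam):
    """Map each column's pandas dtype to a PostgreSQL type for to_sql(dtype=...)."""
    dtypedict = {}
    for i, j in zip(dfparam.columns, dfparam.dtypes):
        if "datetime" in str(j):
            dtypedict.update({i: TIMESTAMP})
        if "object" in str(j):
            # pickled numpy arrays show up as dtype object -> store as bytea
            dtypedict.update({i: BYTEA})
    return dtypedict

df = pd.DataFrame({
    "time": pd.to_datetime(["2021-01-01", "2021-01-02"]),
    "u": [np.arange(3), np.arange(3)],  # object column holding arrays
})
print(sqlcol(df))  # {'time': 'TIMESTAMP', 'u': 'BYTEA'}
```

The resulting dict is then passed straight to to_sql via its dtype argument, as in the question.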

6 Comments

Thanks for the tips, I took in your advice and made some progress, but I encountered an error when I try to port arrays in. I've added the error and my current code to my original question.
It seems like your arrays have a different order than C. So before adding your arrays to your resulting dataframe, try converting them with arr = arr.copy(order='C'), with arr being your respective array.
Hi, thanks for the advice! I just tried that, but it still gives the same error. Does it have anything to do with me doing a pickle.dumps of the array? I need to do it, as BYTEA is a datatype for binary data, not exactly arrays, and there are no other ways to put numpy arrays into the postgresql database that I am aware of.
Can you show the result of print(value.flags) after the line value = value.copy(order='C') please?
I have added the output for before/after the value = value.copy(order='C') line.
