
I have a dataframe that holds datetime values and arrays, and possibly other datatypes in the future. I wish to write it to PostgreSQL with to_sql, where the datetime column becomes (timestamp without time zone) and the array columns become (bytea), but I have no idea what to put for the dtype argument.

Is there a way to dynamically do the dtype based on the dataframe column's datatype?

How the table looks:

CREATE TABLE IF NOT EXISTS data2_mcw_tmp (
      time timestamp without time zone,
      u bytea,
      v bytea,
      w bytea,
      spd bytea,
      dir bytea,
      temp bytea
);

My code so far (after help from user rftr):

import datetime
import pickle

import numpy as np
from sqlalchemy.dialects.postgresql import BYTEA, TIMESTAMP

dtypedict = {}
Data2_mcw_conv = Data2_mcw.copy()

for row in Data2_mcw_conv.index:
    for col in Data2_mcw_conv.columns:

        value = Data2_mcw_conv.loc[row, col]

        try:
            # numpy values (the arrays) are pickled and stored as BYTEA
            if type(Data2_mcw_conv[col].iloc[0]).__module__ == np.__name__:

                dtypedict.update({col: BYTEA})

                print('before: ')
                print(value.flags)
                print('---------------------')

                # force a C-contiguous copy before pickling
                value = value.copy(order='C')

                print('after: ')
                print(value.flags)
                print('=====================')

                value = pickle.dumps(value)

        except Exception:
            # datetime values are stored as TIMESTAMP
            if isinstance(Data2_mcw_conv[col].iloc[0], datetime.date):
                dtypedict.update({col: TIMESTAMP})

        Data2_mcw_conv.loc[row, col] = value

Data2_mcw_conv.to_sql(name='data2_mcw_tmp', con=conn,
                      if_exists='replace',
                      dtype=dtypedict)

However, I get this error:

Traceback (most recent call last):
  File "C:\Users\myname\Desktop\database\pickletopdb2.py", line 145, in <module>
    postgres_conv()
  File "C:\Users\myname\Desktop\database\pickletopdb2.py", line 124, in postgres_conv
    Data2_mcw_conv.to_sql(name='data2_mcw_tmp',con=conn,
  File "C:\Python38\lib\site-packages\pandas\core\generic.py", line 2778, in to_sql
    sql.to_sql(
  File "C:\Python38\lib\site-packages\pandas\io\sql.py", line 590, in to_sql
    pandas_sql.to_sql(
  File "C:\Python38\lib\site-packages\pandas\io\sql.py", line 1397, in to_sql
    table.insert(chunksize, method=method)
  File "C:\Python38\lib\site-packages\pandas\io\sql.py", line 831, in insert
    exec_insert(conn, keys, chunk_iter)
  File "C:\Python38\lib\site-packages\pandas\io\sql.py", line 748, in _execute_insert
    conn.execute(self.table.insert(), data)
  File "C:\Python38\lib\site-packages\sqlalchemy\engine\base.py", line 1286, in execute
    return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS)
  File "C:\Python38\lib\site-packages\sqlalchemy\sql\elements.py", line 325, in _execute_on_connection
    return connection._execute_clauseelement(
  File "C:\Python38\lib\site-packages\sqlalchemy\engine\base.py", line 1478, in _execute_clauseelement
    ret = self._execute_context(
  File "C:\Python38\lib\site-packages\sqlalchemy\engine\base.py", line 1842, in _execute_context
    self._handle_dbapi_exception(
  File "C:\Python38\lib\site-packages\sqlalchemy\engine\base.py", line 2027, in _handle_dbapi_exception
    util.raise_(exc_info[1], with_traceback=exc_info[2])
  File "C:\Python38\lib\site-packages\sqlalchemy\util\compat.py", line 207, in raise_
    raise exception
  File "C:\Python38\lib\site-packages\sqlalchemy\engine\base.py", line 1779, in _execute_context
    self.dialect.do_executemany(
  File "C:\Python38\lib\site-packages\sqlalchemy\dialects\postgresql\psycopg2.py", line 951, in do_executemany
    context._psycopg2_fetched_rows = xtras.execute_values(
  File "C:\Python38\lib\site-packages\psycopg2\extras.py", line 1267, in execute_values
    parts.append(cur.mogrify(template, args))
ValueError: ndarray is not C-contiguous

Output of print(value.flags) before/after value = value.copy(order='C'):

before: 
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

---------------------
after: 
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

=====================
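For reference, the flags behaviour above can be reproduced in isolation. This is a minimal sketch with a small Fortran-ordered array, not the actual data:

```python
import numpy as np

# A Fortran-ordered 2-D array is F-contiguous but not C-contiguous,
# matching the "before" flags shown above.
arr = np.asfortranarray(np.arange(6).reshape(2, 3))
print(arr.flags['C_CONTIGUOUS'])  # False
print(arr.flags['F_CONTIGUOUS'])  # True

# .copy(order='C') returns a C-contiguous copy, matching "after".
fixed = arr.copy(order='C')
print(fixed.flags['C_CONTIGUOUS'])  # True
```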

Why is this error occurring and any idea how to solve it?

1 Answer

Follow the second code snippet of this answer and customize the dtypes to their PostgreSQL equivalents from here. So in your case, e.g.:

from sqlalchemy.dialects.postgresql import BYTEA, TIMESTAMP

def sqlcol(dfparam):
    # ...
    if "datetime" in str(j):
        dtypedict.update({i: TIMESTAMP})
    if "object" in str(j):  # depending on what your other columns' datatypes are
        dtypedict.update({i: BYTEA})
    # ...

Notes:

  • According to the docs, TIMESTAMP is without timezone by default.
  • Your bytea columns are usually represented with datatype object in pandas. You should take this into consideration if you add further data in the future.
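
Putting the notes together, a complete version of the mapping function might look like this. This is a sketch: BYTEA and TIMESTAMP are stand-in strings here so the example runs without SQLAlchemy; in real code import them with from sqlalchemy.dialects.postgresql import BYTEA, TIMESTAMP.

```python
import numpy as np
import pandas as pd

# Stand-ins for the SQLAlchemy types, so this sketch is self-contained.
BYTEA = "BYTEA"
TIMESTAMP = "TIMESTAMP"

def sqlcol(dfparam):
    """Map each column's pandas dtype to a PostgreSQL type for to_sql(dtype=...)."""
    dtypedict = {}
    for i, j in zip(dfparam.columns, dfparam.dtypes):
        if "datetime" in str(j):
            dtypedict.update({i: TIMESTAMP})
        if "object" in str(j):
            # pickled numpy arrays show up as dtype object -> store as bytea
            dtypedict.update({i: BYTEA})
    return dtypedict

df = pd.DataFrame({
    "time": pd.to_datetime(["2021-01-01", "2021-01-02"]),
    "u": [np.arange(3), np.arange(3)],  # object column holding arrays
})
print(sqlcol(df))  # {'time': 'TIMESTAMP', 'u': 'BYTEA'}
```

The resulting dict is then passed straight to to_sql via its dtype argument, as in the question.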

6 Comments

Thanks for the tips, I took in your advice and made some progress, but I encountered an error when I try to port arrays in. I've added the error and my current code to my original question.
It seems like your arrays have a different order than C. So before adding your arrays to your resulting dataframe, try converting them with arr = arr.copy(order='C'), with arr being your respective array.
Hi, thanks for the advice! I just tried that, but it still gives the same error. Does it have anything to do with me doing a pickle.dumps of the array? I need to do it, as BYTEA is a datatype for binary data, not exactly arrays, and there are no other ways to put numpy arrays into the postgresql database that I am aware of.
Can you show the result of print(value.flags) after the line value = value.copy(order='C') please?
I have added the output for before/after the value = value.copy(order='C') line.
