
I'm trying to insert a NumPy array into PostgreSQL. I tried to do it like this:

import psycopg2
from config import config

def write_to_db(some_arr, some_txt):
    """ insert a new array and link into the test_db table """
    sql = """INSERT INTO test_db VALUES(%s,%s);"""
    conn = None
    try:
        params = config()
        conn = psycopg2.connect(**params)
        cur = conn.cursor()
        cur.execute(sql, (some_arr, some_txt))
        conn.commit()
        cur.close()
    except (Exception, psycopg2.DatabaseError) as e:
        print(e)
    finally:
        if conn is not None:
            conn.close()

Before that, I created a table in my DB:

create table test_db (encodings double precision[], link text);

Finally, I got an error: "can't adapt type 'numpy.ndarray'"

I need to write a NumPy array of 125 float64 items plus a small text (like a link) in each row. There will be a few million rows in my project, so only read speed and database size matter. As I understand it, a NumPy array cannot be inserted directly and needs to be converted to another format first. My first idea was to convert it to binary data and store that in the DB, but I don't know how to do it and how to get it back from the DB as a NumPy array.
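To make the goal concrete, this is roughly the round trip I mean, sketched with NumPy alone (hypothetical; the DB part is exactly what I don't know how to do):

import numpy as np

arr = np.random.rand(125)                        # float64, 125 items, as described above
raw = arr.tobytes()                              # binary blob that could go into a bytea column
restored = np.frombuffer(raw, dtype=np.float64)  # rebuild the array from the stored bytes
print(np.array_equal(arr, restored))             # True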

2 Answers


Thanks to Vasyl Kushnir. This method turned out to work well and to read data fast:

import psycopg2
from config import config
import msgpack
import msgpack_numpy as m

def write_to_db(encoding, link):
    """ insert a new array into the test1_db table """
    sql = """INSERT INTO test1_db VALUES(%s,%s);"""
    conn = None
    # serialize the NumPy array to bytes with msgpack + msgpack_numpy
    dumped_data = msgpack.packb(encoding, default=m.encode)
    try:
        params = config()
        conn = psycopg2.connect(**params)
        cur = conn.cursor()
        cur.execute(sql, (dumped_data, link))
        conn.commit()
        cur.close()
    except (Exception, psycopg2.DatabaseError) as e:
        print(e)
    finally:
        if conn is not None:
            conn.close()

def read_from_db():
    """ query data from the test1_db table """
    conn = None
    row = None
    try:
        params = config()
        conn = psycopg2.connect(**params)
        cur = conn.cursor()
        cur.execute("SELECT encodings, link FROM test1_db")
        print("The number of rows: ", cur.rowcount)
        row = cur.fetchone()
        cur.close()
    except (Exception, psycopg2.DatabaseError) as error:
        print(error)
    finally:
        if conn is not None:
            conn.close()
    # unpack outside the try/finally so a failed query does not raise here
    if row is None:
        return None
    encoding1, somelink = row
    return msgpack.unpackb(encoding1, object_hook=m.decode), somelink
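A note on the schema this relies on: msgpack.packb returns bytes, so the encodings column of test1_db has to be bytea rather than double precision[]. A hypothetical round trip with the two functions above might look like this (the link value is made up):

import numpy as np

# assumes: create table test1_db (encodings bytea, link text);
encoding = np.random.rand(125)                 # 125 float64 values, as in the question
write_to_db(encoding, "https://example.com/face/1")

restored, somelink = read_from_db()
print(np.array_equal(encoding, restored))      # True if the round trip is lossless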


Try using Python's pickle module for binary serialization/deserialization.

Example:

import numpy as np
from pickle import dumps, loads

data = np.array([1, 2, 4, 5, 6])
dumped_data = dumps(data)          # serialize the array to bytes
loaded_data = loads(dumped_data)   # restore the original ndarray
print(dumped_data)
print(loaded_data)
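In the context of the question, the pickled bytes could then go into a bytea column. A sketch under that assumption (the table name test_db_pickled and both helpers are hypothetical, not part of the original code):

import psycopg2
from pickle import dumps, loads
from config import config

# hypothetical table: create table test_db_pickled (encodings bytea, link text);

def save_pickled(arr, link):
    conn = psycopg2.connect(**config())
    try:
        with conn, conn.cursor() as cur:
            # psycopg2.Binary wraps the pickled bytes for the bytea column
            cur.execute("INSERT INTO test_db_pickled VALUES (%s, %s);",
                        (psycopg2.Binary(dumps(arr)), link))
    finally:
        conn.close()

def load_pickled():
    conn = psycopg2.connect(**config())
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT encodings, link FROM test_db_pickled LIMIT 1;")
            raw, link = cur.fetchone()
            return loads(bytes(raw)), link   # bytes() converts the returned memoryview
    finally:
        conn.close()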

5 Comments

It is strange, but pickle works much faster than np.save and np.load.
I will try to make the code faster using msgpack-0.6.0. A test showed that it decodes 2x faster than pickle, but it is not adapted for NumPy arrays.
np.array has a tolist method if you want to use msgpack-0.6.0, but on my computer it is the same speed as pickle with np.array.
%timeit dumps(encoding)
11.5 µs ± 489 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit loads(out)
5 µs ± 37.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit msgpack.packb(encoding, default=m.encode)
19.8 µs ± 581 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit msgpack.unpackb(x_enc, object_hook=m.decode)
3.62 µs ± 43.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
With pickle loads: 5 µs; with msgpack unpackb (msgpack_numpy decode): 3.62 µs.
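For anyone who wants to reproduce timings like these, a small standalone sketch (exact numbers will vary by machine; the 125-element float64 array mirrors the question):

import timeit
import numpy as np
import msgpack
import msgpack_numpy as m
from pickle import dumps, loads

encoding = np.random.rand(125)                      # 125 float64 values
pickled = dumps(encoding)
packed = msgpack.packb(encoding, default=m.encode)

n = 100000
print("pickle dumps:   ", timeit.timeit(lambda: dumps(encoding), number=n) / n)
print("pickle loads:   ", timeit.timeit(lambda: loads(pickled), number=n) / n)
print("msgpack packb:  ", timeit.timeit(lambda: msgpack.packb(encoding, default=m.encode), number=n) / n)
print("msgpack unpackb:", timeit.timeit(lambda: msgpack.unpackb(packed, object_hook=m.decode), number=n) / n)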
