1

This is going to be a duplicate of store numpy array in mysql . The reason is that the actual question is not answered there, and 5 years later I (another random person) am struggling with the same problem.

Goal

I have a lot of numpy vectors. I want to store each of them in a BLOB column in a MySQL database. I have already read that this is likely not a good idea, but I'd like to do it anyway. For this, I am trying to create a query. The code looks something like:

query = 'INSERT INTO vectors_table (word_id, vector) VALUES '

for i in vectors:
    query += ({}, {}).format(i.index, i.vector.dumps())

(notice the call to .dumps())

Alternatives

Numpy arrays have a few options of functions to "serialize" its data:

I could also use Pickle. There is a function called pickle.dumps that also gives me a bytes sequence.

Problem

When I try creating the query, it inserts that goddamn b before the string. E.g., if I print one of my vectors, it looks like:

b'\x80\x02cnumpy.core.multiarray\n_reconstruct\nq\x00cnumpy\nndarray\nq\x01K\x00\x85q\x02c_codecs\nencode\nq\x03X\x01\x00\x00\x00bq\x04X\x06\x00\x00\x00latin1q\x05\x86q\x06Rq\x07\x87q\x08Rq\t(K\x01M\x00\x0c\x85q\ncnumpy\ndtype\nq\x0bX\x02\x00\x00\x00f4q\x0cK\x00K\x01\x87q\rRq\x0e(K\x03X\x01\x00\ ...

Unfortunately, as can be seen in the "Alternatives" above, this is true for all alternatives I have found so far.

Attempted solutions

In trying to solve the problem, I find lots of people telling me to use decode, or to cast the bytes into a string.

Calling either .decode('ascii') or .decode('utf-8') has led to UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0: ordinal not in range(128) and UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte, respectively.

Trying to convert the bytes into a string creates a string that includes the "b".

Is there any better way of doing this?

I am noob with MySQL and totally believe that I am missing some obvious solution. I am getting to the point where I'm planning to just create a string with

for i in vectors:
    old_str = str(i.vector.dumps())
    new_str = old_str[1:]
    query += ({}, {}).format(i.index, new_str)

And just insert that into my query string. Stil... I can't believe that this is the only way to get this to work.

1 Answer 1

1

In the end, the answer to my question was in this other question. I'll answer it here in case someone else has a similar problem.

I was doing

query = 'INSERT INTO vectors_table (word_id, vector) VALUES '

for i in vectors:
    query += "({}, {})".format(i.index, i.vector.dumps())

and then running cursor.execute(query).

Instead, what I should do was to write my query with %s:

query = 'INSERT INTO vectors_table (word_id, vector) VALUES (%s, %s)'

then get all the elements I want to insert into the table:

tuples = [(i.index, i.vector.dumps()) for i in vectors]

and let the MySQL library handle the escaping of the bad bytes:

cursor.executemany(query, tuples)
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.