
I have a pre-existing Berkeley DB database that is written to and read by a program written in C++. I need to bypass this program and write to the database directly from Python.

I can write to the database, but I am having a hard time encoding my data so that it is in the proper form and can be read by the original C++ program. In fact, I can't even figure out how to decode the existing data, although I know what the values are.

The keys of the key-value pairs in the database are timestamps in the form YYYYMMDDHHmmSS. Each value is five doubles and an int packed together. According to the source code of the C++ program, the following DVALS struct

typedef struct
{
  double d1;
  double d2;
  double d3;
  double d4;
  double d5;
  int i1;
} DVALS;

is written to the database as the value of the key value pair like so:

DBT data;
memset(&data, 0, sizeof(DBT));

DVALS dval;
memset(&dval, 0, sizeof(DVALS));
data.data = &dval;
data.size = sizeof(DVALS);

db->put(db, NULL, &key, &data, 0);

Luckily, I know what the values are. If I run the following from the command line

db_dump myfile

the final record is:

323031393033313431353533303000
ae47e17a140e4040ae47e17a140e4040ae47e17a140e4040ae47e17a140e40400000000000b6a4400000000000000000

Using Python's bsddb3 module, I can also pull this record out:

from bsddb3 import db
myDB = db.DB()
myDB.open('myfile', None, db.DB_BTREE)
cur = myDB.cursor()
kvpair = cur.last()

kvpair now holds the following:

(b'20190314155300\x00', b'\xaeG\xe1z\x14\x0e@@\xaeG\xe1z\x14\x0e@@\xaeG\xe1z\x14\x0e@@\xaeG\xe1z\x14\x0e@@\x00\x00\x00\x00\x00\xb6\xa4@\x00\x00\x00\x00\x00\x00\x00\x00')

The timestamp is easy to read (apart from its trailing NUL byte), and in this case the actual values are as follows:

d1 = d2 = d3 = d4 = 32.11
d5 = 2651
i1 = 0

Since the sequence '\xaeG\xe1z\x14\x0e@@' is repeated four times, I think it corresponds to the value 32.11.

So I think my question may just be about encoding/decoding, but perhaps there is more to it, hence the backstory. Decoding the value as text, for example with

kvpair[1].decode('utf-8')

fails. A variety of encodings all give errors similar to this:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xae in position 0: invalid start byte
Comment (Mar 16, 2019 at 22:39): You should keep your questions precise and leave out backstories.

1 Answer


The value is raw binary data (the in-memory bytes of the C struct), not encoded text, so it should be unpacked with Python's struct module rather than decoded as a string.

>>> import struct
>>> bs = b'\xaeG\xe1z\x14\x0e@@\xaeG\xe1z\x14\x0e@@\xaeG\xe1z\x14\x0e@@\xaeG\xe1z\x14\x0e@@\x00\x00\x00\x00\x00\xb6\xa4@\x00\x00\x00\x00\x00\x00\x00\x00'
>>> len(bs)
48
>>> struct.unpack('<5di4x', bs)
(32.11, 32.11, 32.11, 32.11, 2651.0, 0)
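
As a cross-check, the hex lines printed by db_dump in the question decode to the same bytes. A minimal sketch, using the record from the question; the DVals namedtuple and its field names are only illustrative, chosen to mirror the C struct:

```python
import struct
from collections import namedtuple

# Illustrative container mirroring the fields of the C struct DVALS.
DVals = namedtuple("DVals", "d1 d2 d3 d4 d5 i1")

# The two hex lines db_dump printed for the final record.
key_hex = "323031393033313431353533303000"
val_hex = ("ae47e17a140e4040" * 4) + "0000000000b6a440" + "0000000000000000"

key = bytes.fromhex(key_hex)
value = bytes.fromhex(val_hex)

# The key is ASCII text followed by a single NUL terminator byte.
timestamp = key.rstrip(b"\x00").decode("ascii")

# The value is five little-endian doubles, an int, and four pad bytes.
record = DVals._make(struct.unpack("<5di4x", value))

print(timestamp)
print(record)
```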

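struct.unpack takes two arguments: a format string that describes the byte order and field types, and the bytes to be unpacked. The format '<5di4x' describes:

  • <: little-endian byte order, with no automatic alignment
  • 5d: five doubles (8 bytes each)
  • i: one signed int (4 bytes; use I for unsigned)
  • 4x: four pad bytes (the trailing padding that brings the 44 bytes of fields up to the 48 bytes stored in the record)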
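
The pad bytes most likely come from the C compiler, which rounds the struct size up to a multiple of its strictest member alignment (typically 8 for a double). A small sketch of how to check this from Python; the ctypes class below is a hypothetical mirror of DVALS, and its size depends on the ABI of the machine running the code:

```python
import ctypes
import struct

class DVals(ctypes.Structure):
    # Hypothetical mirror of the C struct DVALS from the question.
    _fields_ = [(f"d{n}", ctypes.c_double) for n in range(1, 6)] + [
        ("i1", ctypes.c_int)
    ]

packed_size = struct.calcsize("<5di")    # 44: fields only, no padding
padded_size = struct.calcsize("<5di4x")  # 48: with explicit trailing padding
native_size = ctypes.sizeof(DVals)       # what this platform's C ABI would use

print(packed_size, padded_size, native_size)
```

If native_size is not 48 on your machine, the C++ program was probably built for a different ABI, and the explicit '<5di4x' format remains the safer choice for matching the existing records.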

Data can be packed in the same way, using struct.pack.

>>> nums = [32.11, 32.11, 32.11, 32.11, 2651, 0]
>>> format_ = '<5di4x'
>>> packed = struct.pack(format_, *nums)
>>> packed == bs
True
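
Putting the pieces together, writing a new record from Python might look like the sketch below. It assumes that the C++ program expects keys with a trailing NUL byte (inferred from the existing record) and that the database can be opened the same way as in the question. The helper names make_key, make_value and write_record are hypothetical, and the timestamp in the example call is made up.

```python
import struct

VALUE_FORMAT = "<5di4x"  # five doubles, one int, four pad bytes

def make_key(timestamp):
    """Encode a YYYYMMDDHHmmSS string as ASCII plus a trailing NUL byte."""
    return timestamp.encode("ascii") + b"\x00"

def make_value(d1, d2, d3, d4, d5, i1):
    """Pack the six fields into the 48-byte layout seen in the database."""
    return struct.pack(VALUE_FORMAT, d1, d2, d3, d4, d5, i1)

def write_record(path, timestamp, values):
    """Store one record in an existing B-tree database file."""
    # Imported here so the helpers above work without bsddb3 installed.
    from bsddb3 import db

    database = db.DB()
    database.open(path, None, db.DB_BTREE)
    try:
        database.put(make_key(timestamp), make_value(*values))
    finally:
        database.close()

# Example call (not executed here; the timestamp is invented):
# write_record("myfile", "20190314155400", (32.11, 32.11, 32.11, 32.11, 2651.0, 0))
```

It is worth trying this on a copy of the database first and confirming that the C++ program reads the new record correctly.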