I'm trying to get a good solution for a (de-)serializer. I've got a format pattern and all the values to put in.

The format pattern is as follows:

msg = '$bla,%d,%02d,%02d %02d:%02d:%02d.%03d' % (kwargs['...'], ...)

When I serialize the values, I get the following string:

bla,1990,12,24 13:37:11.001

But I also have to deserialize it. The pattern can vary considerably in length and in the types it contains. I'd like to deserialize the string based only on the format pattern.

Any ideas how this is achievable?

EDIT: I'm using Python 2.7.6

2 Answers

If you fully control the protocol (that is, the serialized format), I suggest using an existing solution, e.g. pickle from the Python standard library, JSON, which is very popular on the web, or Protobuf, a cross-language format from Google.

Pickle:

>>> import pickle
>>> formattuple = (1990,12,24,13,37,11,1) 
>>> s = pickle.dumps(formattuple)
>>> s
'(I1990\nI12\nI24\nI13\nI37\nI11\nI1\ntp0\n.'
>>> pickle.loads(s)
(1990, 12, 24, 13, 37, 11, 1)

Json:

>>> import json
>>> formattuple = (1990,12,24,13,37,11,1)
>>> s = json.dumps(formattuple)
>>> s
'[1990, 12, 24, 13, 37, 11, 1]'
>>> json.loads(s)
[1990, 12, 24, 13, 37, 11, 1]

Be aware that JSON has some limitations: objects other than dicts, lists, and tuples are harder to serialize and deserialize, and a round trip does not always return an identical structure, because some Python types, such as tuple, do not exist in JSON (a tuple comes back as a list).
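
As a small illustration of the tuple caveat, here is a minimal sketch of a JSON round trip. The variable names are only for illustration, and converting back with tuple() assumes you already know the original value was a flat tuple.

```python
import json

values = (1990, 12, 24, 13, 37, 11, 1)

# JSON has no tuple type, so the tuple is written as a JSON array
# and comes back as a Python list.
decoded = json.loads(json.dumps(values))

# If a tuple is required, convert explicitly after decoding.
restored = tuple(decoded)
```

After this runs, decoded is a list while restored compares equal to the original tuple.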

Protobuf is a more powerful but more complex solution. You need to define the data schema first.

1 Comment

Unfortunately I don't have control over the protocol. The incoming string always looks similar to the one I posted.

You might be able to do this with a regex, under some assumptions. Here's a partial example; you'd probably need to extend it for a full solution. Basically, we convert each printf-style format specifier, piece by piece, into a regex that matches it.

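import re

formattuple = (1990, 12, 24, 13, 37, 11, 1)
formatstr = 'bla,%d,%02d,%02d %02d:%02d:%02d.%03d'

def rep_format(match):
    # Turn one %-style integer specifier into a capturing regex group.
    fmt = match.group(0)
    if fmt == '%d':
        return r'(\d+)'
    ftype = fmt[-1]
    if ftype == 'd':
        # '%02d' -> exactly 2 digits (assumes the value never exceeds the width)
        fwidth = int(fmt[1:-1])
        return r'(\d{%d})' % fwidth
    # Other types such as '%5f' are not handled yet and are left unchanged.
    return fmt

# \d* rather than \d+, so that a bare '%d' is converted as well.
# Note: literal text is not escaped, so the '.' in the pattern matches any character.
scanstr = re.sub(r'%\d*[df]', rep_format, formatstr)
scanstr
# 'bla,(\\d+),(\\d{2}),(\\d{2}) (\\d{2}):(\\d{2}):(\\d{2}).(\\d{3})'

fstr = formatstr % formattuple
fstr
# 'bla,1990,12,24 13:37:11.001'

match = re.match(scanstr, fstr)
match.groups()
# ('1990', '12', '24', '13', '37', '11', '001')

mtuple = tuple(int(x) for x in match.groups())
mtuple
# (1990, 12, 24, 13, 37, 11, 1)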
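
A slightly more general sketch of the same idea follows. It escapes the literal text between specifiers and keeps a converter per field, so the result comes back typed. The names compile_format and parse are hypothetical helpers, not library functions, and only a small assumed subset of conversions is covered (%d, %Nd, %0Nd, %f with optional width and precision, %s, %%). Zero-padded integers are assumed to be non-negative.

```python
import re

# One %-style specifier: optional '0' flag, optional width,
# optional precision, and a conversion type.
_SPEC = re.compile(r'%(0?)(\d*)(?:\.(\d+))?([dfs%])')

def compile_format(fmt):
    """Return (compiled regex, list of converters) for a %-style format."""
    parts, converters, pos = [], [], 0
    for m in _SPEC.finditer(fmt):
        # Literal text between specifiers must match exactly.
        parts.append(re.escape(fmt[pos:m.start()]))
        pos = m.end()
        zero, width, _precision, kind = m.groups()
        if kind == '%':
            parts.append('%')  # '%%' stands for a literal percent sign
        elif kind == 'd':
            if zero and width:
                # Zero-padded: at least `width` digits, assumed non-negative.
                parts.append(r'(\d{%s,})' % width)
            elif width:
                # Space-padded: skip the padding, capture the number.
                parts.append(r' *(-?\d+)')
            else:
                parts.append(r'(-?\d+)')
            converters.append(int)
        elif kind == 'f':
            parts.append(r' *(-?\d+(?:\.\d+)?)')
            converters.append(float)
        else:  # 's': shortest possible run of characters
            parts.append(r'(.+?)')
            converters.append(str)
    parts.append(re.escape(fmt[pos:]))
    # \Z anchors the match at the end of the input string.
    return re.compile(''.join(parts) + r'\Z'), converters

def parse(fmt, text):
    """Invert `fmt % values` for the supported subset of specifiers."""
    regex, converters = compile_format(fmt)
    m = regex.match(text)
    if m is None:
        raise ValueError('%r does not match format %r' % (text, fmt))
    return tuple(conv(raw) for conv, raw in zip(converters, m.groups()))

result = parse('bla,%d,%02d,%02d %02d:%02d:%02d.%03d',
               'bla,1990,12,24 13:37:11.001')
# result == (1990, 12, 24, 13, 37, 11, 1)
```

Formats that place two variable-width fields next to each other with no separator (for example '%d%d') remain ambiguous, so this only works when the literal text between fields delimits them.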
