Deserialize string into values using format pattern

Question

I'm trying to get a good solution for a (de-)serializer. I've got a format pattern and all the values to put in.

The format pattern is as follows:

msg = '$bla,%d,%02d,%02d %02d:%02d:%02d.%03d' % (kwargs['...'], ...)

When I serialize the values, I get the following string:

bla,1990,12,24 13:37:11.001

But I also have to deserialize it. The pattern can strongly vary in length and types. I'd like to deserialize the string based only on the format pattern.

Any ideas how this is achievable?

EDIT: I'm using Python 2.7.6

shuiyu · Accepted Answer · 2014-03-05 15:08:09Z

2

If you fully control the protocol (that is, the format of the serialized data), I suggest using an existing solution: pickle from the Python standard library, JSON, which is very popular on the web, or Protobuf, a cross-language format from Google.

Pickle:

>>> import pickle
>>> formattuple = (1990,12,24,13,37,11,1) 
>>> s = pickle.dumps(formattuple)
>>> s
'(I1990\nI12\nI24\nI13\nI37\nI11\nI1\ntp0\n.'
>>> pickle.loads(s)
(1990, 12, 24, 13, 37, 11, 1)

JSON:

>>> import json
>>> formattuple = (1990,12,24,13,37,11,1)
>>> s = json.dumps(formattuple)
>>> s
'[1990, 12, 24, 13, 37, 11, 1]'
>>> json.loads(s)
[1990, 12, 24, 13, 37, 11, 1]

Be aware that JSON has some limitations. It is harder to serialize and deserialize objects other than dicts, lists, and tuples, and a round trip does not always give back an identical structure, because some data structures, such as tuples, do not exist in JSON (note that the tuple above came back as a list).
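As a small illustration of that limitation, here is a sketch of the tuple round trip. The variable names are made up for the example; the only point is that the caller has to convert the decoded list back into a tuple explicitly.

```python
import json

values = (1990, 12, 24, 13, 37, 11, 1)

# JSON has arrays but no tuples, so the tuple is written as an array...
encoded = json.dumps(values)

# ...and comes back as a list.
decoded = json.loads(encoded)

# Restoring the original type is up to the caller.
restored = tuple(decoded)
```

If the receiving side only iterates over the values, the list is usually fine as it is; the conversion matters when the result is used as a dict key or compared against a tuple.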

Protobuf is a more powerful but more complex solution. You need to define the data schema first.

edited Mar 5, 2014 at 15:08

answered Mar 5, 2014 at 14:57


1 Comment

Josch Over a year ago

Unfortunately I don't have control over the protocol. The incoming string always looks similar to the one I posted.

Corley Brigman · Accepted Answer · 2014-03-05 14:35:53Z

You might be able to do this with a regex, under some assumptions. Here's a partial example; you'd probably need to add to it for a full solution. Basically, we convert each print format specifier, piece by piece, into a regex that matches it.

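import re
formattuple = (1990, 12, 24, 13, 37, 11, 1)
formatstr = 'bla,%d,%02d,%02d %02d:%02d:%02d.%03d'

def rep_format(fmt):
    # re.sub calls this with a match object for each format specifier
    fmt = fmt.group(0)
    if fmt[0] != '%':
        return fmt
    if fmt == '%d':
        return r'(\d+)'          # unpadded integer: one or more digits
    ftype = fmt[-1]
    if ftype == 'd':
        fwidth = int(fmt[1:-1])  # '%02d' -> width 2
        return r'(\d{%d})' % fwidth
    else:
        return fmt               # other types (e.g. %f) are left untouched

# \d* rather than \d+, so that a bare %d is matched as well
scanstr = re.sub(r'%\d*[df]', rep_format, formatstr)
scanstr
'bla,(\\d+),(\\d{2}),(\\d{2}) (\\d{2}):(\\d{2}):(\\d{2}).(\\d{3})'

fstr = formatstr % formattuple
fstr
'bla,1990,12,24 13:37:11.001'

match = re.match(scanstr, fstr)
match.groups()
('1990', '12', '24', '13', '37', '11', '001')

# convert the captured strings back to integers
mtuple = tuple(int(x) for x in match.groups())
# mtuple == (1990, 12, 24, 13, 37, 11, 1)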
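One way the partial example could be extended is sketched below. It escapes the literal text between specifiers (so the '.' before the milliseconds only matches a real dot), anchors the end of the string, and converts the captures back to int. The names _SPEC, compile_format and parse are hypothetical helpers, not part of any library, and the sketch assumes non-negative integers and only the %d, %Nd, %0Nd and %% conversions.

```python
import re

# Matches %d, %5d, %02d and the literal %%; other conversions are not handled.
_SPEC = re.compile(r'%(?:(0?)(\d*)d|%)')

def compile_format(fmt):
    """Translate a %-style format string into a compiled regex."""
    pieces, pos = [], 0
    for m in _SPEC.finditer(fmt):
        # Literal text between specifiers must match exactly.
        pieces.append(re.escape(fmt[pos:m.start()]))
        pos = m.end()
        if m.group(0) == '%%':
            pieces.append(re.escape('%'))
        elif m.group(1) and m.group(2):
            # Zero-padded: at least N digits (wider values are not truncated).
            pieces.append(r'(\d{%s,})' % m.group(2))
        else:
            # Plain or space-padded integer (assumes non-negative values).
            pieces.append(r' *(\d+)')
    pieces.append(re.escape(fmt[pos:]))
    return re.compile(''.join(pieces) + '$')

def parse(fmt, text):
    """Return the integers that were formatted into text using fmt."""
    m = compile_format(fmt).match(text)
    if m is None:
        raise ValueError('%r does not match format %r' % (text, fmt))
    return tuple(int(g) for g in m.groups())

fmt = 'bla,%d,%02d,%02d %02d:%02d:%02d.%03d'
values = parse(fmt, 'bla,1990,12,24 13:37:11.001')
```

Two adjacent specifiers with no separator between them (for example '%d%d') remain ambiguous, so this only works reliably when the format has literal delimiters, as the one in the question does.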
