
I need to write a Python script (I'm a newbie in Python but would like to take this as practice) to parse a message of the following format:

T:L:x1:x2:x3:...T1:L1:y1:y2:y3...Tn:Ln:z1:z2:z3:...

where T holds the type, L is the length, and x1, x2, x3, ... are the actual data for that type; likewise, the y and z values are the data for types T1 through Tn. Every value is separated by a : symbol, and all values always come in hex.

For example:

1:4:a:5:6:7:2:10:72:75:63:6f:6e:74:72:6f:6c:6c:65:72:2e:6f:72:67

(Type1=1, Length1=4, Type2=2, Length2=16 (10 in hex))
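To make the example concrete, here is a small sketch (plain Python, only str.split and int) that locates the two record headers by hand. It shows that the lengths have to be read as hex and that the two records together account for every field, which is what lets a loop stop cleanly at the end of the list.

```python
message = "1:4:a:5:6:7:2:10:72:75:63:6f:6e:74:72:6f:6c:6c:65:72:2e:6f:72:67"
fields = message.split(":")

# First record: type at index 0, hex length at index 1, then 4 values
print(fields[0], int(fields[1], 16))   # 1 4

# The second header starts right after the first record: 2 + 4 = 6
print(fields[6], int(fields[7], 16))   # 2 16

# Both records cover every field: (2 + 4) + (2 + 16) = 24
print(len(fields))                     # 24
```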

The parsed messages should be stored in a dictionary (I think this is the most appropriate data structure, but I'd be glad to hear other suggestions).
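A dict keyed by type works well as long as each type appears at most once per message. The question doesn't say whether types can repeat, so this is only a hedged illustration with made-up records: a plain dict silently keeps the last record for a repeated type, while a dict of lists (collections.defaultdict) keeps all of them.

```python
from collections import defaultdict

# Hypothetical parsed records in which type "1" appears twice
records = [("1", ["a", "5"]), ("2", ["72"]), ("1", ["6", "7"])]

# A plain dict keeps only the last record for each type
as_dict = dict(records)
print(as_dict)  # {'1': ['6', '7'], '2': ['72']}

# A dict of lists keeps every record, grouped by type, in message order
grouped = defaultdict(list)
for msg_type, values in records:
    grouped[msg_type].append(values)
print(dict(grouped))  # {'1': [['a', '5'], ['6', '7']], '2': [['72']]}
```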

My plan is to split the text, extract the type and length, move forward, extract the next L bytes, and store them in a dict with T as the key.

  1. I will run a loop, but how do I detect the end of the string so that I can break out of it?
  2. The actual data (x1, x2, x3, ... in the template) has to be stored in the dictionary with the ':' separators removed. I'm not sure how to do that.
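A minimal sketch addressing both points, assuming every value is a single byte (at most two hex digits, as in the example). The loop ends when the read position reaches the end of the split list, so no explicit break is needed. The values are joined with an empty string to drop the ':' separators, and single-digit values such as 'a' are padded with str.zfill(2) so the joined string keeps exactly two characters per byte.

```python
message = "1:4:a:5:6:7:2:10:72:75:63:6f:6e:74:72:6f:6c:6c:65:72:2e:6f:72:67"
fields = message.split(":")

result = {}
pos = 0
# Question 1: the loop stops once pos reaches the end of the field list
while pos < len(fields):
    msg_type = fields[pos]
    length = int(fields[pos + 1], 16)            # the length is hex too
    values = fields[pos + 2 : pos + 2 + length]  # the next `length` fields
    # Question 2: join without a separator; zfill(2) turns "a" into "0a"
    result[msg_type] = "".join(v.zfill(2) for v in values)
    pos += 2 + length                            # skip header and payload

print(result)
```

With the example message, result should be {'1': '0a050607', '2': '7275636f6e74726f6c6c65722e6f7267'}. Without the padding, the first entry would be 'a567', which can no longer be split back into the original four values unambiguously.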

I'd also appreciate learning about more efficient approaches to parsing the string. Thanks!

  • Please put an example of the exact syntax you would like to parse, not a template. Commented Oct 23, 2021 at 21:09
  • Have you tried using the split string method? Commented Oct 23, 2021 at 21:24
  • If you control the data source, I would highly recommend changing the format. This will be error prone. Commented Oct 23, 2021 at 21:59
  • How had I not thought about it before... I updated my answer. Can you check it, please? Commented Oct 24, 2021 at 6:39
  • @Corralien, yes, it works as expected. Thank you. Commented Oct 25, 2021 at 14:24
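Following up on the comment that this format is error prone: slicing past the end of a list just returns fewer items, so a truncated message can be accepted silently. Here is a hedged sketch of a stricter parser (parse_strict is a hypothetical name) that raises ValueError for a missing length field, a length that isn't valid hex, a negative length, or a declared length longer than the data that follows.

```python
def parse_strict(message):
    """Parse 'T:L:v1:...:vL' records into {type: [values]}, rejecting malformed input."""
    fields = message.split(":")
    result = {}
    pos = 0
    while pos < len(fields):
        msg_type = fields[pos]
        if pos + 1 >= len(fields):
            raise ValueError(f"type {msg_type!r} at field {pos} has no length")
        # int() raises ValueError by itself if the length isn't valid hex
        length = int(fields[pos + 1], 16)
        if length < 0:
            raise ValueError(f"type {msg_type!r} has a negative length")
        end = pos + 2 + length
        if end > len(fields):
            raise ValueError(
                f"type {msg_type!r} declares {length} values, "
                f"but only {len(fields) - pos - 2} follow"
            )
        result[msg_type] = fields[pos + 2:end]
        pos = end
    return result


try:
    parse_strict("1:4:a:5")
except ValueError as err:
    print(err)  # type '1' declares 4 values, but only 2 follow
```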

2 Answers


Something like this should work:

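ss = "1:4:a:5:6:7:2:10:72:75:63:6f:6e:74:72:6f:6c:6c:65:72:2e:6f:72:67".split(":")

d = {}
idx = 0
while idx < len(ss):          # stop once every field has been consumed
    key = ss[idx]             # type field
    idx += 1
    length = int(ss[idx])     # length field, parsed as decimal here
    idx += 1
    arr = ss[idx:idx+length]  # the next `length` values
    d[key] = arr
    idx += length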

Resulting d:

{'1': ['a', '5', '6', '7'],
 '2': ['72', '75', '63', '6f', '6e', '74', '72', '6f', '6c', '6c'],
 '65': ['2e', '6f', '72', '67']}

2 Comments

I don't think this is the expected outcome.
@Corralien this follows the parsing rules type:length:<values> as outlined in the question. It looks like there may be a typo in the sample data
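The outputs differ because of how the length field is read. The loop in this answer calls int(ss[idx]), which parses '10' as decimal 10, whereas the question says all values are hex, so '10' means 16. A sketch that wraps the same index walk in a hypothetical parse function, with the base as a parameter, makes the difference visible:

```python
def parse(fields, base):
    """Same index walk as the answer above; `base` is used for the length field."""
    d = {}
    idx = 0
    while idx < len(fields):
        key = fields[idx]
        length = int(fields[idx + 1], base)
        d[key] = fields[idx + 2:idx + 2 + length]
        idx += 2 + length
    return d


ss = "1:4:a:5:6:7:2:10:72:75:63:6f:6e:74:72:6f:6c:6c:65:72:2e:6f:72:67".split(":")
print(parse(ss, 10))  # the output shown above, including the spurious '65' key
print(parse(ss, 16))  # keys '1' and '2' only, with 4 and 16 values
```

So changing int(ss[idx]) to int(ss[idx], 16) in the answer's loop should be enough to get the output the question expects, without assuming a typo in the sample data.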

Create an iterator over the fields of your string:

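message = '1:4:a:5:6:7:2:10:72:75:63:6f:6e:74:72:6f:6c:6c:65:72:2e:6f:72:67'

code = iter(message.split(':'))
data = {}

# Each pass starts at a type field; the loop ends when the iterator is exhausted
for t in code:
    length = int(next(code), 16)                  # the length is hex
    values = [next(code) for _ in range(length)]  # consume exactly `length` values
    data[t] = values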

Output:

>>> data
{'1': ['a', '5', '6', '7'],
 '2': ['72', '75', '63', '6f', '6e', '74', '72', '6f', '6c', '6c', '65', '72', '2e', '6f', '72', '67']}
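If the values are meant as bytes (an assumption; the question only says they are hex), each list can be turned into a bytes object by converting every hex string with int(v, 16). For the example, the type 2 payload then happens to decode as ASCII text:

```python
data = {
    "1": ["a", "5", "6", "7"],
    "2": ["72", "75", "63", "6f", "6e", "74", "72", "6f",
          "6c", "6c", "65", "72", "2e", "6f", "72", "67"],
}

# One byte per hex value; bytes() rejects anything outside 0-255
as_bytes = {t: bytes(int(v, 16) for v in values) for t, values in data.items()}

print(as_bytes["1"])                  # b'\n\x05\x06\x07'
print(as_bytes["2"].decode("ascii"))  # rucontroller.org
```

Because bytes() raises ValueError for any value above ff, this conversion also doubles as a check that every value really fits in one byte.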
