python string tokenization - custom lexer?

Question

I have a string like:

<number>xx<->a<T>b<F>c<F>d<F>e<F>f<F>g<T>h<F>i<F>

How can I efficiently parse this string so that, e.g.:

xx has a value of null
a has a value of 1
b has a value of 0

From a SQL RDBMS where this is part of a comment field ... :(
– Georg Heiler, Commented Nov 21, 2017 at 15:44
Parsing means analysing. You seem to want to change/substitute values. Is that correct? Is <number> meant to be exactly that string, or is number meant to be a number?
– mrCarnivore, Commented Nov 21, 2017 at 15:49
Meant to be a 12345:1.23455 type of number. Indeed, "analyze" might be a better way to formulate it. In the end (as there are multiple of these records for a single database row) I want to aggregate over the array and sum, e.g., all the a values.
– Georg Heiler, Commented Nov 21, 2017 at 15:51
Can you give an example of the output you are expecting, and also how you calculated it? Even after reading the other comments, I think it's a bit unclear what you want.
– noslenkwah, Commented Nov 21, 2017 at 15:59

PM 2Ring · Accepted Answer · 2017-11-21 16:07:03Z

You can parse that with regular expressions. We first remove the initial <word> at the start of the string, if it exists, and then look for pairs of the form word<word>, saving them as key, value pairs in a dictionary. The codes dictionary converts -, F and T to null, 0 and 1.

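import re

s = '<number>xx<->a<T>b<F>c<F>d<F>e<F>f<F>g<T>h<F>i<F>'

# Strip the leading <word> group, if there is one
m = re.match(r'<(\w*?)>', s)
if m:
    head = m.group(1)
    s = s[m.end():]
    print(head)
else:
    print('No head group')

# Map each code inside the angle brackets to its output value
codes = {'-': 'null', 'F': '0', 'T': '1'}
# Capture a name followed by a <code> group
pat = re.compile(r'(\w*?)<([-\w]*?)>')

# Build the name -> value dictionary from every pair found
out = {k: codes[v] for k, v in pat.findall(s)}
print(out)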

output

number
{'xx': 'null', 'a': '1', 'b': '0', 'c': '0', 'd': '0', 'e': '0', 'f': '0', 'g': '1', 'h': '0', 'i': '0'}
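
The comments on the question mention that the head is really a number like 12345:1.23455 and that the goal is to sum the values per name over several records. Here is a rough sketch of one way that could look. It is not part of the code above: the names CODES, HEAD_RE, PAIR_RE, parse_record and sum_flags are made up for illustration, the second sample record is invented, and I am assuming the head never contains < or > and that null values should simply be skipped when summing.

```python
import re
from collections import Counter

# Hypothetical names, for illustration only.
# Values are None / 0 / 1 instead of strings so they can be summed.
CODES = {'-': None, 'F': 0, 'T': 1}
HEAD_RE = re.compile(r'<([^<>]*)>')      # leading <...>, e.g. <12345:1.23455>
PAIR_RE = re.compile(r'(\w+)<([-FT])>')  # name<code>, code is one of -, F, T


def parse_record(s):
    """Return (head, flags); flags maps each name to None, 0 or 1."""
    m = HEAD_RE.match(s)
    head = m.group(1) if m else None
    body = s[m.end():] if m else s
    flags = {name: CODES[code] for name, code in PAIR_RE.findall(body)}
    return head, flags


def sum_flags(records):
    """Sum the 0/1 values per name across records, skipping nulls (assumption)."""
    totals = Counter()
    for rec in records:
        _, flags = parse_record(rec)
        for name, value in flags.items():
            if value is not None:
                totals[name] += value
    return dict(totals)


# Made-up sample records in the format described in the comments
r1 = '<12345:1.23455>xx<->a<T>b<F>'
r2 = '<67890:2.5>xx<->a<T>b<T>'
print(parse_record(r1))
print(sum_flags([r1, r2]))
```

Because PAIR_RE only accepts -, F or T inside the brackets, a pair with any other code is silently ignored rather than raising a KeyError. Whether that is what you want depends on how clean the comment field is.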
