2

What's the best way to parse a plain text tree structure like this:

node1:
    node1
    node2:
        node1
node2:
    node1
    node2
    node3:
        node1:
            node1
        node2:
            node1
            node2

and convert it into a tree structure (with lists or dictionaries)?

Is there any python library to help me with the parsing?

0

2 Answers 2

4

The rson library will do this, except you'll probably have to subclass the parser to allow the mixing of array and dict style elements in a single structure.

Edit Actually, that might be a bit difficult, but the rsonlite package will (sort of) work as-is with your data.

rsonlite is a small, single-module package that is only 300 lines long, and the same source works with both Python 2 and Python 3.

Here is an example that shows 3 different outputs from your data. The first output is what rsonlite.dumps() gives, the second output is what the slightly higher-level rsonlite.simpleparse() gives, and the third output takes the results from simpleparse and runs them through a custom fixup() function to create a pure nested dictionary data structure, where any missing value is simply set to None, and all the colon characters are checked and stripped.

from rsonlite import loads, simpleparse


mystring = '''
node1:
    node1
    node2:
        node1
node2:
    node1
    node2
    node3:
        node1:
            node1
        node2:
            node1
            node2
'''

def fixup(node):
    if isinstance(node, str):
        return node
    elif isinstance(node, dict):
        for key in node:
            assert key.endswith(':'), key
        return dict((key[:-1], fixup(value)) for key, value in node.items())
    else:
        assert isinstance(node, (list, str))
        result = {}
        for item in node:
            if isinstance(item, str):
                assert not item.endswith(':')
                assert result.setdefault(item, None) is None
            else:
                for key, value in fixup(item).items():
                    assert result.setdefault(key, value) is value
        return result

print('')
print(loads(mystring))
print('')
print(simpleparse(mystring))
print('')
print(fixup(simpleparse(mystring)))
print('')

Will give:

[('node1:', ['node1', ('node2:', ['node1'])]), ('node2:', ['node1', 'node2', ('node3:', [('node1:', ['node1']), ('node2:', ['node1', 'node2'])])])]

OrderedDict([('node1:', ['node1', OrderedDict([('node2:', 'node1')])]), ('node2:', ['node1', 'node2', OrderedDict([('node3:', OrderedDict([('node1:', 'node1'), ('node2:', ['node1', 'node2'])]))])])])

{'node1': {'node1': None, 'node2': 'node1'}, 'node2': {'node1': None, 'node2': None, 'node3': {'node1': 'node1', 'node2': {'node1': None, 'node2': None}}}}

Sign up to request clarification or add additional context in comments.

Comments

1

You could construct a simple parser that generates a valid python expression from the input and then evaluate it. My initial thought had been a simple recursive parser, but that's more difficult than I anticipated because their is no way to know the block is ending without a peak ahead - a common problem with indent based formats.

This generates nested list of tuples (block_name, [contents]):

i = 0
r = '['
for l in mystring.split('\n'):
    if not l:
        continue
    cl = l.lstrip(' ')
    ci = (len(l) - len(cl))//4
    if ci > i:           # line indented
        r += '['
    elif ci < i:         # line unindented, can be multiple
        r += '])'*(i-ci) + ','
    if cl[-1] == ':':    # new block
        r += '{("{}":'.format(cl[:-1])
    else:                # new item
        r += '"{}",'.format(cl)
    i = ci
r += ']'+')]'*i
eval(r)

Output:

[('node1', ['node1', ('node2', ['node1'])]),
 ('node2',
  ['node1',
   'node2',
   ('node3', [('node1', ['node1']), ('node2', ['node1', 'node2'])])])]

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.