Parsing a file with hierarchical structure in Python

Question

I'm trying to parse the output from a tool into a data structure but I'm having some difficulty getting things right. The file looks like this:

 Fruits
   Apple
     Auxiliary
     Core
     Extras
   Banana
     Something
   Coconut
 Vegetables
   Eggplant
   Rutabaga

You can see that top-level items are indented by one space, and items beneath that are indented by two spaces for each level. The items are also in alphabetical order.

How do I turn the file into a Python list that's something like ["Fruits", "Fruits/Apple", "Fruits/Banana", ..., "Vegetables", "Vegetables/Eggplant", "Vegetables/Rutabaga"]?

John La Rooy · Accepted Answer · 2010-03-31 16:21:25Z

4

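>>> with open("food.txt") as f:
...     res = []
...     s = []  # path from the top level down to the current item
...     for line in f:
...         line = line.rstrip()
...         x = len(line)
...         line = line.lstrip()
...         indent = x - len(line)
...         # Integer division gives the depth: 1 space -> 0, 3 -> 1, 5 -> 2.
...         s = s[:indent // 2] + [line]
...         res.append("/".join(s))
...     print(res)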
... 
['Fruits', 'Fruits/Apple', 'Fruits/Apple/Auxiliary', 'Fruits/Apple/Core', 'Fruits/Apple/Extras', 'Fruits/Banana', 'Fruits/Banana/Something', 'Fruits/Coconut', 'Vegetables', 'Vegetables/Eggplant', 'Vegetables/Rutabaga']
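
The same slicing technique can be wrapped in a function so that it can be reused and checked without a file on disk. This is a minimal sketch rather than part of the original answer: the name `paths_from_lines` and the `indent_width` parameter are hypothetical, and it assumes the layout from the question (one leading space at the top level, two more per nested level).

```python
def paths_from_lines(lines, indent_width=2):
    """Return one "/"-joined path per non-blank line (hypothetical helper)."""
    res = []
    stack = []  # names from the top level down to the current item
    for line in lines:
        line = line.rstrip()
        if not line:
            continue  # skip blank lines
        name = line.lstrip()
        indent = len(line) - len(name)
        # Integer division maps 1, 3, 5 leading spaces to depth 0, 1, 2.
        depth = indent // indent_width
        # Keep the ancestors above this depth, then add the current name.
        stack = stack[:depth] + [name]
        res.append("/".join(stack))
    return res


sample = [
    " Fruits",
    "   Apple",
    "     Core",
    "   Banana",
    " Vegetables",
    "   Eggplant",
]
print(paths_from_lines(sample))
```

Because the function accepts any iterable of lines, an open file object can be passed to it directly inside a `with open(...)` block.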

answered Mar 31, 2010 at 16:21


ghostdog74 · Accepted Answer · 2010-03-31 16:12:31Z

1

So you don't want the deepest level, right? I'm not sure I understand you correctly, but here's one approach:

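d = []
with open("file") as f:
    for line in f:
        # Skip anything nested deeper than the second level.
        if not line.startswith("    "):
            if line.startswith("  "):
                d.append(p + "/" + line.strip())
            elif line.startswith(" "):
                # strip() rather than rstrip(), so the leading space is dropped
                p = line.strip()
print(d)

output

$ ./python.py
['Fruits/Apple', 'Fruits/Banana', 'Fruits/Coconut', 'Vegetables/Eggplant', 'Vegetables/Rutabaga']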

answered Mar 31, 2010 at 16:12


1 Comment

Kevin Stargel Over a year ago

Hmm, sorry. I want all the levels, and it could be of arbitrary depth.

snies · Accepted Answer · 2010-03-31 16:24:37Z

0

This assumes that your input file is 'datafile.txt', that you indent only with whitespace, that indent_string holds the indentation used per level, and that level 0 starts with no indentation at all. All of these constraints can be removed with little effort, but the basic layout should be clear:

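import re

indent_string = '  '
# Leading whitespace goes into "blanks"; the rest of the line is "name".
pattern = re.compile(r'(?P<blanks>\s*)(?P<name>.*)')

cache = {}  # indent level -> most recent name seen at that level

with open('datafile.txt') as f:
    for line in f:
        d = pattern.match(line).groupdict()
        # Integer division: "/" would produce a float on Python 3.
        level = len(d['blanks']) // len(indent_string)
        cache[level] = d['name']
        # Join the names from level 0 down to the current level.
        s = ''
        for i in range(level + 1):
            s += '/' + cache[i]
        print(s)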
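
One way those constraints might be lifted is to compare raw indentation widths instead of dividing by a fixed `indent_string`. The sketch below is an assumption about how that could look, not the original code: `iter_paths` is a hypothetical name, and it assumes only that a child is indented further than its parent. Tabs and spaces are each counted as one character, so mixed indentation is not handled.

```python
def iter_paths(lines):
    """Yield a "/"-joined path for each non-blank line (hypothetical helper)."""
    stack = []  # (indent, name) pairs for the current chain of ancestors
    for line in lines:
        name = line.strip()
        if not name:
            continue  # ignore blank lines
        indent = len(line) - len(line.lstrip())
        # Drop every entry that is at the same depth as this line or deeper.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        stack.append((indent, name))
        yield "/".join(n for _, n in stack)


sample = [
    "Fruits",
    "    Apple",
    "        Core",
    "    Banana",
    "Vegetables",
    "  Eggplant",
]
for path in iter_paths(sample):
    print(path)
```

Since each line is compared only with the indentation of its ancestors, the indent width does not have to be known in advance and may even differ between branches, as it does in `sample`.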

edited Mar 31, 2010 at 16:24

answered Mar 31, 2010 at 16:19


ChristopheD · Accepted Answer · 2010-03-31 17:29:22Z

0

You could do something like this:

builder, outlist = [], []
current_spacing = 0
indent_width = 2  # spaces per nesting level, as in the question's file

with open('input.txt') as f:
    for line in f:
        stripped = line.strip()  # also drops the trailing newline
        num_spaces = len(line) - len(line.lstrip())
        if builder and num_spaces <= current_spacing:
            # Pop the previous sibling plus one entry per level moved up.
            levels_up = (current_spacing - num_spaces) // indent_width
            for i in range(levels_up + 1):
                builder.pop()
        builder.append(stripped)
        current_spacing = num_spaces
        outlist.append("/".join(builder))

print(outlist)

edited Mar 31, 2010 at 17:29

answered Mar 31, 2010 at 16:25
