
I'm trying to parse the output from a tool into a data structure, but I'm having some difficulty getting things right. The file looks like this:

 Fruits
   Apple
     Auxiliary
     Core
     Extras
   Banana
     Something
   Coconut
 Vegetables
   Eggplant
   Rutabaga

You can see that top-level items are indented by one space, and each level below that adds two more spaces. The items are also in alphabetical order.

How do I turn the file into a Python list that's something like ["Fruits", "Fruits/Apple", "Fruits/Banana", ..., "Vegetables", "Vegetables/Eggplant", "Vegetables/Rutabaga"]?

4 Answers

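>>> with open("food.txt") as f:
...     res = []
...     s = []
...     for line in f:
...         line = line.rstrip()
...         if not line:
...             continue  # skip blank lines
...         x = len(line)
...         line = line.lstrip()
...         indent = x - len(line)
...         # floor division: 1, 3, 5 leading spaces keep 0, 1, 2 parents
...         s = s[:indent // 2] + [line]
...         res.append("/".join(s))
...     print(res)
... 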
['Fruits', 'Fruits/Apple', 'Fruits/Apple/Auxiliary', 'Fruits/Apple/Core', 'Fruits/Apple/Extras', 'Fruits/Banana', 'Fruits/Banana/Something', 'Fruits/Coconut', 'Vegetables', 'Vegetables/Eggplant', 'Vegetables/Rutabaga']
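The same slicing idea can be wrapped in a reusable function that accepts any iterable of lines (an open file or a list of strings), which makes it easy to try without a file on disk. This is a minimal sketch, not the answerer's code: the name `indented_paths` and the `step` parameter are hypothetical, and it assumes the one-space top level and two spaces per level described in the question.

```python
def indented_paths(lines, step=2):
    """Return 'a/b/c'-style paths for an indented outline.

    Assumes depth = leading spaces // step, which maps
    1, 3, 5 spaces to depths 0, 1, 2 for the layout in the question.
    """
    stack = []   # names on the path to the current line
    paths = []
    for raw in lines:
        line = raw.rstrip()
        if not line:
            continue  # ignore blank lines
        name = line.lstrip()
        depth = (len(line) - len(name)) // step
        stack = stack[:depth] + [name]  # drop siblings and deeper names, then push
        paths.append("/".join(stack))
    return paths


sample = """\
 Fruits
   Apple
     Core
 Vegetables
   Eggplant
"""
print(indented_paths(sample.splitlines()))
# ['Fruits', 'Fruits/Apple', 'Fruits/Apple/Core', 'Vegetables', 'Vegetables/Eggplant']
```

Because a file object iterates over its lines, you can also pass it straight in, e.g. `indented_paths(open("food.txt"))`.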

So you don't want the deepest level, right? I'm not sure I've understood you correctly, but here's one approach:

d = []
p = None  # most recent top-level name
with open("file") as f:
    for line in f:
        if line.startswith("    "):
            continue  # skip anything nested deeper than the second level
        if line.startswith("  "):
            d.append(p + "/" + line.strip())
        elif line.startswith(" "):
            p = line.strip()
print(d)

output

$ ./python.py
['Fruits/Apple', 'Fruits/Banana', 'Fruits/Coconut', 'Vegetables/Eggplant', 'Vegetables/Rutabaga']

Comment

Hmm, sorry. I want all the levels, and it could be of arbitrary depth.

This assumes that your input file is 'datafile.txt', that you indent with spaces only, that you specify the indent_string used per level, and that level 0 starts without any indent (no leading whitespace on the top level at all). With the question's one-space top level, floor division still maps 1, 3 and 5 leading spaces to levels 0, 1 and 2. All these constraints can be removed with little effort, but the basic layout should be clear:

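import re

indent_string = '  '
pattern = re.compile(r'(?P<blanks>\s*)(?P<name>.*)')

cache = {}  # level -> most recent name seen at that level

with open('datafile.txt') as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines
        d = pattern.match(line).groupdict()
        # floor division, so 1, 3, 5 leading spaces give levels 0, 1, 2
        level = len(d['blanks']) // len(indent_string)
        cache[level] = d['name']
        # join the names of every level up to this one, without a leading '/'
        print('/'.join(cache[i] for i in range(level + 1)))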
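As the answer says, the constraints are easy to lift. The sketch below is one possible way, under assumptions of my own rather than the answerer's: tabs are expanded with `str.expandtabs`, indentation is measured relative to the first non-blank line (so the top level may start in any column), the indent width is a parameter, and each line is at most one level deeper than the line before it. The function name `outline_paths` and its parameters are hypothetical.

```python
import re

LINE = re.compile(r'(?P<blanks> *)(?P<name>.*)')


def outline_paths(lines, indent_width=2, tabsize=8):
    """Sketch: 'a/b/c' paths for an outline with any top-level indent.

    Assumes each line is at most one level deeper than the previous one.
    """
    cache = {}   # level -> most recent name seen at that level
    base = None  # indentation of the first non-blank line = level 0
    out = []
    for raw in lines:
        line = raw.rstrip('\n').expandtabs(tabsize)  # tabs become spaces
        if not line.strip():
            continue  # skip blank lines
        m = LINE.match(line)
        indent = len(m.group('blanks'))
        if base is None:
            base = indent
        # clamp at 0 in case a later line is indented less than the first one
        level = max(0, (indent - base) // indent_width)
        cache[level] = m.group('name').rstrip()
        out.append('/'.join(cache[i] for i in range(level + 1)))
    return out


sample = [" Fruits", "   Apple", "     Core", " Vegetables"]
print(outline_paths(sample))
# ['Fruits', 'Fruits/Apple', 'Fruits/Apple/Core', 'Vegetables']
```

With tab-indented input, set `indent_width` to the tab size, e.g. `outline_paths(["Fruits", "\tApple"], indent_width=8)`.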


You could do something like this:

outlist = []
builder = []  # stack of (indent, name) pairs for the current path

with open('input.txt') as f:
    for line in f:
        stripped = line.strip()
        if not stripped:
            continue  # ignore blank lines
        num_spaces = len(line) - len(line.lstrip())
        # pop the previous sibling and anything nested deeper than this line,
        # however many levels the outline jumps back up
        while builder and builder[-1][0] >= num_spaces:
            builder.pop()
        builder.append((num_spaces, stripped))
        outlist.append("/".join(name for _, name in builder))

print(outlist)
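Keeping the indentation of every open level on a stack also makes it possible to reject malformed input, in the spirit of Python's own "unindent does not match any outer indentation level" error: after a dedent, a line must line up with a level that was open. The sketch below is an illustration under my own assumptions (spaces only, since a tab would count as a single character), and the name `checked_paths` is hypothetical.

```python
def checked_paths(lines):
    """Yield 'a/b/c' paths, raising ValueError on an inconsistent dedent."""
    stack = []  # (indent, name) for each open level
    for lineno, raw in enumerate(lines, 1):
        name = raw.strip()
        if not name:
            continue  # ignore blank lines
        indent = len(raw) - len(raw.lstrip())
        popped = None  # indent of the last level closed by this line
        while stack and stack[-1][0] >= indent:
            popped = stack.pop()[0]
        # a dedent must land exactly on the indent of a level that was open
        if popped is not None and popped != indent:
            raise ValueError(f"line {lineno}: indent {indent} matches no open level")
        stack.append((indent, name))
        yield "/".join(n for _, n in stack)


good = [" Fruits", "   Apple", "     Core", " Vegetables"]
print(list(checked_paths(good)))
# ['Fruits', 'Fruits/Apple', 'Fruits/Apple/Core', 'Vegetables']
```

Feeding it `[" Fruits", "   Apple", "  Banana"]` raises `ValueError`, because two spaces is neither Apple's level (3) nor Fruits' level (1).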
