0

Need to parse output of a command in python. The command returns something like this

A:
        2 bs found
        3 cs found
B:
        1 a found
        3 bs found
C:
        1 c found
        D:
                2 es found
                3 fs found

Need to able to do the following with the output:

access a.bs found b.a found. c.d.es found and so on.

How do I do this python? What data structure is best suited to do this?

The goal of this exercise is to run the command every 10 secs and identify a diff of what's changed

7
  • How do i do a code block inside the question? The output format of my command is all mangled. Commented May 17, 2013 at 1:03
  • You can select the code block can press ctrl+k. Commented May 17, 2013 at 1:03
  • 1
    To clarify - are the {{{ and }}} actually part of your output, or was that an attempt to make it a code block? Commented May 17, 2013 at 1:04
  • {{{ }}} is not part of output. Was to make the output a code block. Commented May 17, 2013 at 1:05
  • 1
    Nested dictionaries are an appropriate structure, IMO. You'd want to end up with something like this, in Python terms: {'A':{'b':2,'c':3}, 'B':{'a':1,'b':3}, 'C':{'c':1, D: {'e': 2,'f':3}}} Commented May 17, 2013 at 1:07

2 Answers 2

2

An alternative solution is to translate the input string directly into something that a pre-existing library can read. This particular data looks like a good fit for YAML.

In this case you would re.sub('( +)([1-9]+) ([a-z]).+', '\\1\\3 : \\2', allcontent), which rewrites the '2 cs found' type lines into a key:value mapping that pyYAML understands. To be precise, the form '2 cs found' becomes 'c : 2'

the result?

A:
        b : 2
        c : 3
B:
        a : 1
        b : 3
C:
        c : 1
        D:
                e : 2
                f : 3

executing yaml.load(newcontent) returns the following python data structure:

{'A': {'b': 2, 'c': 3},
 'B': {'a': 1, 'b': 3},
 'C': {'D': {'e': 2, 'f': 3}, 'c': 1}}

Which matches my suggestion in my earlier comment. If you prefer json (Python comes with a json module), it's pretty simple to adapt this strategy to produce JSON instead.

Sign up to request clarification or add additional context in comments.

Comments

0

This should have a 'parsing' tag as it's a general parsing problem.

The normal solution in this kind of situation is to track a) the indentation and b) the list of structures that are currently being parsed, as you read in lines. b would begin as a list containing a single empty dict, ie. curparsing = [{}]

Loop over all input lines. For example:

with open('inputfilename','r') as f:
    for line in f:
        # code implementing the below rules.
  • if a line is blank (if not line.strip():), ignore it and go onto the next one (continue)

  • if the indentation level has decreased, we should remove the top item in the currently-parsing list (ie. curparsing.pop()). if multiple decreases are detected, we should remove multiple items from the top.

  • strip off any leading indentation with line=line.lstrip()

  • if ':' is in the line, then we've found a sub-dictionary. Read the key(the part to the left of ':'), increase the indent-level, create a new dictionary, and insert it into the dictionary at the current top of the list. Then append our newly-created dictionary to the list.

  • if line[0] in '123456789': then we found a report of '[count] [character]s found'. we can use regular expressions to find the count and the character, with m = re.match('([1-9]+) ([a-z])'); count, character = m.groups(); count = int(count). We then store this into the dictionary at the current top of the list: curparsing[-1][character] = count

That's pretty much it. You just loop over lines and apply these rules to each line, and at the end, curparsing[0] contains the parsed document.

1 Comment

Thanks for the pointer. Will try and let you know how it goes

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.