I need to parse text files like the one below with Python, build a hierarchical object structure from the data, and then process it. This is very similar to what xml.etree.ElementTree and other XML parsers do for XML.

However, the syntax of these files is not XML, and I'm wondering what the best way to implement such a parser is: subclass an existing XML parser (which one?) and customize its tag recognition, write a custom parser, or something else.

{NETLIST topblock
{VERSION 2 0 0}

{CELL topblock
    {PORT gearshift_h vpsf vphreg pwron_h vinp vref_out vcntrl_out gd meas_vref 
      vb vout meas_vcntrl reset_h vinm }
    {INST XI21/Mdummy1=pch_18_mac {TYPE MOS} {PROP n="sctg_inv1x/pch_18_mac" Length=0.152 NFIN=8 }
    {PIN vpsf=SRC gs_h=DRN vpsf=GATE vpsf=BULK }}
    {INST XI21/Mdummy2=nch_18_mac {TYPE MOS} {PROP n="sctg_inv1x/nch_18_mac" Length=0.152 NFIN=5 }
    {PIN gs_h=SRC gd=DRN gd=GATE gd=BULK }}
    {INST XI20/Mdummy1=pch_18_mac {TYPE MOS} {PROP n="sctg_inv1x/pch_18_mac" Length=0.152 NFIN=8 }
    {PIN vpsf=SRC gs_hn=DRN vpsf=GATE vpsf=BULK }}
    {INST XI20/Mdummy2=nch_18_mac {TYPE MOS} {PROP n="sctg_inv1x/nch_18_mac" Length=0.152 NFIN=5 }
    {PIN gs_hn=SRC gd=DRN gd=GATE gd=BULK }}
    {INST XI19/Mdummy1=pch_18_mac {TYPE MOS} {PROP n="sctg_inv1x/pch_18_mac" Length=0.152 NFIN=8 }
    {PIN vpsf=SRC net514=DRN vpsf=GATE vpsf=BULK }}
    {INST XI19/Mdummy2=nch_18_mac {TYPE MOS} {PROP n="sctg_inv1x/nch_18_mac" Length=0.152 NFIN=5 }
    {PIN net514=SRC gd=DRN gd=GATE gd=BULK }}
    {INST XI21/MN0=nch_18_mac {TYPE MOS} {PROP n="sctg_inv1x/nch_18_mac" Length=0.152 NFIN=5 }
    {PIN gd=SRC gs_h=DRN gs_hn=GATE gd=BULK }}
    {INST XI21/MP0=pch_18_mac {TYPE MOS} {PROP n="sctg_inv1x/pch_18_mac" Length=0.152 NFIN=8 }
    {PIN vpsf=SRC gs_h=DRN gs_hn=GATE vpsf=BULK }}
    {INST XI20/MN0=nch_18_mac {TYPE MOS} {PROP n="sctg_inv1x/nch_18_mac" Length=0.152 NFIN=5 }
...
}
}

2 Answers

As others said in the comments: use an existing parser if there is one. If none exists, write your own, but build it with a parser library. For example, with Parcon:

from pprint import pprint
from parcon import (Forward, SignificantLiteral, Word, alphanum_chars, Exact,
                    ZeroOrMore, CharNotIn, concat, OneOrMore)

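# A block can contain other blocks, so declare it before defining it.
block = Forward()
quote = SignificantLiteral('"')
word = Word(alphanum_chars + '/_.)')
# A value is a bare word or a double-quoted string (the quotes are kept).
value = word | Exact(quote + ZeroOrMore(CharNotIn('"')) + quote)[concat]
pair = word + '=' + value        # e.g. NFIN=8
flag = word                      # e.g. a port name
attribute = pair | flag | block
head = word                      # block keyword, e.g. INST
body = ZeroOrMore(attribute)
block << '{' + head + body + '}'
blocks = OneOrMore(block)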

with open('<your file name>.txt') as infile:
    pprint(blocks.parse_string(infile.read()))

Result:

[('NETLIST',
  ['topblock',
   ('VERSION', ['2', '0', '0']),
   ('CELL',
    ['topblock',
     ('PORT',
      ['gearshift_h',
       'vpsf',
       'vphreg',
       'pwron_h',
       'vinp',
       'vref_out',
       'vcntrl_out',
       'gd',
       'meas_vref',
       'vb',
       'vout',
       'meas_vcntrl',
       'reset_h',
       'vinm']),
     ('INST',
      [('XI21/Mdummy1', 'pch_18_mac'),
       ('TYPE', ['MOS']),
       ('PROP',
        [('n', '"sctg_inv1x/pch_18_mac"'),
         ('Length', '0.152'),
         ('NFIN', '8')]),
       ('PIN',
        [('vpsf', 'SRC'),
         ('gs_h', 'DRN'),
         ('vpsf', 'GATE'),
         ('vpsf', 'BULK')])]),
     ('INST',
        ...
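
To get from this nested result to a hierarchical object structure, one option is a small recursive conversion. The sketch below assumes the result has exactly the shape printed above: a block is a (head, list) tuple, a pair is a (key, value) tuple of strings, and a flag is a plain string. The Node class and the build function are hypothetical names introduced for illustration, not part of Parcon.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One {HEAD ...} block of the netlist (hypothetical helper class)."""
    name: str
    flags: list = field(default_factory=list)     # bare words, e.g. port names
    pairs: list = field(default_factory=list)     # (key, value); a list keeps repeated keys
    children: list = field(default_factory=list)  # nested blocks


def build(block):
    """Convert a (head, body) tuple, as printed above, into a Node tree."""
    head, body = block
    node = Node(head)
    for item in body:
        if isinstance(item, str):
            node.flags.append(item)            # flag
        elif isinstance(item[1], list):
            node.children.append(build(item))  # nested block
        else:
            node.pairs.append(tuple(item))     # key=value pair
    return node


# Small hand-written fragment in the same shape as the parser output.
sample = ('CELL', ['topblock',
                   ('INST', [('XI21/Mdummy1', 'pch_18_mac'),
                             ('TYPE', ['MOS']),
                             ('PIN', [('vpsf', 'SRC'), ('gs_h', 'DRN')])])])
root = build(sample)
```

Pairs are stored in a list rather than a dict because the same net can appear more than once in a PIN block (vpsf=SRC and vpsf=GATE in the example).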

2 Comments

This is very nice and hands-on. Thanks a lot.
Very nice, very nice indeed!

First of all, you should check if there is already a parser available for your file format. Apparently there is: Python-based Verilog Parser (currently Netlist only)

If you can't find anything suitable, you can build a parser with one of the many parser-building libraries available, for example pyparsing. Subclassing an XML parser doesn't seem to be a good idea.
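
As a rough idea of what the pyparsing route could look like, here is a minimal sketch. It assumes pyparsing 3.0 or later and a grammar inferred only from the sample in the question (brace-delimited blocks containing bare words, key=value pairs and nested blocks), so the real format may need more rules. All grammar names (word, pair, block, netlist) are made up for this example.

```python
import pyparsing as pp

# Punctuation is matched but dropped from the results.
LBRACE, RBRACE, EQ = map(pp.Suppress, "{}=")

word = pp.Word(pp.alphanums + "/_.")
value = pp.QuotedString('"') | word          # quoted strings lose their quotes

# Return each key=value pair as a single tuple token, so it can be told
# apart from a nested block (which becomes a list).
pair = (word + EQ + value).set_parse_action(lambda t: (t[0], t[1]))

block = pp.Forward()                         # blocks nest, so define lazily
item = block | pair | word
block <<= pp.Group(LBRACE + word + pp.ZeroOrMore(item) + RBRACE)
netlist = pp.OneOrMore(block)

sample = """
{CELL topblock
    {PORT vpsf gd }
    {INST XI21/MN0=nch_18_mac {TYPE MOS} {PROP n="sctg_inv1x/nch_18_mac" Length=0.152 NFIN=5 }
    {PIN gd=SRC gs_h=DRN gs_hn=GATE gd=BULK }}
}
"""
tree = netlist.parse_string(sample, parse_all=True).as_list()
```

In the resulting nested list, each block is a list whose first element is the keyword, pairs are tuples, and bare words are strings.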

5 Comments

This text file is not a Verilog netlist. It is one of the netlist outputs of the IC Validator EDA tool, and it appears to be a custom netlist format; at least it is not the usual SPICE, LEF or EDIF.
Perhaps this is an extension of the Verilog format. In that case it may be similar enough to Verilog to make extending the Python library easy. Otherwise, you can try to build your own parser using pyparsing or something else. If the format is simple enough, then string.split and regular expressions may be all you need...
This is not an extension to Verilog, Verilog is completely different. pyparsing however seems promising. Thanks
As I said, there are plenty of Python libraries for parsing; see wiki.python.org/moin/LanguageParsing. You may find something you like better than pyparsing, but it should be OK.
But really, make sure you don't reinvent the wheel. Perhaps there is open-source software for parsing this format, or maybe the tool has an option to produce XML instead. Anyway, subclassing an XML parser to parse a format other than XML is not a good idea.
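
For the "regular expressions may be all you need" route mentioned in the comments, a standard-library-only sketch could look like the following. It assumes the format contains nothing beyond what the sample shows (braces, bare words, key=value pairs and double-quoted strings without escapes). The function names and the dict layout are hypothetical choices for this example.

```python
import re

# A token is a quoted string, one of the punctuation characters { } =,
# or a run of any other non-space characters.
TOKEN = re.compile(r'"[^"]*"|[{}=]|[^\s{}="]+')


def parse_block(tokens, pos):
    """Parse one {NAME ...} block starting at tokens[pos].

    Returns (block, next_pos), where block is a dict with the block name
    and a list of items: str (flag), (key, value) tuple, or nested dict.
    """
    if tokens[pos] != '{':
        raise ValueError(f"expected '{{' at token {pos}")
    name = tokens[pos + 1]
    pos += 2
    items = []
    while True:
        if pos >= len(tokens):
            raise ValueError(f"unclosed block {name!r}")
        tok = tokens[pos]
        if tok == '}':
            return {'name': name, 'items': items}, pos + 1
        if tok == '{':
            child, pos = parse_block(tokens, pos)
            items.append(child)
        elif pos + 2 < len(tokens) and tokens[pos + 1] == '=':
            items.append((tok, tokens[pos + 2].strip('"')))
            pos += 3
        else:
            items.append(tok)
            pos += 1


def parse(text):
    """Parse all top-level blocks in text into a list of dicts."""
    tokens = TOKEN.findall(text)
    blocks, pos = [], 0
    while pos < len(tokens):
        block, pos = parse_block(tokens, pos)
        blocks.append(block)
    return blocks


blocks = parse('{CELL top {PORT a b } '
               '{INST X1/M1=pch {PROP n="lib/pch" L=0.152 } {PIN a=SRC b=DRN }}}')
```

This gives up the error reporting and extensibility of a parser library in exchange for having no dependencies, which may be acceptable if the files are machine-generated and regular.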
