How to write a simple lexer/parser for user input commands in Python?

Question

For a personal project, i need to write a simple lexer in order to treat user input commands. They can contain four patterns and brackets :

operators : ., +, - or %
names ([a-zA-Z0-9]) : for instance foobar, fooZ or bar2
filters ([a-z]{1}"[some text]") : for instance a"hello world", p"escaped \"string\""
other filters ([a-z]{1}~"[some text]") : for instance a~"hello world", p~"escaped \"string\""

I would like to split the command in order to have a hierarchical list (based on the brackets) that contains an entry for each operator, an entry for each name (a single string), and an entry for each filter (a dictionary, for instance {'a': 'hello world'} or {'p~': 'escaped "string"'}).

So here's an example of what i want to get with this input (spaces around operators are optional):

foo . bar - (a"hello \"world\"" % (b~"foo" + bar) . (fooZ.(bar2+(c"foo bar".d"bar foo"))))

[
  'foo',
  '.',
  'bar',
  '-',
  [
    {'a': 'hello "world"'},
    '%',
    [
      {'b~': 'foo'},
      '+',
      'bar'
    ]
    '.',
    [
      'fooZ',
      '.',
      [
        'bar2',
        '+',
        [
          {'c': 'foo bar'},
          '.',
          {'d': 'bar foo'}
        ]
      ]
    ]
  ]
]

I had a look at pyparsing, PLY, pyPEG, Yapps, etc. but they all seem hard to use and i don't know if i need this kind of tool since the grammar i need is not very complex. Thanks in advance for your suggestions!

It's better to use one of the tools for parsing, no matter how simple the grammar looks :) This way, you can easily expand to future requirements, if needed. I'd recommend Python PLY :) — Mihai Maruseac
– Mihai Maruseac, Commented Jul 9, 2012 at 12:03
Is this a personal project more along the lines of learning how to parse? Or, is it just about parsing the input then doing stuff with it? Depending on which one it is will vary on what you will want to use. The first is more fun to be honest, although yes you are reinventing a very beaten horse, but helps with personal understanding. — sean
– sean, Commented Jul 9, 2012 at 13:00
sean > it's the second, so i think the best way is to use a tool as Mihai said, using regex seems boring and i don't have a lot of time — Nicolas
– Nicolas, Commented Jul 9, 2012 at 13:14
pyparsing is dead easy to use. It would be my first port of call for a simple grammar like yours. — Chinmay Kanchi
– Chinmay Kanchi, Commented Jul 9, 2012 at 13:49

PaulMcG · Accepted Answer · 2012-07-09 13:35:27Z

3

See how this pyparsing solution addresses your stated grammar:

from pyparsing import *
ParserElement.enablePackrat()
import string

name = Word(alphanums)

filter = (Combine(oneOf(list(string.ascii_lowercase)) + Optional('~')) +
          dblQuotedString.setParseAction(removeQuotes))
# convert parsed filter to dict
filter.setParseAction(lambda t:{t[0]:t[1]})

expr = operatorPrecedence(filter | name, 
            [
            (oneOf('. % -'), 2, opAssoc.LEFT),
            ('+', 2, opAssoc.LEFT),
            ])

test = r'foo . bar - (a"hello \"world\"" % (b~"foo" + bar) . (fooZ.(bar2+(c"foo bar".d"bar foo"))))'

print expr.parseString(test, parseAll=True).asList()

Prints:

[['foo',
  '.',
  'bar',
  '-',
  [{'a': 'hello \\"world\\"'},
   '%',
   [{'b~': 'foo'}, '+', 'bar'],
   '.',
   ['fooZ', '.', ['bar2', '+', [{'c': 'foo bar'}, '.', {'d': 'bar foo'}]]]]]]

edited Jul 9, 2012 at 13:35

answered Jul 9, 2012 at 13:30

PaulMcG

64.1k16 gold badges98 silver badges135 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

PaulMcG Over a year ago

Now I'm not so sure there is any precedence of one operator over another, the grouping in your example may just be due to parenthetical nesting. If that is the case, you can merge the two levels in operatorPrecedence down to just (oneOf('. % + -'), 2, opAssoc.LEFT).

Collectives™ on Stack Overflow

How to write a simple lexer/parser for user input commands in Python?

1 Answer 1

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related