2

For a personal project, i need to write a simple lexer in order to treat user input commands. They can contain four patterns and brackets :

  • operators : ., +, - or %
  • names ([a-zA-Z0-9]) : for instance foobar, fooZ or bar2
  • filters ([a-z]{1}"[some text]") : for instance a"hello world", p"escaped \"string\""
  • other filters ([a-z]{1}~"[some text]") : for instance a~"hello world", p~"escaped \"string\""

I would like to split the command in order to have a hierarchical list (based on the brackets) that contains an entry for each operator, an entry for each name (a single string), and an entry for each filter (a dictionary, for instance {'a': 'hello world'} or {'p~': 'escaped "string"'}).

So here's an example of what i want to get with this input (spaces around operators are optional):

foo . bar - (a"hello \"world\"" % (b~"foo" + bar) . (fooZ.(bar2+(c"foo bar".d"bar foo"))))

[
  'foo',
  '.',
  'bar',
  '-',
  [
    {'a': 'hello "world"'},
    '%',
    [
      {'b~': 'foo'},
      '+',
      'bar'
    ]
    '.',
    [
      'fooZ',
      '.',
      [
        'bar2',
        '+',
        [
          {'c': 'foo bar'},
          '.',
          {'d': 'bar foo'}
        ]
      ]
    ]
  ]
]

I had a look at pyparsing, PLY, pyPEG, Yapps, etc. but they all seem hard to use and i don't know if i need this kind of tool since the grammar i need is not very complex. Thanks in advance for your suggestions!

5
  • 2
    It's better to use one of the tools for parsing, no matter how simple the grammar looks :) This way, you can easily expand to future requirements, if needed. I'd recommend Python PLY :) Commented Jul 9, 2012 at 12:03
  • Is this a personal project more along the lines of learning how to parse? Or, is it just about parsing the input then doing stuff with it? Depending on which one it is will vary on what you will want to use. The first is more fun to be honest, although yes you are reinventing a very beaten horse, but helps with personal understanding. Commented Jul 9, 2012 at 13:00
  • Thanks Mihai, i'll try to go into PLY in depth. Commented Jul 9, 2012 at 13:13
  • sean > it's the second, so i think the best way is to use a tool as Mihai said, using regex seems boring and i don't have a lot of time Commented Jul 9, 2012 at 13:14
  • 1
    pyparsing is dead easy to use. It would be my first port of call for a simple grammar like yours. Commented Jul 9, 2012 at 13:49

1 Answer 1

3

See how this pyparsing solution addresses your stated grammar:

from pyparsing import *
ParserElement.enablePackrat()
import string

name = Word(alphanums)

filter = (Combine(oneOf(list(string.ascii_lowercase)) + Optional('~')) +
          dblQuotedString.setParseAction(removeQuotes))
# convert parsed filter to dict
filter.setParseAction(lambda t:{t[0]:t[1]})

expr = operatorPrecedence(filter | name, 
            [
            (oneOf('. % -'), 2, opAssoc.LEFT),
            ('+', 2, opAssoc.LEFT),
            ])

test = r'foo . bar - (a"hello \"world\"" % (b~"foo" + bar) . (fooZ.(bar2+(c"foo bar".d"bar foo"))))'

print expr.parseString(test, parseAll=True).asList()

Prints:

[['foo',
  '.',
  'bar',
  '-',
  [{'a': 'hello \\"world\\"'},
   '%',
   [{'b~': 'foo'}, '+', 'bar'],
   '.',
   ['fooZ', '.', ['bar2', '+', [{'c': 'foo bar'}, '.', {'d': 'bar foo'}]]]]]]
Sign up to request clarification or add additional context in comments.

1 Comment

Now I'm not so sure there is any precedence of one operator over another, the grouping in your example may just be due to parenthetical nesting. If that is the case, you can merge the two levels in operatorPrecedence down to just (oneOf('. % + -'), 2, opAssoc.LEFT).

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.