On the subject of writing a parser, I took this opportunity to write my own using a state machine:

    import sys

    ASTERISK = '*'
    DEFAULT = 'default'
    EOL = '\n'
    ESCAPE = '\\'
    QUOTE = '"'
    SLASH = '/'


    class ExtractStrings:
        def __init__(self, multiline_string):
            self.buffer = multiline_string
            self.chars_collected = ''
            self.strings = None  # filled in lazily by parse()

        # Actions: each one receives the character just read.
        def noop(self, ch):
            pass

        def collect_char(self, ch):
            self.chars_collected += ch

        def return_string(self, ch):
            self.strings.append(self.chars_collected)
            self.chars_collected = ''

        def parse(self):
            self.strings = []
            # state name -> {char just read: (action, next state)}
            state = {
                'start': {
                    QUOTE: (self.noop, 'in_string'),
                    SLASH: (self.noop, 'first_slash'),
                    DEFAULT: (self.noop, 'start'),
                },
                'in_string': {
                    QUOTE: (self.return_string, 'start'),
                    ESCAPE: (self.collect_char, 'escaping'),
                    DEFAULT: (self.collect_char, 'in_string'),
                },
                'escaping': {
                    # Whatever follows a backslash is kept as-is.
                    DEFAULT: (self.collect_char, 'in_string'),
                },
                'first_slash': {
                    SLASH: (self.noop, 'line_comment'),
                    ASTERISK: (self.noop, 'block_comment'),
                    # A quote right after a lone slash still opens a string.
                    QUOTE: (self.noop, 'in_string'),
                    DEFAULT: (self.noop, 'start'),
                },
                'line_comment': {
                    EOL: (self.noop, 'start'),
                    DEFAULT: (self.noop, 'line_comment'),
                },
                'block_comment': {
                    ASTERISK: (self.noop, 'near_comment_block_end'),
                    DEFAULT: (self.noop, 'block_comment'),
                },
                'near_comment_block_end': {
                    SLASH: (self.noop, 'start'),
                    ASTERISK: (self.noop, 'near_comment_block_end'),
                    DEFAULT: (self.noop, 'block_comment'),
                },
            }

            current = 'start'
            for ch in self.buffer:
                # Use the DEFAULT entry when ch has no entry of its own.
                default = state[current][DEFAULT]
                action, next_state = state[current].get(ch, default)
                action(ch)
                current = next_state

        def __iter__(self):
            if self.strings is None:
                self.parse()
            return iter(self.strings)


    if __name__ == '__main__':
        with open(sys.argv[1]) as f:
            code = f.read()

        for string_literal in ExtractStrings(code):
            print('"%s"' % string_literal)

How does it work?
The state machine defines the states, what to do in each state (not shown in the diagram), and the transitions to the next states. Once the state machine is defined (as a nested dictionary), it is just a matter of reading the next char, looking it up in the table for the current state, performing the associated action, and moving to the next state.
The state machine is a nested dictionary. For the outer dictionary, the key is the state name and the value is the inner dictionary. For the inner dictionary, the key is the char just read (or DEFAULT, which matches any char that has no entry of its own) and the value is a tuple of (action, next state).
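
To make the table lookup concrete, here is a reduced sketch with only two states. It is not the full parser above: the names TABLE and extract, and the use of action names as strings instead of bound methods, are simplifications assumed for illustration, and it ignores escapes and comments.

    ```python
    DEFAULT = 'default'
    QUOTE = '"'

    # Hypothetical reduced table: outer key = state name,
    # inner key = char just read, value = (action name, next state).
    TABLE = {
        'start': {
            QUOTE: ('skip', 'in_string'),
            DEFAULT: ('skip', 'start'),
        },
        'in_string': {
            QUOTE: ('emit', 'start'),
            DEFAULT: ('collect', 'in_string'),
        },
    }


    def extract(text):
        """Run the reduced machine over text and return the quoted strings."""
        strings, collected, current = [], '', 'start'
        for ch in text:
            row = TABLE[current]
            # Fall back to the DEFAULT entry when ch has no entry of its own.
            action, current = row.get(ch, row[DEFAULT])
            if action == 'collect':
                collected += ch
            elif action == 'emit':
                strings.append(collected)
                collected = ''
        return strings


    print(extract('x = "hi"; y = "there";'))  # ['hi', 'there']
    ```

Adding a state then only means adding one more inner dictionary; the loop itself does not change.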
lookahead or lookbehind assertions. Try pythex.org for regex matching and tips.