Python parser of c++ simple expressions

Question

NOTE: python 3.2

I want to write a Python script that receives simple C++ expressions as input and outputs the very same expressions as tokens.

I vaguely remember my course in compilation, and I need something far less complex than a compiler.

Examples

int& name1=arr1[place1];
int *name2=    arr2[ place2];

should output

[    "int", "&", "name1", "=", "arr1", "[", "place1", "]"    ]
[    "int", "*", "name2", "=", "arr2", "[", "place2", "]"    ]

The spaces shouldn't matter, and I don't want them in the output.

This seems like a very simple task for someone who knows what they're doing, while I keep getting stray whitespace or splitting at the wrong places.

I would greatly appreciate a quick solution for this. It really looks like a one-liner to me.

Note that I only need expressions like I showed here. Nothing fancy.

Thanks

It's generally appreciated to show the code you already got.
– Eli Korvigo, Commented Aug 26, 2015 at 17:41
@EliKorvigo I'm in a military environment that is closed to the world network. Can't get my code out. Anyway, I thought this would be an easy question that doesn't really need preliminary work. If it isn't do tell.
– Gulzar, Commented Aug 26, 2015 at 17:43
If these suggestions aren't working, try describing your algorithm since you can't post code.
– Surreal Dreams, Commented Aug 26, 2015 at 18:01
You can probably repeatedly refine regular expressions to get an approximation to what you want. Or you could build a simple, readable and maintainable lexer using PLY or some similar Python library. I'd strongly suggest option 2.
– rici, Commented Aug 26, 2015 at 19:03

Padraic Cunningham · Accepted Answer · 2015-08-26 17:59:32Z

2

Not overly familiar with C++, but you could maybe use re.findall with a list of special chars:

import re

lines = """int& name1=arr1[place1];
int *name2=    arr2[ place2];"""

for line in lines.splitlines():
    # Match a single special character, or a run of word characters
    print(re.findall(r"[\*\$\[\]&=]|\w+", line))

Output:

['int', '&', 'name1', '=', 'arr1', '[', 'place1', ']']
['int', '*', 'name2', '=', 'arr2', '[', 'place2', ']']
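As a small sketch of how the same pattern could be reused, it can be compiled once and wrapped in a function. The names TOKEN_RE and tokenize are made up for illustration and are not part of the answer above. Because neither spaces nor ';' match the pattern, findall simply skips them, so a line such as "int a = 1;" keeps "int" and "a" apart.

```python
import re

# Hypothetical names for illustration: TOKEN_RE, tokenize.
# A token is a run of word characters, or one of the special characters.
TOKEN_RE = re.compile(r"\w+|[*$\[\]&=]")

def tokenize(line):
    """Return the tokens of one simple C++ line; spaces and ';' are skipped."""
    return TOKEN_RE.findall(line)

print(tokenize("int a = 1;"))  # ['int', 'a', '=', '1']
```

Any operator that is not listed in the character class (for example '+' or '->') would be silently dropped, so the class would need extending for anything beyond the expressions in the question.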

edited Aug 26, 2015 at 17:59

answered Aug 26, 2015 at 17:53

Padraic Cunningham


Surreal Dreams · Accepted Answer · 2015-08-26 18:00:38Z

2

Looks to me like you need to define a list of "special/operator" characters. Replace each of those characters with itself plus a space of padding on either side. Use str.split() to turn the string into a list of "words". If you need a string representation, finish up with "', '".join(wordlist) and add a "[ '" to the front and "' ]" to the end.

I'm almost certainly missing a few things, like looking for semicolons to strip off, or to use in breaking apart concatenated expressions. You weren't specific about how many expressions you'd read in at once. If you read in many at a time, you could split on the semicolon character, then iterate over the resulting list of expressions.
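A minimal sketch of the padding idea, assuming the operator set is just the characters that appear in the question. The names SPECIAL, tokenize_by_padding and tokenize_many are invented for this example.

```python
# Assumed operator set: only the characters seen in the question's examples.
SPECIAL = "&*=[]"

def tokenize_by_padding(line):
    """Pad each special character with spaces, then split on whitespace."""
    line = line.strip().rstrip(";")          # drop the trailing semicolon
    for ch in SPECIAL:
        line = line.replace(ch, " " + ch + " ")
    return line.split()                      # split() collapses runs of spaces

def tokenize_many(text):
    """Split several expressions on ';' and tokenize each non-empty one."""
    return [tokenize_by_padding(expr) for expr in text.split(";") if expr.strip()]

tokens = tokenize_by_padding("int *name2=    arr2[ place2];")
# String representation in the style of the question
print("[ '" + "', '".join(tokens) + "' ]")
```

Calling str.split() with no argument is what removes the stray whitespace: it splits on any run of whitespace and never returns empty strings.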

edited Aug 26, 2015 at 18:00

answered Aug 26, 2015 at 17:51

Surreal Dreams


2 Comments

Gulzar Over a year ago

You can assume I have one such expression per line. As simple as it gets.

Surreal Dreams Over a year ago

There's probably a clever list comprehension to do this - it seems like there's one for everything. This is a simple suggestion instead, which is what I always try first.

Bhavani A B · Accepted Answer · 2015-08-26 19:50:31Z

1

The first step is to remove the spaces, that is, replace ' ' with ''. Then make a list of special characters or words, and surround each of them with a delimiter. Finally, split the line on the delimiter. Here is an example:

import sys

for line in sys.stdin:
    line = line.strip().replace(' ', '')   # drop the newline and all spaces
    line = line.replace('&', ',&,')        # surround the special character with the delimiter
    a = line.split(',')                    # split on the delimiter
    print(a)
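One possible way to extend this to several special characters is sketched below. It is a variation, not the code from the answer: whitespace is turned into the delimiter instead of being deleted, so that "int a" stays two tokens, and the delimiter is a NUL character on the assumption that it never occurs in the input. SPECIAL, DELIM and tokenize_with_delimiter are hypothetical names.

```python
# Assumed operator set and delimiter; both are illustrative choices.
SPECIAL = ["&", "*", "=", "[", "]"]
DELIM = "\0"  # assumed never to appear in the source line

def tokenize_with_delimiter(line):
    """Mark token boundaries with DELIM, then split on it."""
    line = line.strip().rstrip(";")
    # Whitespace becomes a delimiter rather than disappearing
    line = DELIM.join(line.split())
    for ch in SPECIAL:
        line = line.replace(ch, DELIM + ch + DELIM)
    # Adjacent delimiters produce empty strings; drop them
    return [tok for tok in line.split(DELIM) if tok]

print(tokenize_with_delimiter("int a = 1;"))  # ['int', 'a', '=', '1']
```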

answered Aug 26, 2015 at 19:50

Bhavani A B


2 Comments

chthonicdaemon Over a year ago

Although the examples don't show it, something like "int a = 1;" is also a valid expression, which should return ['int', 'a', '=', '1'], but removing the space will incorrectly merge the "int" and "a".

Gulzar Over a year ago

The ideas in this example were the most beneficial to me, and I managed to make something happen. Thanks!

Michael S Priz · Accepted Answer · 2015-08-26 17:50:57Z

0

Here is a generator that might do the trick:

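def parseCPP(line):
    # Drop trailing whitespace and the terminating semicolon
    line = line.rstrip().rstrip(";")
    word = ""
    for i in line:
        if i.isalnum():
            word += i
        else:
            if word:
                yield word
                word = ""
            if not i.isspace():
                yield i
    # Emit a word that runs to the end of the line, e.g. the "1" in "int a = 1;"
    if word:
        yield word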

Note this just picks up consecutive strings of alphanumeric characters. Any non-space characters are assumed to be operators/tokens by themselves.

Hope this helps :)

answered Aug 26, 2015 at 17:50

Michael S Priz

