0

NOTE: I'm using Python 3.2.

I want to write a Python script that receives simple C++ expressions as input and outputs the very same expressions as tokens.

I vaguely remember my course in compilation, and I need something far less complex than a compiler.

Examples

int& name1=arr1[place1];
int *name2=    arr2[ place2];

should output

[    "int", "&", "name1", "=", "arr1", "[", "place1", "]"    ]
[    "int", "*", "name2", "=", "arr2", "[", "place2", "]"    ]

The spaces shouldn't matter, and I don't want them in the output.

This seems like a very simple task for someone who knows what they're doing, but I keep getting stray whitespace in the output or splitting at the wrong places.

I would greatly appreciate a quick solution for this. It really looks like a one-liner to me.

Note that I only need expressions like I showed here. Nothing fancy.

Thanks

4
  • It's generally appreciated to show the code you already got. Commented Aug 26, 2015 at 17:41
  • 1
    @EliKorvigo I'm in a military environment that is closed off from the outside network, so I can't get my code out. Anyway, I thought this would be an easy question that doesn't really need preliminary work. If it isn't, do tell. Commented Aug 26, 2015 at 17:43
  • If these suggestions aren't working, try describing your algorithm since you can't post code. Commented Aug 26, 2015 at 18:01
  • 1
    You can probably repeatedly refine regular expressions to get an approximation to what you want. Or you could build a simple, readable and maintainable lexer using PLY or some similar Python library. I'd strongly suggest option 2. Commented Aug 26, 2015 at 19:03

4 Answers

2

I'm not overly familiar with C++, but you could maybe use re.findall with a list of special characters:

import re

lines = """int& name1=arr1[place1];
int *name2=    arr2[ place2];"""

# Match either one special character or a run of word characters.
for line in lines.splitlines():
    print(re.findall(r"[\*\$\[\]&=]|\w+", line))

Output:

['int', '&', 'name1', '=', 'arr1', '[', 'place1', ']']
['int', '*', 'name2', '=', 'arr2', '[', 'place2', ']']
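
A slightly more general variant of the same idea, offered as a sketch rather than a definitive answer: instead of listing every special character, it matches either a run of word characters or any single character that is not a word character, whitespace, or a semicolon. The names `TOKEN_RE` and `tokenize` are only illustrative, and dropping the semicolon is an assumption based on the expected output in the question.

```python
import re

# A run of word characters (letters, digits, underscore), or any single
# character that is not a word character, whitespace, or ';'.
TOKEN_RE = re.compile(r"\w+|[^\w\s;]")

def tokenize(line):
    """Return the tokens of one simple C++ expression as a list of strings."""
    return TOKEN_RE.findall(line)

for line in ["int& name1=arr1[place1];", "int *name2=    arr2[ place2];"]:
    print(tokenize(line))
```

Every punctuation character becomes its own token here, so a multi-character operator such as `->` or `==` would come out as two tokens. That should be fine for the simple declarations in the question.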

2

Looks to me like you need to define a list of "special/operator" characters. Replace each of those characters with itself padded by a space on either side, then use str.split() to turn the string into a list of "words". If you need a string representation, finish up with "', '".join(wordlist) and add "[ '" to the front and "' ]" to the end.

I'm almost certainly missing a few things, like looking for semicolons to strip off, or to use in breaking apart concatenated expressions. You weren't specific about how many expressions you'd read in at once. If you read in many at a time, you could split on the semicolon character, then iterate over the resulting list of expressions.
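
The steps described above could be sketched roughly as follows. The character set in `SPECIAL` and the function name `split_tokens` are assumptions made for illustration, and the sketch assumes one expression per line with an optional trailing semicolon.

```python
SPECIAL = "&*=[]"  # assumed operator set; extend as needed

def split_tokens(line):
    """Tokenize one expression by padding special characters with spaces."""
    line = line.strip().rstrip(";")   # drop the newline and trailing semicolon
    for ch in SPECIAL:
        line = line.replace(ch, " " + ch + " ")
    return line.split()               # no argument: runs of whitespace collapse

print(split_tokens("int *name2=    arr2[ place2];"))
print(split_tokens("int a = 1;"))
```

Because the spaces are padded rather than removed, adjacent words such as `int a` stay separate, and `split()` with no argument discards the extra whitespace.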

2 Comments

You can assume I have one such expression per line. As simple as it gets.
There's probably a clever list comprehension to do this - it seems like there's one for everything. This is a simple suggestion instead, which is what I always try first.
1

The first step is to remove the spaces, that is, replace ' ' with ''. Then make a list of special characters or words, and replace each one with itself surrounded by a delimiter. Finally, split the line on that delimiter. Here is an example:

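import sys

for line in sys.stdin:
    line = line.strip().replace(' ', '')   # drop the newline and all spaces
    line = line.replace('&', ',&,')        # surround the special character with delimiters
    # repeat the replace for each remaining special character: * = [ ] ;
    a = [tok for tok in line.split(',') if tok]  # split and discard empty strings
    print(a)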

2 Comments

Although the examples don't show it, something like "int a = 1;" is also a valid expression, which should return ['int', 'a', '=', '1'], but removing the space will incorrectly merge the "int" and "a".
The ideas in this example were the most beneficial to me, and I managed to make something happen. Thanks!
0

Here is a generator that might do the trick:

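def parseCPP(line):
    # Drop surrounding whitespace (including the newline) and the trailing semicolon.
    line = line.strip().rstrip(";")
    word = ""
    for i in line:
        if i.isalnum():
            word += i
        else:
            if word:
                yield word
                word = ""
            if not i.isspace():
                yield i
    # Flush the last word if the line ends with an alphanumeric character.
    if word:
        yield word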

Note this just picks up consecutive strings of alphanumeric characters. Any non-space characters are assumed to be operators/tokens by themselves.
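
One caveat with scanning for alphanumeric characters: C++ identifiers may contain underscores, and `str.isalnum()` returns False for `_`, so a name like `my_var` would be split into three tokens. A possible variant that treats the underscore as part of a word is sketched below. The name `scan_tokens` is hypothetical, and the sketch assumes one expression per line with an optional trailing semicolon.

```python
def scan_tokens(line):
    """Yield tokens, treating letters, digits and '_' as identifier characters."""
    word = ""
    for ch in line.strip().rstrip(";"):
        if ch.isalnum() or ch == "_":
            word += ch
            continue
        if word:            # a non-identifier character ends the current word
            yield word
            word = ""
        if not ch.isspace():
            yield ch        # every other non-space character is its own token
    if word:                # flush a word that runs to the end of the line
        yield word

print(list(scan_tokens("int my_var = arr_1[idx];")))
```

Since this is a generator, wrap the call in `list()` when a list of tokens is needed.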

Hope this helps :)
