0

The input file contains the following lines:

a=b*c;
d=a+2;
c=0;
b=a;

For each line, I want to extract the variables that are used in it. For example, for line 1 the output should be [a,b,c]. Currently I am doing it as follows:

var = ['a', 'b', 'c', 'd']     # list of variables
for line in file_ptr:
    if '=' in line:
        temp = line.split('=')
        ans = list(temp[0])
        if '+' in temp[1]:
            pass  # do something
        elif '*' in temp[1]:
            pass  # do something
        else:
            pass  # single variable as in line 4, or constant as in line 3

Is it possible to do this using regex?

EDIT:

Expected output for above file :

[a,b,c]
[d,a]
[c]
[a,b]
3
  • How is the question too broad? Commented May 4, 2016 at 20:51
  • What output would you expect from the input you've specified? Commented May 4, 2016 at 21:16
  • @Robᵩ I have added it to the question. Commented May 4, 2016 at 21:19

5 Answers

1

I would use re.findall() with whatever pattern matches variable names in the example's programming language. Assuming a typical language, this might work for you:

import re

lines = '''a=b*c;
d=a+2;
c=0;
b=a;'''

for line in lines.splitlines():
    print(re.findall(r'[_a-z][_a-z0-9]*', line, re.I))

1 Comment

This answer contains the explanation @AkaSh is looking for. Variable names can contain uppercase as well as lowercase letters, so the pattern needs the ignore-case flag (or, alternatively, an explicit A-Z in each character class). I'd throw in a few \b for consistency, though.
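The \b suggestion can be sketched as follows. This is only an illustration, assuming C-like identifier rules; the input `d=a+2x;`, with its malformed token `2x`, is made up to show the difference and does not come from the question.

```python
import re

# Identifier pattern anchored with word boundaries on both sides
# (assumption: names follow C-like rules).
NAME = re.compile(r'\b[_a-zA-Z][_a-zA-Z0-9]*\b')
# The same pattern without the boundaries, for comparison.
NAME_NO_BOUNDARY = re.compile(r'[_a-zA-Z][_a-zA-Z0-9]*')

line = 'd=a+2x;'  # '2x' is a hypothetical malformed token

with_boundary = NAME.findall(line)
without_boundary = NAME_NO_BOUNDARY.findall(line)

print(with_boundary)     # ['d', 'a']
print(without_boundary)  # ['d', 'a', 'x']
```

Without the boundaries, the tail of `2x` is picked up as if it were a variable named `x`. On the four lines in the question both patterns should give the same result.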
1

I'd use some shorter pattern for matching variable names:

import re
strs = ['a=b*c;', 'd=a+2;', 'c=0;', 'b=a;']
print([re.findall(r'[_a-z]\w*', x, re.I) for x in strs])

See the Python demo

Pattern matches:

  • [_a-z] - a _ or an ASCII letter (upper or lowercase, because of the case-insensitive flag re.I)
  • \w* - 0 or more alphanumeric or underscore characters.

See the regex demo
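To apply the same pattern to a file rather than a hard-coded list, something like the sketch below might work. The helper name `names_per_line` is hypothetical, and `io.StringIO` only stands in for the opened input file (`file_ptr` in the question).

```python
import io
import re

# Same pattern as above, compiled once.
NAME = re.compile(r'[_a-z]\w*', re.I)

def names_per_line(file_ptr):
    """Return one list of identifier-like tokens per non-blank line."""
    return [NAME.findall(line) for line in file_ptr if line.strip()]

# io.StringIO is a stand-in for the real file object.
sample = io.StringIO('a=b*c;\nd=a+2;\nc=0;\nb=a;\n')
result = names_per_line(sample)
for names in result:
    print(names)
```

The names come back in the order they appear on each line, so the last line yields `['b', 'a']` rather than `[a,b]`; wrap the result in `sorted()` if the order in the question's expected output matters.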


0

If you want just the variables, then do this:

answer = []
for line in file_ptr :
    temp = []
    for char in line:
        if char.isalpha():
            temp.append(char)
    answer.append(temp)

A word of caution though: this would work only with variables that are exactly 1 character in length. More details about isalpha() can be found here or here.
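If longer names are needed without a regex, the loop could accumulate characters instead of appending them one by one. The following is a rough sketch, not a full tokenizer; the function name `extract_names` is made up, and it assumes a digit never directly precedes a letter (a token like `2x` would be reported as `x`).

```python
def extract_names(line):
    """Collect runs of letters/underscores, allowing digits after the first character."""
    names = []
    current = ""
    for ch in line:
        # A digit is accepted only when a name has already been started.
        if ch.isalpha() or ch == "_" or (current and ch.isdigit()):
            current += ch
        else:
            if current:
                names.append(current)
            current = ""
    if current:  # flush a name that ends the line
        names.append(current)
    return names

print(extract_names("total=count*rate2;"))  # ['total', 'count', 'rate2']
print(extract_names("d=a+2;"))              # ['d', 'a']
```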


0

I'm not entirely sure what you're after, but you can do something like this:

re.split(r'[^\w]', line)

to give a list of the runs of word characters (letters, digits and underscores) in the line, plus an empty string wherever a separator ends the line:

>>> re.split(r'[^\w]', 'a=b*c;')
['a', 'b', 'c', '']

3 Comments

a, b, c are variables. These are elements of the list var.
I'm sorry, I have no idea what you mean.
This fails on the examples with digits; digits are also 'word characters' and so would be included. It can trivially be fixed, though, especially if Python's re supports [[:alpha:]]. However, for 'any' variable name, you need a different expression for just the first character and all next ones, because a0 is a valid variable name.
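One possible reading of the "trivial fix" is to keep the split but discard empty strings and tokens that begin with a digit. This is a sketch under that assumption; `names_via_split` is a hypothetical helper name.

```python
import re

def names_via_split(line):
    # Split on runs of non-word characters, then drop empty strings
    # and tokens that start with a digit (numeric literals such as '2').
    tokens = re.split(r'\W+', line)
    return [tok for tok in tokens if tok and not tok[0].isdigit()]

print(names_via_split('d=a+2;'))   # ['d', 'a']
print(names_via_split('a0=b*2;'))  # ['a0', 'b']
```

Filtering on the first character keeps names such as `a0` while still rejecting the constant `2`.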
0

This is how I did it:

import re
l = re.split(r'[^A-Za-z]', 'a=b*2;')
l = list(filter(None, l))   # drop the empty strings left by the split

