Extract variables using python regex

Question

The input file contains the following lines:

a=b*c;
d=a+2;
c=0;
b=a;

Now, for each line, I want to extract the variables that have been used. For example, for line 1 the output should be [a,b,c]. Currently I am doing it as follows:

var = ['a', 'b', 'c', 'd']     # list of variables
for line in file_ptr:
    if '=' in line:
        temp = line.split('=')
        ans = list(temp[0])
        if '+' in temp[1]:
            pass  # do something
        elif '*' in temp[1]:
            pass  # do something
        else:
            pass  # single variable as in line 4 OR constant as in line 3

Is it possible to do this using regex?

EDIT:

Expected output for the above file:

[a,b,c]
[d,a]
[c]
[a,b]

What output would you expect from the input you've specified?
– Robᵩ, Commented May 4, 2016 at 21:16

Robᵩ · Accepted Answer · 2016-05-04 21:20:23Z

1

I would use re.findall() with whatever pattern matches variable names in the example's programming language. Assuming a typical language, this might work for you:

import re

lines = '''a=b*c;
d=a+2;
c=0;
b=a;'''

for line in lines.splitlines():
    print(re.findall(r'[_a-z][_a-z0-9]*', line, re.I))


1 Comment

Jongware Over a year ago

This answer contains the explanation @AkaSh is looking for. Python's variable names are case sensitive, so it needs the Ignore Case flag (or, alternatively, just a few A-Zs). I'd throw in a few \b for consistency, though.
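As a rough illustration of the word-boundary suggestion in the comment above, the sketch below compares the pattern with and without `\b`. The input `x=2a+b1;` is a made-up line, not part of the question; it is chosen so that `2a`, which is not a valid name in most languages, shows the difference.

```python
import re

# Hypothetical input: "2a" is not a valid identifier in most languages.
line = "x=2a+b1;"

# Without word boundaries a match may start in the middle of "2a".
loose = re.findall(r"[_a-z][_a-z0-9]*", line, re.I)

# With \b the match has to start and end at a word boundary.
strict = re.findall(r"\b[_a-z][_a-z0-9]*\b", line, re.I)

print(loose)   # ['x', 'a', 'b1']
print(strict)  # ['x', 'b1']
```

For the four lines in the question both patterns give the same result, so the boundaries only matter if such malformed tokens can occur in the input.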

Wiktor Stribiżew · Accepted Answer · 2016-05-04 21:24:32Z

1

I'd use a shorter pattern for matching variable names:

import re
strs = ['a=b*c;', 'd=a+2;', 'c=0;', 'b=a;']
print([re.findall(r'[_a-z]\w*', x, re.I) for x in strs])


Pattern matches:

[_a-z] - an underscore or an ASCII letter (upper or lower case, because of the case-insensitive flag re.I)
\w* - zero or more word characters (letters, digits or underscores).
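The pattern described above can be wrapped in a small helper, sketched here. The function name `extract_variables`, the removal of repeated names, and the extra line `total=total+a;` are assumptions added for illustration; they are not part of the original answer.

```python
import re

# Identifier pattern from the answer: a letter or underscore,
# followed by any number of word characters.
IDENTIFIER = re.compile(r"[_a-z]\w*", re.I)

def extract_variables(line):
    """Return the distinct identifiers in `line`, in order of first appearance."""
    seen = []
    for name in IDENTIFIER.findall(line):
        if name not in seen:  # assumption: each name is reported once per line
            seen.append(name)
    return seen

# The four lines from the question plus one hypothetical multi-character line.
source = "a=b*c;\nd=a+2;\nc=0;\nb=a;\ntotal=total+a;"
for line in source.splitlines():
    print(extract_variables(line))
```

Names are listed in the order they appear, so the fourth line yields `['b', 'a']` rather than the `[a,b]` shown in the question; sort the result if the order matters.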



lesingerouge · Accepted Answer · 2016-05-04 20:57:34Z

0

If you want just the variables, then do this:

answer = []
for line in file_ptr :
    temp = []
    for char in line:
        if char.isalpha():
            temp.append(char)
    answer.append(temp)

A word of caution, though: this works only when every variable name is exactly one character long. See the Python documentation for str.isalpha() for more details.
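If regex is to be avoided but names may be longer than one character, one possible variation is to group consecutive name characters instead of collecting them one by one. This is only a sketch: the helper names and the sample line `rate=base*x1+20;` are invented for illustration, and it assumes names consist of letters, digits and underscores and do not start with a digit.

```python
from itertools import groupby

def is_name_char(ch):
    # Letters, digits and underscore may appear inside a name.
    return ch.isalnum() or ch == "_"

def names_without_regex(line):
    names = []
    # groupby yields consecutive runs of characters sharing the same key.
    for is_name, group in groupby(line, key=is_name_char):
        token = "".join(group)
        # Keep runs of name characters that do not start with a digit.
        if is_name and not token[0].isdigit():
            names.append(token)
    return names

print(names_without_regex("rate=base*x1+20;"))  # ['rate', 'base', 'x1']
```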


Daniel Roseman · Accepted Answer · 2016-05-04 20:58:11Z

0

I'm not entirely sure what you're after, but you can do something like this:

re.split(r'[^\w]', line)

to give a list of the runs of word characters in the line (letters, digits and underscores), plus an empty string where the line ends in a delimiter:

>>> re.split(r'[^\w]', 'a=b*c;')
['a', 'b', 'c', '']


3 Comments

AkaSh Over a year ago

a, b, c are variables. These are elements of the list var.

Daniel Roseman Over a year ago

I'm sorry, I have no idea what you mean.

Jongware Over a year ago

This fails on the examples with digits; digits are also 'word characters' and so would be included. It can trivially be fixed, though, especially if Python's re supports [[:alpha:]]. However, for 'any' variable name, you need a different expression for just the first character and all next ones, because a0 is a valid variable name.
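The problem with digits described in the comment above, and one possible fix, can be sketched as follows. The helper name `split_names` is hypothetical, and `\W+` is used in place of `[^\w]` so that runs of delimiters collapse into a single split point.

```python
import re

# Splitting on non-word characters keeps the constant 2 and a trailing
# empty string.
print(re.split(r"\W+", "d=a+2;"))  # ['d', 'a', '2', '']

def split_names(line):
    # Keep only non-empty tokens that do not start with a digit,
    # so "a0" survives while "2" is dropped.
    return [t for t in re.split(r"\W+", line) if t and not t[0].isdigit()]

print(split_names("d=a+2;"))   # ['d', 'a']
print(split_names("a0=b*3;"))  # ['a0', 'b']
```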

AkaSh · Accepted Answer · 2016-05-04 21:16:19Z

0

This is how I did it:

import re

l = re.split(r'[^A-Za-z]', 'a=b*2;')
l = list(filter(None, l))  # drop empty strings; list() is needed on Python 3
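The question also keeps a list `var` of known variable names. A possible refinement, sketched here under the assumption that only names in that list should count as variables, is to match candidate identifiers first and then filter them. The function name `known_variables` and the sample line containing `sin` are invented for illustration.

```python
import re

# Names the asker listed in `var`, treated here as the set of known variables.
known = {"a", "b", "c", "d"}

def known_variables(line):
    # Find candidate identifiers, then keep only the known ones.
    candidates = re.findall(r"[A-Za-z_]\w*", line)
    return [name for name in candidates if name in known]

print(known_variables("a=b*sin(c);"))  # ['a', 'b', 'c']
```

Unlike splitting on `[^A-Za-z]`, this keeps a name such as `x1` in one piece before the membership test is applied.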

