Find dependencies in a python source/script

Question

I have a bunch of simple Python scripts containing simple expressions[1] like:

C = A+B
D = C * 4

I need to execute them, but more importantly I need to know which objects they depend on; in the example above, A and B are external dependencies. E.g., given that the code above is stored in a variable called source, I want to be able to do:

deps = { "A" : 1 , "B": 2}
exec source in deps

So I need to know how to build the dict deps.

I've looked into Python's ast module, but I couldn't work out how to use it for this.

[1] Simple math aggregations and, to some extent, for loops; nothing more.

I guess you could be naughty and catch the first few NameErrors.
– Jakob Bowyer, Commented Jan 4, 2013 at 22:41
I was naughty enough not to mention that I'm doing exactly what @JakobBowyer suggested ;)
– Giupo, Commented Jan 4, 2013 at 22:54
An AST isn't enough. Without a full AST, full name resolution, and serious data flow analysis, you won't be able to make this work for anything but really trivial scripts. Because Python is a dynamic language, even serious data flow analysis may not provide decent answers. If you are willing to constrain your "Python" code to be very simple, you may be able to build an analyzer which is reliable. Your problem is that your users won't pay any attention to the constraints you'll impose. So unless the user is just you, you're unlikely to get a good result.
– Ira Baxter, Commented Jan 4, 2013 at 23:08
Thank you @IraBaxter, I'm well aware of what you describe; I'm already considering dataflow solutions: I'm building a graph of those scripts...
– Giupo, Commented Jan 4, 2013 at 23:19
What is the precise description of what you need to allow in your "simple" expressions? Without that constraint, you can't get any useful advice.
– Ira Baxter, Commented Jan 4, 2013 at 23:26
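
For reference, the NameError approach mentioned in the comments above could look roughly like the sketch below. It assumes Python 3, that the scripts are safe to run repeatedly (no side effects), and that a placeholder value of 1 is acceptable for every missing name. The names find_deps_by_running, placeholder and max_rounds are made up for illustration and do not come from the thread.

```python
import re


def find_deps_by_running(source, placeholder=1, max_rounds=100):
    """Run source repeatedly, adding a placeholder for each missing name."""
    deps = {}
    for _ in range(max_rounds):
        try:
            # Execute in a copy so names assigned by the script do not leak into deps.
            exec(source, dict(deps))
        except NameError as exc:
            match = re.search(r"name '(\w+)' is not defined", str(exc))
            if match is None:
                raise  # some other kind of NameError; do not guess
            deps[match.group(1)] = placeholder
        else:
            return deps
    raise RuntimeError("gave up after %d rounds" % max_rounds)


print(find_deps_by_running("C = A+B\nD = C * 4\n"))
```

This only discovers names on the code path that actually runs, and a placeholder of the wrong type can trigger unrelated errors, so treat it as a quick hack rather than an analysis.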

unutbu · Accepted Answer · 2013-01-05 00:35:31Z

4

You can tokenize Python source code using the tokenize module from the standard library. This will allow you to find all variable names used in the script.

Now suppose we define a "non-dependency" as any variable name that comes immediately before an = sign. Then, depending on how simple your script code really is (see the Caveats below), you may be able to find the dependencies, i.e. the variable names that are not non-dependencies, this way:

import collections
import io
import keyword
import token
import tokenize

kwset = set(keyword.kwlist)

class Token(collections.namedtuple('Token', 'num val start end line')):
    @property
    def name(self):
        # Human-readable token type, e.g. 'NAME' or 'OP'
        return token.tok_name[self.num]

source = '''
C = A+B
D = C * 4
'''

lastname = None   # most recent NAME token seen
names = set()     # every name used in the script
not_dep = set()   # names that appear immediately before an '='
# On Python 3, generate_tokens() expects a readline callable that returns str.
for tok in tokenize.generate_tokens(io.StringIO(source).readline):
    tok = Token(*tok)
    print(tok.name, tok.val)
    if tok.name == 'NAME':
        names.add(tok.val)
        lastname = tok.val
    if tok.name == 'OP' and tok.val == '=':
        not_dep.add(lastname)

print(names)
# {'A', 'B', 'C', 'D'} (set order may vary)
print(not_dep)
# {'C', 'D'} (set order may vary)

deps = dict.fromkeys(names - not_dep - kwset, 1)
print(deps)
# {'A': 1, 'B': 1} (key order may vary)

Caveats:

If your scripts contain statements other than simple assignments, then names may become populated with undesired variable names. For example,
```
import numpy
```
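would add both 'import' and 'numpy' to the set names. The keyword 'import' is then filtered out by kwset, but 'numpy' would be wrongly reported as a dependency.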
If your script contains an assignment that makes use of left-hand side tuple unpacking, such as
```
E, F = 1, 2
```
then the naive code above will only recognize that F is not a dependency, so E would be wrongly reported as one.
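
Since the question mentions the ast module, here is a rough sketch of how both caveats could be handled with it instead of tokenize. This is a different technique from the tokenizer code above, not a drop-in fix for it. It assumes Python 3 and straight-line scripts (assignments, imports, for loops); DependencyFinder and find_deps are hypothetical names, and conditionals, functions and comprehensions are not treated specially.

```python
import ast
import builtins


class DependencyFinder(ast.NodeVisitor):
    """Collect names that are read before the script assigns them."""

    def __init__(self):
        self.assigned = set()
        self.deps = set()

    def visit_Name(self, node):
        if isinstance(node.ctx, ast.Store):
            self.assigned.add(node.id)
        elif node.id not in self.assigned and not hasattr(builtins, node.id):
            self.deps.add(node.id)

    def visit_Assign(self, node):
        self.visit(node.value)  # the right-hand side is evaluated first
        for target in node.targets:
            self.visit(target)  # handles tuple targets such as "E, F"

    def visit_AugAssign(self, node):
        self.visit(node.value)
        target = node.target
        if isinstance(target, ast.Name) and target.id not in self.assigned:
            self.deps.add(target.id)  # "X += 1" reads X before writing it
        self.visit(target)

    def visit_For(self, node):
        self.visit(node.iter)
        self.visit(node.target)
        for stmt in node.body + node.orelse:
            self.visit(stmt)

    def visit_Import(self, node):
        for alias in node.names:
            # "import a.b" binds "a"; "import a.b as c" binds "c"
            self.assigned.add((alias.asname or alias.name).split('.')[0])

    visit_ImportFrom = visit_Import


def find_deps(source):
    finder = DependencyFinder()
    finder.visit(ast.parse(source))
    return finder.deps


print(find_deps("import numpy\nE, F = 1, 2\nG = E + F + H\n"))
```

Names found in the builtins module (sum, range, ...) are skipped on the assumption that scripts do not expect them to be supplied through deps.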

edited Jan 5, 2013 at 0:35

answered Jan 4, 2013 at 22:46

unutbu

5 Comments

Giupo Over a year ago

Thank you very much! That's a start ;) I guess if I get my coworkers to write those scripts with "variables" all uppercase and without single-line multiple assignments, I could use it ;)

unutbu Over a year ago

Or convince your coworkers to write runnable code that makes dependencies explicit. A function perhaps?
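
A minimal sketch of what that could look like, assuming Python 3: the script body becomes a function whose parameters are its dependencies, and inspect.signature lists them. The function name compute and its return shape are just illustrative choices.

```python
import inspect


def compute(A, B):
    # The parameters are exactly the script's external dependencies.
    C = A + B
    D = C * 4
    return {"C": C, "D": D}


required = list(inspect.signature(compute).parameters)
print(required)            # ['A', 'B']
print(compute(A=1, B=2))   # {'C': 3, 'D': 12}
```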

unutbu Over a year ago

Also, as far as exec source in deps is concerned, it does not matter if deps contains keys for variable names which are not dependencies. The source redefines them anyway. So, for example, if you are bug-testing by generating random values to use in deps, then it is okay if your definition of dependency is overly broad.
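
A small illustration of that point, written with the function form exec(source, deps), which is the Python 3 spelling and is also accepted by Python 2. The value 1 for every key is an arbitrary choice.

```python
source = "C = A+B\nD = C * 4\n"

# Deliberately too broad: C and D are not real dependencies.
deps = dict.fromkeys(["A", "B", "C", "D"], 1)

exec(source, deps)

# The script overwrote the bogus initial values of C and D.
print(deps["C"], deps["D"])  # 2 8
```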

Giupo Over a year ago

You're right, @unutbu: actually I would like to use what's supposed to be in the deps variable to check whether those variables exist in a graph of variables (the topic gets broader...). The scripts come from a grammar conversion tool I've written with ANTLR; they could (rarely) be edited by "coworkers" who know nothing about the topics discussed here ;)

unutbu Over a year ago

Thanks, Giupo, for the suggested edit regarding keyword.kwlist. I've edited my post to include it.
