
I have a bunch of simple Python scripts containing simple expressions[1] like:

C = A+B
D = C * 4

I need to execute them, but most importantly I need to know which objects they depend on; in the example above, A and B are outer dependencies. E.g., given that I have the code above in a variable called source, I want to be able to do:

deps = { "A" : 1 , "B": 2}
exec source in deps

So it's strictly necessary to know how to build the dict deps.

I've looked into Python's ast module, but I couldn't work out how to use it for this.


[1] Simple math aggregations and, to some extent, for loops; nothing more.

  • I guess you could be naughty and catch the first few NameErrors Commented Jan 4, 2013 at 22:41
  • I was naughty enough to not mention that I'm doing exactly like @JakobBowyer suggested ;) Commented Jan 4, 2013 at 22:54
  • An AST isn't enough. Without a full AST, full name resolution, and serious data flow analysis, you won't be able to make this work for anything but really trivial scripts. Because Python is a dynamic language, even serious data flow analysis may not provide decent answers. If you are willing to constrain your "Python" code to be very simple, you may be able to build an analyzer which is reliable. Your problem is that your users won't pay any attention to the constraints you'll impose. So unless the user is just you, you're unlikely to get a good result. Commented Jan 4, 2013 at 23:08
  • Thank you @IraBaxter, I'm well aware of what you assert; I'm already considering dataflow solutions: I'm building a graph of those scripts... Commented Jan 4, 2013 at 23:19
  • What is the precise description of what you need to allow in your "simple" expressions? Without that constraint, you can't get any useful advice. Commented Jan 4, 2013 at 23:26
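The NameError idea from the first comment can be sketched roughly as follows. This is only an illustration, assuming Python 3 and assuming it is acceptable to actually run the script several times; the function name find_deps_by_running and the parameters placeholder and max_tries are made up for this example. It relies on the standard "name 'X' is not defined" message text, and any other exception (for instance a TypeError caused by the placeholder value) is simply propagated.

```python
import re


def find_deps_by_running(source, placeholder=1, max_tries=100):
    """Run `source` repeatedly, adding a placeholder for each undefined name.

    Hypothetical helper: it really executes the script, so it is only
    suitable for trusted, side-effect-free code.
    """
    deps = {}
    for _ in range(max_tries):
        env = dict(deps)  # fresh copy, so the script's own assignments do not leak between runs
        try:
            exec(source, env)
        except NameError as err:
            # Assumes the usual message format: name 'A' is not defined
            match = re.search(r"name '(\w+)' is not defined", str(err))
            if match is None:
                raise
            deps[match.group(1)] = placeholder
        else:
            return deps
    raise RuntimeError("gave up after too many NameErrors")


found = find_deps_by_running("C = A+B\nD = C * 4\n")
print(sorted(found))
```

Note that a name which is only used on a branch that never executes will not be discovered this way, which is one reason a static analysis may be preferable.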

1 Answer

You can tokenize Python source code using the tokenize module from the standard library. This will allow you to find all variable names used in the script.

Now suppose we define a "non-dependency" as any variable name that comes immediately before an = sign. Then, depending on how simple your script code really is (see the Caveats below), you may be able to determine the dependencies, i.e. the variable names that are not non-dependencies, this way:

import tokenize
import io
import token
import collections
import keyword

kwset = set(keyword.kwlist)
class Token(collections.namedtuple('Token', 'num val start end line')):
    @property
    def name(self):
        return token.tok_name[self.num]

source = '''
C = A+B
D = C * 4
'''

lastname = None
names = set()
not_dep = set()
# Python 2: a str is bytes, so BytesIO works; on Python 3 use io.StringIO(source) instead
for tok in tokenize.generate_tokens(io.BytesIO(source).readline):
    tok = Token(*tok)
    print(tok.name, tok.val)
    if tok.name == 'NAME':
        names.add(tok.val)
        lastname = tok.val
    if tok.name == 'OP' and tok.val == '=':
        not_dep.add(lastname)

print(names)
# set(['A', 'C', 'B', 'D'])
print(not_dep)
# set(['C', 'D'])

deps = dict.fromkeys(names - not_dep - kwset, 1)
print(deps)
# {'A': 1, 'B': 1}
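Once the dict has been built, it can be passed to exec as the script's global namespace. The following is a small sketch of that step, assuming Python 3 (where exec is a function rather than the statement used in the question) and using the values from the question instead of the placeholder 1; the names namespace and results are just for this example.

```python
source = """
C = A+B
D = C * 4
"""

deps = {"A": 1, "B": 2}
namespace = dict(deps)     # copy, so deps itself is left untouched
exec(source, namespace)    # Python 3 spelling of `exec source in deps`

# Everything the script defined, minus the inputs and the injected builtins
results = {k: v for k, v in namespace.items()
           if k not in deps and k != "__builtins__"}
print(results)
# {'C': 3, 'D': 12}
```

Copying deps first keeps the inputs separate from the outputs, since exec writes the script's assignments (and a __builtins__ entry) into whatever dict it is given.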

Caveats:

  • If your scripts contain statements other than simple assignments, then names may become populated with undesired variable names. For example,

    import numpy
    

    would add both 'import' and 'numpy' to the set names.

  • If your script contains an assignment that makes use of left-hand side tuple unpacking, such as

    E, F = 1, 2
    

    then the naive code above will only recognize that F is not a dependency.
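Both caveats can be avoided by walking the syntax tree with the ast module instead of the token stream. What follows is a rough sketch rather than a complete solution, assuming Python 3 and scripts limited to assignments, imports, and for loops at module level; the class DependencyFinder and the function find_deps are names invented for this example. It treats a name as a dependency if it is read before the script assigns it, and it ignores builtins such as sum or range.

```python
import ast
import builtins


class DependencyFinder(ast.NodeVisitor):
    """Collect names that are read before the script itself assigns them."""

    def __init__(self):
        self.assigned = set()
        self.deps = set()

    def visit_Name(self, node):
        if isinstance(node.ctx, ast.Store):
            self.assigned.add(node.id)
        elif node.id not in self.assigned and not hasattr(builtins, node.id):
            self.deps.add(node.id)

    def visit_Assign(self, node):
        self.visit(node.value)           # the right-hand side is evaluated first
        for target in node.targets:      # `E, F = 1, 2` is a Tuple of Store names
            self.visit(target)

    def visit_AugAssign(self, node):
        self.visit(node.value)
        if isinstance(node.target, ast.Name):
            # `x += 1` reads x before writing it
            if node.target.id not in self.assigned:
                self.deps.add(node.target.id)
            self.assigned.add(node.target.id)
        else:
            self.visit(node.target)

    def visit_For(self, node):
        self.visit(node.iter)            # the iterable is evaluated before the loop variable is bound
        self.visit(node.target)
        for stmt in node.body + node.orelse:
            self.visit(stmt)

    def visit_Import(self, node):
        for alias in node.names:
            # `import a.b` binds `a`; `import a.b as c` binds `c`
            self.assigned.add((alias.asname or alias.name).split(".")[0])

    visit_ImportFrom = visit_Import


def find_deps(source):
    finder = DependencyFinder()
    finder.visit(ast.parse(source))
    return finder.deps


print(sorted(find_deps("C = A+B\nD = C * 4\n")))
# ['A', 'B']
print(sorted(find_deps("import numpy\nE, F = 1, 2\nG = E + F + H\n")))
# ['H']
```

The analysis follows source order and knows nothing about function definitions, comprehensions, or conditional branches, so it stays reliable only as long as the scripts stay as simple as described in the question.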


5 Comments

Thank you very much! That's a start ;) I guess if I get my coworkers to write those scripts with all-uppercase "variables" and without single-line multiple assignments, I could use it ;)
Or convince your coworkers to write runnable code that makes dependencies explicit. A function perhaps?
Also, as far as exec source in deps is concerned, it does not matter if deps contains keys for variable names which are not dependencies. The source redefines them anyway. So, for example, if you are bug-testing by generating random values to use in deps, then it is okay if your definition of dependency is overly broad.
You're right, @unutbu: actually I would like to use what's supposed to be in the deps variable to check for existence of those variables in a graph of variables (the topic gets broader....) The scripts come from a grammar conversion tool I've written with ANTLR; they could (rarely) be edited by "coworkers" who know nothing of the topics here discussed ;)...
Thanks, Giupo for the suggested edit regarding keyword.kwlist. I've edited my post to include it.
