Extracting attributes from compiled python code or parse trees

 
 
Matteo
      07-23-2007
Hello-
I am trying to get Python to extract attributes in full dotted form
from a compiled expression. For instance, if I have the following:

param = compile('a.x + a.y','','single')

then I would like to retrieve the list consisting of ['a.x','a.y'].
I have tried using inspect to look at 'co_names', but when I do that,
I get:

>>> inspect.getmembers(param)[23]

('co_names', ('a', 'x', 'y'))

with no way to determine that 'x' and 'y' are both attributes of 'a'.

The reason I am attempting this is to try and automatically determine
data dependencies in a user-supplied formula (in order to build a
dataflow network). I would prefer not to have to write my own parser
just yet.

Alternatively, I've looked at the parser module, but I am experiencing
some difficulties in that the symbol list does not seem to match the one
listed in the Python grammar reference (not surprising, since I am
using Python 2.5, and the docs seem a bit dated).

In particular:

>>> import parser
>>> import pprint
>>> import symbol
>>> tl=parser.expr("a.x").tolist()
>>> pprint.pprint(tl)


[258,
 [326,
  [303,
   [304,
    [305,
     [306,
      [307,
       [309,
        [310,
         [311,
          [312,
           [313,
            [314,
             [315,
              [316, [317, [1, 'a']], [321, [23, '.'], [1, 'x']]]]]]]]]]]]]]]],
 [4, ''],
 [0, '']]

>>> print symbol.sym_name[316]

power

Thus, for some reason, 'a.x' seems to be interpreted as a power
expression, and not an 'attributeref' as I would have anticipated (in
fact, the symbol module does not seem to contain an 'attributeref'
symbol)

(for the curious, here is the relevant part of the AST for "a**x":
[316,
 [317, [1, 'a']],
 [36, '**'],
 [315, [316, [317, [1, 'x']]]]]
)

Anyway, I could write an AST analyzer that searches for the correct
pattern, but it would be relying on undocumented behavior, and I'm
hoping there is a better way.

(By the way, I realize that malicious users could almost certainly
subvert my proposed dependency mechanism, but for this project, I'm
guarding against Murphy, not Machiavelli.)

Thanks,
-matt

 
Gabriel Genellina
      07-23-2007
On Mon, 23 Jul 2007 18:13:05 -0300, Matteo wrote:

> I am trying to get Python to extract attributes in full dotted form
> from compiled expression. For instance, if I have the following:
>
> param = compile('a.x + a.y','','single')
>
> then I would like to retrieve the list consisting of ['a.x','a.y'].
>
> The reason I am attempting this is to try and automatically determine
> data dependencies in a user-supplied formula (in order to build a
> dataflow network). I would prefer not to have to write my own parser
> just yet.


If it is an expression, I think you should use "eval" instead of "single"
as the third argument to compile.
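To illustrate the difference, here is a minimal sketch. It assumes a modern Python 3 interpreter rather than the 2.5 used in this thread, and the `SimpleNamespace` object is only a hypothetical stand-in for whatever `a` really is. In "eval" mode the code object yields the expression's value, but `co_names` remains a flat collection of identifiers, so this alone does not recover the dotted form.

```python
from types import SimpleNamespace

# "eval" mode compiles a single expression whose value eval() returns.
code = compile("a.x + a.y", "<formula>", "eval")

# co_names still holds 'a', 'x' and 'y' as separate, unrelated entries.
flat_names = code.co_names

# Hypothetical stand-in for the user's data object.
a = SimpleNamespace(x=1, y=2)
result = eval(code, {"a": a})
```

So the mode change fixes how the formula is evaluated; the dependency question still needs one of the approaches discussed below.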

> Alternatively, I've looked at the parser module, but I am experiencing
> some difficulties in that the symbol list does not seem to match that
> listed in the python grammar reference (not surprising, since I am
> using python2.5, and the docs seem a bit dated)


Yes, the grammar.txt in the docs is a bit outdated (or perhaps it's a
simplified one); see the Grammar/Grammar file in the Python source
distribution.

> In particular:
>
> >>> import parser
> >>> import pprint
> >>> import symbol
> >>> tl=parser.expr("a.x").tolist()
> >>> pprint.pprint(tl)
>
> [258,
>  [326,
>   [303,
>    [304,
>     [305,
>      [306,
>       [307,
>        [309,
>         [310,
>          [311,
>           [312,
>            [313,
>             [314,
>              [315,
>               [316, [317, [1, 'a']], [321, [23, '.'], [1, 'x']]]]]]]]]]]]]]]],
>  [4, ''],
>  [0, '']]
>
> >>> print symbol.sym_name[316]
>
> power
>
> Thus, for some reason, 'a.x' seems to be interpreted as a power
> expression, and not an 'attributeref' as I would have anticipated (in
> fact, the symbol module does not seem to contain an 'attributeref'
> symbol)


Using this little helper function to translate symbols and tokens:

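import symbol
import token

# Map numeric grammar-symbol and token codes to their names.
names = symbol.sym_name.copy()
names.update(token.tok_name)

def human_readable(lst):
    # Replace the numeric code at the head of each node, in place.
    lst[0] = names[lst[0]]
    for item in lst[1:]:
        if isinstance(item, list):
            human_readable(item)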

the same tree becomes:

['eval_input',
 ['testlist',
  ['test',
   ['or_test',
    ['and_test',
     ['not_test',
      ['comparison',
       ['expr',
        ['xor_expr',
         ['and_expr',
          ['shift_expr',
           ['arith_expr',
            ['term',
             ['factor',
              ['power',
               ['atom', ['NAME', 'a']],
               ['trailer', ['DOT', '.'], ['NAME', 'x']]]]]]]]]]]]]]]],
 ['NEWLINE', ''],
 ['ENDMARKER', '']]

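which is correct if you look at the symbols in the (right) Grammar file:
there, attribute access is a 'trailer' attached to an 'atom' inside a
'power' node, so there is no separate 'attributeref' symbol.

But if you are only interested in things like a.x, maybe it's a lot
simpler to use the tokenize module, looking for the NAME and OP tokens as
they appear in the source expression.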
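A rough sketch of that token-based idea follows. It assumes Python 3's `tokenize.generate_tokens`, and the function name `dotted_names` is purely illustrative, not something from the standard library. It joins a NAME with any following `.`/NAME pairs, and skips an attribute whose base is not a plain name (as in `(c + d).e`).

```python
import io
import keyword
import tokenize

def dotted_names(source):
    """Collect names such as 'a.x' from the token stream of `source`."""
    found = []
    chain = []         # parts of the dotted name being assembled
    after_dot = False  # True when the previous token was '.'
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        is_name = (tok.type == tokenize.NAME
                   and not keyword.iskeyword(tok.string))
        if is_name and after_dot:
            if chain:                  # continues a chain: a . x
                chain.append(tok.string)
            # otherwise the base was not a name, e.g. (c + d).e; skip it
            after_dot = False
        elif is_name:
            if chain:
                found.append(".".join(chain))
            chain = [tok.string]       # start a new chain
        elif tok.type == tokenize.OP and tok.string == ".":
            after_dot = True
        else:
            if chain:                  # any other token ends the chain
                found.append(".".join(chain))
            chain = []
            after_dot = False
    if chain:
        found.append(".".join(chain))
    return found

deps = dotted_names("a.x + a.y")
```

This works purely on the token sequence, so it knows nothing about precedence or scoping; it is a shortcut for simple formulas, not a substitute for walking a real syntax tree.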


--
Gabriel Genellina

 
Peter Otten
      07-24-2007
Matteo wrote:

> I am trying to get Python to extract attributes in full dotted form
> from compiled expression. For instance, if I have the following:
>
> param = compile('a.x + a.y','','single')
>
> then I would like to retrieve the list consisting of ['a.x','a.y'].
> I have tried using inspect to look at 'co_names', but when I do that,


You can have a look at the compiler package. A very limited example:

import compiler
import compiler.ast
import sys

class Visitor:
    def __init__(self):
        self.names = []
    def visitName(self, node):
        # A bare name such as 'a' or 'sin'.
        self.names.append(node.name)
    def visitGetattr(self, node):
        # Walk down nested Getattr nodes, collecting attribute names
        # from the outermost inwards.
        dotted = []
        n = node
        while isinstance(n, compiler.ast.Getattr):
            dotted.append(n.attrname)
            n = n.expr
        try:
            dotted.append(n.name)
        except AttributeError:
            # The base is not a plain name, e.g. (c + d).e
            print >> sys.stderr, "ignoring", node
        else:
            self.names.append(".".join(reversed(dotted)))


if __name__ == "__main__":
    expr = " ".join(sys.argv[1:])
    visitor = Visitor()
    compiler.walk(compiler.parse(expr), visitor)
    print "\n".join(visitor.names)

Output:

$ python dotted_names.py "a + b * (c + sin(d.e) + x.y.z)"
a
b
c
sin
d.e
x.y.z

$ python dotted_names.py "a + b * ((c + d).e + x.y.z)"
ignoring Getattr(Add((Name('c'), Name('d'))), 'e')
a
b
x.y.z
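For readers on Python 3, where the compiler package no longer exists, the same visitor idea can be sketched with the standard `ast` module. This is an adaptation under that assumption, not Peter's code: the class and function names are illustrative, and unlike the version above it descends into a non-name base such as `(c + d)` instead of ignoring it.

```python
import ast

class DottedNames(ast.NodeVisitor):
    def __init__(self):
        self.names = []

    def visit_Name(self, node):
        # A bare name such as 'a' or 'sin'.
        self.names.append(node.id)

    def visit_Attribute(self, node):
        # Walk down nested Attribute nodes, outermost attribute first.
        parts = []
        n = node
        while isinstance(n, ast.Attribute):
            parts.append(n.attr)
            n = n.value
        if isinstance(n, ast.Name):
            parts.append(n.id)
            self.names.append(".".join(reversed(parts)))
        else:
            # Base is not a plain name, e.g. (c + d).e: drop the
            # attribute but still collect names inside the base.
            self.visit(n)

def dotted_names(expr):
    visitor = DottedNames()
    visitor.visit(ast.parse(expr, mode="eval"))
    return visitor.names

deps = dotted_names("a + b * (c + sin(d.e) + x.y.z)")
```

Because it works on the parsed tree rather than on raw grammar symbols, this approach does not depend on the numeric symbol codes discussed earlier in the thread.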

Peter
 