Python parser that records source ranges

 
 
Jonathan Edwards
      09-29-2003
The parser library module only records source line numbers for tokens. I
need a parser that records ranges of line and character locations for
each AST node, so I can map back to the source. Does anyone know of such
a thing? Thanks

Jonathan

 
Jeff Epler
      09-29-2003
The tokenize module will give column information for each token, but
it produces a stream of tokens only, not an AST.
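
A minimal sketch of reading those positions with tokenize, assuming a current Python 3. The helper name token_positions is made up for illustration; it is not part of the module.

```python
import io
import tokenize

def token_positions(source):
    """Return (token_name, text, start, end) for each significant token.

    start and end are (line, column) pairs; lines are 1-based and
    columns are 0-based.
    """
    # Layout-only tokens are skipped so the list lines up with visible text.
    skip = {tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER,
            tokenize.INDENT, tokenize.DEDENT, tokenize.COMMENT}
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.string, tok.start, tok.end)
        for tok in tokenize.generate_tokens(readline)
        if tok.type not in skip
    ]

for name, text, start, end in token_positions("a = f(20, val)\n"):
    print(name, repr(text), start, end)
```

Each token carries its own start and end, but nothing here groups tokens into larger constructs; that part still has to come from a parser.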

Jeff

 
logistix at cathoderaymission.net
      09-29-2003
Jonathan Edwards <(E-Mail Removed)> wrote in message news:<qRKdb.456249$Oz4.260848@rwcrnsc54>...
> The parser library module only records source line numbers for tokens. I
> need a parser that records ranges of line and character locations for
> each AST node, so I can map back to the source. Does anyone know of such
> a thing? Thanks
>
> Jonathan


You know there's not going to be a one-to-one relationship, right?
Most AST nodes are symbols and won't correspond to any token.
Python ASTs also use a lot of intermediate nodes to enforce operator
precedence.

Anyway, I have some rather specialized code in PyXR that syncs tokens
to an AST. You probably won't be able to use it out of the box, but it
should give you a good start:

http://www.cathoderaymission.net/~logistix/PyXR/

The source file of particular interest to you would be astToHtml.py:

http://tinyurl.com/p3cn
 
Jonathan Edwards
      10-01-2003
So the basic idea is to match up the leaves of the AST with the list of
tokens from the tokenize module, which do carry location info. I had
thought of that, but was hoping there was a more informative parser out
there. Thanks.
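
For anyone reading this on a current Python: the standard ast module records both start and end positions on statement and expression nodes from Python 3.8 onward, which covers the request without any token matching. A minimal sketch, assuming Python 3.8 or later (node_ranges is an illustrative helper, not a library function):

```python
import ast

def node_ranges(source):
    """Return (node type, start, end, text) for every positioned AST node.

    start and end are (line, col) pairs; lines are 1-based and columns
    are 0-based UTF-8 byte offsets, as stored by the ast module.
    """
    ranges = []
    for node in ast.walk(ast.parse(source)):
        # Module and context nodes (Load, Store) carry no position.
        if getattr(node, "end_lineno", None) is None:
            continue
        ranges.append((
            type(node).__name__,
            (node.lineno, node.col_offset),
            (node.end_lineno, node.end_col_offset),
            ast.get_source_segment(source, node),
        ))
    return ranges

for row in node_ranges("a = f(20, val)\n"):
    print(row)
```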

Jonathan


 
logistix at cathoderaymission.net
      10-01-2003
Jonathan Edwards <(E-Mail Removed)> wrote in message news:<(E-Mail Removed)>...
> So the basic idea is to match up the leaves of the AST with the list of
> tokens from tokenizer, which do contain location info. I had thought of
> that, but was hoping there was a more informative parser out there.
> Thanks.
>
> Jonathan
>
>



It's really not that bad. The more I think about it, the code
reference I sent you is way overcomplicated. General pseudocode for
walking ASTs generated via parser.ast2tuple(parser.suite(code)) is:

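    def walk_node(node):
        # A token is a 2-tuple whose second item is text, not a subtree.
        if len(node) == 2 and not isinstance(node[1], tuple):
            walk_token(node)
        else:
            walk_symbol(node)

    def walk_symbol(node):
        symbol_type = node[0]       # grammar symbol id
        symbol_leaves = node[1:]    # child nodes (symbols or tokens)
        for leaf in symbol_leaves:
            walk_node(leaf)

    def walk_token(node):
        token_type = node[0]        # token id
        token_value = node[1]       # token text as it appears in the source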
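
The walk can be extended to attach source ranges by consuming tokenize output in step with the leaves. The following is only a sketch under stated assumptions: the parser module is gone from current Python (removed in 3.10), so TREE is a hand-written, heavily simplified stand-in in the same nested-tuple shape, with string labels instead of integer ids and without the NEWLINE and ENDMARKER leaves a real tree has. The names annotate and significant_tokens are made up for illustration.

```python
import io
import tokenize

# Hypothetical stand-in for parser.ast2tuple() output:
# (symbol, child, ...) for symbols, (token_type, text) for tokens.
TREE = ("expr_stmt",
        ("atom", ("NAME", "a")),
        ("EQUAL", "="),
        ("power",
         ("atom", ("NAME", "f")),
         ("trailer",
          ("LPAR", "("),
          ("arglist", ("NUMBER", "20"), ("COMMA", ","), ("NAME", "val")),
          ("RPAR", ")"))))

def is_token(node):
    return len(node) == 2 and not isinstance(node[1], tuple)

def annotate(node, tokens):
    """Return (type, start, end, children), consuming tokens left to right."""
    if is_token(node):
        tok = next(tokens)
        if tok.string != node[1]:
            raise ValueError(f"tree/token mismatch: {node[1]!r} vs {tok.string!r}")
        return (node[0], tok.start, tok.end, [])
    children = [annotate(child, tokens) for child in node[1:]]
    # A symbol spans from the start of its first leaf to the end of its last.
    return (node[0], children[0][1], children[-1][2], children)

def significant_tokens(source):
    """Yield tokens that have visible text, skipping layout-only ones."""
    skip = {tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER,
            tokenize.INDENT, tokenize.DEDENT, tokenize.COMMENT}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type not in skip:
            yield tok

annotated = annotate(TREE, significant_tokens("a = f(20, val)\n"))
print(annotated[:3])
```

Every symbol node ends up with the range of the leaves beneath it, so intermediate nodes that exist only for operator precedence simply share the range of their single child.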
 
Paul Paterson
      10-02-2003
"Jonathan Edwards" <(E-Mail Removed)> wrote in message
news:qRKdb.456249$Oz4.260848@rwcrnsc54...
> The parser library module only records source line numbers for tokens. I
> need a parser that records ranges of line and character locations for
> each AST node, so I can map back to the source. Does anyone know of such
> a thing? Thanks
>
> Jonathan
>


If I understand you correctly, then the SimpleParse parser may be just what
you are looking for:

http://simpleparse.sourceforge.net

It is very powerful but still easy to use. The AST it produces gives the
start and end points of the matching tokens. Below is an example for parsing
a statement (from a VB grammar) ... you will see each node comprises a tuple
of (token_name, start_char, end_char, [sub_node1, sub_node2, ...]).

The example below looks rather complex because of the grammar, but you can
see that most of the nested sub_node matches cover the same characters in
the source. You can easily map each node to the corresponding text in the
source by slicing with its start and end offsets.

Paul

>>> c("a = f(20, val)", verbose=1)

1 15
[('line_body',
  0,
  15,
  [('single_statement',
    0,
    14,
    [('assignment_statement',
      0,
      14,
      [('object', 0, 1, [('primary', 0, 1, [('identifier', 0, 1, [])])]),
       ('expression',
        4,
        14,
        [('par_expression',
          4,
          14,
          [('base_expression',
            4,
            14,
            [('simple_expr',
              4,
              14,
              [('call',
                4,
                14,
                [('object',
                  4,
                  14,
                  [('primary',
                    4,
                    5,
                    [('identifier', 4, 5, [])]),
                   ('parameter_list',
                    5,
                    14,
                    [('list',
                      5,
                      14,
                      [('bare_list',
                        6,
                        13,
                        [('bare_list_item',
                          6,
                          8,
                          [('expression',
                            6,
                            8,
                            [('par_expression',
                              6,
                              8,
                              [('base_expression',
                                6,
                                8,
                                [('simple_expr',
                                  6,
                                  8,
                                  [('atom',
                                    6,
                                    8,
                                    [('literal',
                                      6,
                                      8,
                                      [('integer',
                                        6,
                                        8,
                                        [('decimalinteger',
                                          6,
                                          8,
                                          None)])])])])])])])]),
                         ('bare_list_item',
                          10,
                          13,
                          [('expression',
                            10,
                            13,
                            [('par_expression',
                              10,
                              13,
                              [('base_expression',
                                10,
                                13,
                                [('simple_expr',
                                  10,
                                  13,
                                  [('call',
                                    10,
                                    13,
                                    [('object',
                                      10,
                                      13,
                                      [('primary',
                                        10,
                                        13,
                                        [('identifier',
                                          10,
                                          13,
                                          [])])])])])])])])])])])])])])])])])])])]),
   ('line_end', 14, 15, [('NEWLINE', 14, 15, None)])])]
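
Mapping such a result tree back to source text is a matter of slicing with each node's offsets. A minimal sketch, assuming only the (token_name, start_char, end_char, children) tuple shape shown above; TREE is a trimmed, hand-copied subset of that output with most intermediate nodes dropped, and source_spans is an illustrative helper rather than part of SimpleParse.

```python
SOURCE = "a = f(20, val)\n"

# Trimmed copy of the result tree; children may be a list or None.
TREE = ("assignment_statement", 0, 14, [
    ("object", 0, 1, [("primary", 0, 1, [("identifier", 0, 1, [])])]),
    ("call", 4, 14, [
        ("identifier", 4, 5, []),
        ("bare_list", 6, 13, [
            ("decimalinteger", 6, 8, None),
            ("identifier", 10, 13, []),
        ]),
    ]),
])

def source_spans(node, source, depth=0):
    """Yield (depth, tag, text) for a node and all of its descendants."""
    tag, start, end, children = node
    # Offsets are 0-based with an exclusive end, so a plain slice works.
    yield depth, tag, source[start:end]
    for child in children or ():
        yield from source_spans(child, source, depth + 1)

for depth, tag, text in source_spans(TREE, SOURCE):
    print("  " * depth + f"{tag}: {text!r}")
```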


 