Velocity Reviews


MartinRinehart@gmail.com 12-11-2007 06:09 PM

Block comments
 
Tomorrow is block comment day. I want them to nest. I think the reason
that they don't routinely nest is that it's a lot of trouble to code.
Two questions:

1) Given a start and end location (line position and char index) in an
array of lines of text, how do you Pythonly extract the whole block
comment? (Goal: not to have Bruno accusing me - correctly - of writing
C in Python.)

2) My tokenizer has a bunch of module-level constants including ones
that define block comment starts/ends. Suppose I comment that code
out. This is the situation:

/* start of block comment
....
BLOCK_COMMENT_END_CHARS = '*/'
....
end of block comment */

Is this the reason for """?

(If this is a good test of tokenizer smarts, cpp and javac flunked.)

Bruno Desthuilliers 12-11-2007 06:55 PM

Re: Block comments
 
MartinRinehart@gmail.com wrote:
> Tomorrow is block comment day. I want them to nest. I think the reason
> that they don't routinely nest is that it's a lot of trouble to code.


Indeed.

> Two questions:
>
> 1) Given a start and end location (line position and char index) in an
> array of lines of text, how do you Pythonly extract the whole block
> comment? (Goal: not to have Bruno accusing me - correctly - of writing
> C in Python.)


Is the array of lines the appropriate data structure here ?

> 2) My tokenizer has a bunch of module-level constants including ones
> that define block comment starts/ends. Suppose I comment that code
> out. This is the situation:
>
> /* start of block comment
> ...
> BLOCK_COMMENT_END_CHARS = '*/'
> ...
> end of block comment */
>
> Is this the reason for """?


Triple-quoted strings are not comments; they are a way to build
multiline string literals. They are commonly used for docstrings -
for obvious reasons - but it's the position of the string literal
that makes it a docstring, not the fact that it's triple-quoted.
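
A small illustration of that point, as a sketch (the function names are made up for the example): a string literal becomes the docstring only when it is the first statement in the body, whatever quoting style it uses.

```python
def documented():
    """First statement in the body, so this becomes the docstring."""
    return 1


def not_documented():
    x = 1
    """Comes after a statement, so this is just an unused expression."""
    return x


def single_quoted():
    'Also a docstring, although it is not triple-quoted.'


print(documented.__doc__)
print(not_documented.__doc__)
print(single_quoted.__doc__)
```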

wrt/ your above example, making it a legal construct implies that you
should not treat the block start/end markers as comment markers when
they are enclosed in string-literal markers.

Now this doesn't solve the problem of nested block comments. Here, I
guess the solution would be to only allow fully nested block comments -
that is, the nested block *must* be opened *and* closed within the
parent block. In which case it should not be harder to parse than any
other nested construct.
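
A minimal sketch of that approach, assuming the source is held as one string, that comments use /* and */, and that string literals are single-line and quoted with ' or ". The function name find_comment_end and its parameters are hypothetical, not taken from any existing tokenizer.

```python
def find_comment_end(text, start, open_mark="/*", close_mark="*/", quotes="'\""):
    """Return the index just past the marker that closes the comment at `start`.

    Markers inside quoted literals are ignored, and a nested comment must be
    closed inside its parent. Raises ValueError if the comment never closes.
    """
    if not text.startswith(open_mark, start):
        raise ValueError("no block comment starts at index %d" % start)
    depth = 0
    quote = None  # quote character of the literal we are inside, if any
    i = start
    while i < len(text):
        ch = text[i]
        if quote:
            if ch == "\\":          # skip the escaped character
                i += 2
                continue
            if ch == quote or ch == "\n":   # literals end at the line end
                quote = None
            i += 1
        elif ch in quotes:
            quote = ch
            i += 1
        elif text.startswith(open_mark, i):
            depth += 1
            i += len(open_mark)
        elif text.startswith(close_mark, i):
            depth -= 1
            i += len(close_mark)
            if depth == 0:
                return i
        else:
            i += 1
    raise ValueError("unterminated block comment")


src = "a = 1 /* outer\nEND = '*/'\n/* inner */\nstill outer */ b = 2"
begin = src.index("/*")
end = find_comment_end(src, begin)
print(src[begin:end])
```

One known weakness of this sketch: an apostrophe in ordinary comment prose opens a "literal" that hides any marker on the rest of that line, which is why literals are cut off at the end of the line here.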

While we're at it, you may not know it, but there are already a couple
of Python packages for building tokenizers/parsers - could it be the
case that you're guilty of ReinventingTheSquaredWheel(tm) ?-)

My 2 cents...


MartinRinehart@gmail.com 12-11-2007 10:31 PM

Re: Block comments
 


Bruno Desthuilliers wrote:
> Is the array of lines the appropriate data structure here ?


I've done tokenizers both as an array of lines and as a long string.
The former has seemed easier when the language treats EOL as a
statement separator.
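
With the array-of-lines representation, the extraction asked about in question 1 could be sketched with slices rather than index loops. This assumes start and end are (line index, char index) pairs, with the end char index pointing just past the last character of the comment; extract_block is a hypothetical name.

```python
def extract_block(lines, start, end):
    """Return the lines of text between start and end (end column exclusive)."""
    (start_line, start_col), (end_line, end_col) = start, end
    if start_line == end_line:
        # The whole comment sits on one line.
        return [lines[start_line][start_col:end_col]]
    return (
        [lines[start_line][start_col:]]      # tail of the first line
        + lines[start_line + 1:end_line]     # whole lines in between
        + [lines[end_line][:end_col]]        # head of the last line
    )


lines = ["x = 1 /* start", "middle", "end */ y = 2"]
block = extract_block(lines, (0, 6), (2, 6))
print("\n".join(block))
```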

re not letting literal strings in code terminate blocks, I think it's
the tokenizer-writer's job to be nice to the tokenizer users, the
first one of which will be me, and I'll definitely have string
literals that enclose what would otherwise be a block end marker.

> While we're at it, you may not know but there are already a couple
> Python packages for building tokenizers/parsers


The tokenizer in the Python library is pretty close to what I want,
but it returns tuples, where I want an array of Token objects. It also
reads the source a line at a time, which seems a bit out of date.
Maybe two or three decades out of date.
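
For comparison, a sketch of wrapping the library tokenizer's tuples in objects. It assumes a current Python 3, where tokenize.generate_tokens yields named tuples; the Token class and tokens_from_source are hypothetical names invented for the example.

```python
import io
import tokenize
from dataclasses import dataclass


@dataclass
class Token:
    """Hypothetical token object holding the fields used in this sketch."""
    kind: str     # token type name, e.g. "NAME" or "OP"
    text: str     # the source text of the token
    line: int     # 1-based line number, as reported by tokenize
    column: int   # 0-based column of the first character


def tokens_from_source(source):
    """Tokenize a source string and return a list of Token objects."""
    # generate_tokens wants a readline callable, so it still reads by line.
    readline = io.StringIO(source).readline
    return [
        Token(tokenize.tok_name[tok.type], tok.string, tok.start[0], tok.start[1])
        for tok in tokenize.generate_tokens(readline)
    ]


for token in tokens_from_source("x = 1\n"):
    print(token)
```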

Actually, it takes about a day to write a reasonable tokenizer. (That
is, if you are writing using a language that you know.) Since I know
the problem thoroughly, it seemed like a good starting point for
learning Python.

There's a tokenizer I wrote in Java at http://www.MartinRinehart.com/src/la...Tokenizer.html
Actually, that's an HTML page written by my "javasrc" (parallel to
Sun's javadoc) based on the Tokenizer's tokenizing of its own source.

Have I got those quotes right?

Bruno Desthuilliers 12-12-2007 12:43 AM

Re: Block comments
 
MartinRinehart@gmail.com wrote:
(snip about tokenizers - not exactly my domain, sorry)
>
> Have I got those quotes right?


Perfect !-)


All times are GMT.
