Velocity Reviews - Computer Hardware Reviews


Remove trailing comments exercise

 
 
Csaba Gabor
 
      11-07-2009
On Nov 6, 6:36 pm, Lasse Reichstein Nielsen <(E-Mail Removed)>
wrote:
> Thomas 'PointedEars' Lahn <(E-Mail Removed)> writes:
> > Lasse Reichstein Nielsen wrote:

> var re = /('(?:[^']|\\')*')/g;
> alert(re.exec(code)[0]);
>
> It alerts the string "'abc\\'", i.e., it does end at the first
> "'", even if the quote is escaped.


The above recognizes from one single quote to
the final single quote in the string. One may
just as well write var re = /('.*?')/g

> The reason it does so is that [^'] matches backslash as well, and
> with a higher priority than what comes after, so it matches the
> backslash as well.
>
> The immediate fix of swapping the alternatives:
> var re = /('(?:\\'|[^'])*')/g;


The above recognizes from a single quote to either the next
single quote not preceded by a backslash if such a single
quote exists; else to the last single quote. To observe:

var code = "abc'def\\'ghi'jkl\\\\'mno\\\\'pqr";
var re = /'(?:\\'|[^'])*'/g;
alert(code.replace(re, "XXX")); // alerts abcXXXjkl\\XXXpqr

> and giving \\' priority over [^'], will match "\\'" as a non-string-ender,
> but will also ignore "\\\\'". It's necessary to know whether there is an
> even number of backslashes before the quote in order to know whether it's
> escaped or not. The RegExp below is the simplest one I have found to do that.


Lasse, to me, the RegExp below looks identical to the first
one above. Since I can't see the one you meant, here is a
regular expression that recognizes single-quoted strings.
It matches from a single quote to the next single quote
not preceded by an odd number of backslashes.

var re = /'(?:\\.|[^\\'])*'/g
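The difference between the swapped-alternatives pattern and this one shows up on the thread's own test string 'foo \\' '. The literal there ends at the third quote, because \\ is an escaped backslash and not an escaped quote. Below is a minimal Node.js sketch that uses console.log instead of alert. The variable names are illustrative, and it covers single-quoted literals only.

```javascript
// The two candidate patterns from this thread (single-quoted literals only).
var swapped  = /'(?:\\'|[^'])*'/g;   // tries \' first, but misreads \\'
var escAware = /'(?:\\.|[^\\'])*'/g; // each backslash consumes exactly one following char

// Source text:  'foo \\' '   (the literal is 'foo \\', then an unrelated quote)
var code = "'foo \\\\' '";

console.log(code.match(swapped)[0]);  // prints 'foo \\' '  (runs past the real end)
console.log(code.match(escAware)[0]); // prints 'foo \\'

// With an odd number of backslashes the quote is escaped, and the literal continues.
var odd = "'it\\'s'";                 // source text: 'it\'s'
console.log(odd.match(escAware)[0]);  // prints 'it\'s'
```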

> > /* 'foo \\' */
> > var code = "'foo \\\\' '";

>
> > /* ["'foo \\'", "'foo \\'"] */
> > /('(?:[^']|\\')*')/.exec(code)


> Glad to be of service
> ECMAScript syntax is ... interesting. Context-dependent lexing combined
> with semicolon insertion gives ample room to make mistakes
>
> var b=2,g=1;
> var a = 84
> /b/g; // <- it's division


This is highly interesting: the interpretation of that
final line also depends on what comes before it. For example:

var b=2,g=1;
var a = 84;
/b/g; // <- it's a regular expression

or

whole(truth) /b+c/g; // division
vs.
while(truth) /b+c/g; // RegExp
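All three cases above run as written in a current engine. Below is a sketch that assumes Node.js and uses console.log instead of alert. `whole` is a hypothetical stand-in: the thread only needs it to be something callable.

```javascript
var b = 2, g = 1;

// No semicolon after 84: a line starting with "/" continues the
// expression, so this is  a = 84 / b / g
var a = 84
/b/g;
console.log(a);  // 42

// With the semicolon, the next line is an expression statement whose
// value is a RegExp literal; it is created and then discarded.
var a2 = 84;
/b/g;
console.log(a2); // 84

// Hypothetical helper so that "whole(truth)" is a call, as in the thread.
function whole(x) { return x; }
var truth = 10, c = 3;

// After a call expression, "/" is division: (whole(truth) / b) + (c / g)
var r1 = whole(truth) /b+c/g;
console.log(r1); // 8

// After while(...) a statement is expected, so /b+c/g is a RegExp literal.
// The condition is false here so the loop body never runs.
while (false) /b+c/g;
```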

I wonder about other examples of (non-embedded) code being
interpreted differently depending on what precedes it.


Also, while your example of [^] works in my FF 1.5, it does
not compile in my IE 6. I.e., adding
var re=/[^]/;
results in an error message from IE.
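For reference, `[^]` is a negated empty character class. It matches any single character including line terminators, which `.` does not. The sketch below assumes a current engine such as Node.js, and the sample text is made up. `[\s\S]` is a common spelling of the same idea that avoids the empty class IE 6 rejected here. The ES2018 `s` (dotAll) flag is a newer alternative.

```javascript
var text = "a\nb";                   // made-up sample containing a newline

console.log(/a.b/.test(text));       // false: "." does not match "\n"
console.log(/a[^]b/.test(text));     // true: [^] matches any character
console.log(/a[\s\S]b/.test(text));  // true: same idea without the empty class
console.log(/a.b/s.test(text));      // true: ES2018 dotAll flag
```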
 
 
Csaba Gabor
 
      11-07-2009
On Nov 7, 11:21 am, Csaba Gabor <(E-Mail Removed)> wrote:
> On Nov 6, 6:36 pm, Lasse Reichstein Nielsen <(E-Mail Removed)>
> wrote:
>
> > Thomas 'PointedEars' Lahn <(E-Mail Removed)> writes:
> > > Lasse Reichstein Nielsen wrote:

> > var re = /('(?:[^']|\\')*')/g;
> > alert(re.exec(code)[0]);

>
> > It alerts the string "'abc\\'", i.e., it does end at the first
> > "'", even if the quote is escaped.

>
> The above recognizes from one single quote to
> the final single quote in the string. One may
> just as well write var re = /('.*?')/g


final => next. Sorry about that.

If the ? in the RegExp I supplied is omitted, then
it captures up to the final single quote.
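The lazy and greedy forms differ on any line that contains two string literals. Here is a minimal Node.js sketch, using console.log instead of alert, with a made-up sample line.

```javascript
var code = "x = 'one' + 'two';";   // made-up line with two literals

var lazy   = code.match(/('.*?')/)[0]; // stops at the next quote
var greedy = code.match(/('.*')/)[0];  // runs to the final quote on the line

console.log(lazy);   // 'one'
console.log(greedy); // 'one' + 'two'
```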
 
 
VK
 
      11-07-2009
Thomas 'PointedEars' Lahn wrote:
> It is really merely an issue to recognize and ignore string literals first,
> then to recognize and ignore RegExp initializers outside of them. *My
> replace function already implements the former; adapting it to also take
> care of the latter is left as an exercise to the reader.


Your replace function so far converts a syntactically correct source
into a syntactically incorrect one:
/foobar//foobar
comes to
/foobar
which is "unterminated regular expression literal"
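VK's counterexample can be reproduced with a deliberately naive stripper that deletes everything from `//` to the end of the line. This is a hypothetical stand-in and not Thomas's replace function, which is not shown in this excerpt. `stripLineCommentsNaively` and `parses` are illustrative names, and `new Function` is used only to ask the engine whether a string parses as a function body. The original text parses as the RegExp `/foobar/` divided by `foobar`. The second slash is a division operator there, not the start of a comment.

```javascript
// Deliberately naive (hypothetical): treats every "//" as a line comment.
function stripLineCommentsNaively(src) {
  return src.replace(/\/\/.*$/gm, "");
}

// Hypothetical helper: true if the engine accepts src as a function body.
function parses(src) {
  try {
    new Function(src);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

var original = "/foobar//foobar";    // valid: /foobar/ / foobar
var stripped = stripLineCommentsNaively(original);

console.log(stripped);               // /foobar
console.log(parses(original));       // true
console.log(parses(stripped));       // false: unterminated RegExp literal
```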

P.S. It is a bit of fun to watch people making a robust parser
algorithm for an algorithmically unparseable matter. But keep going, I
have more...

 
 
Thomas 'PointedEars' Lahn
 
      11-07-2009
VK wrote:

> Thomas 'PointedEars' Lahn wrote:
>> It is really merely an issue to recognize and ignore string literals
>> first, then to recognize and ignore RegExp initializers outside of them.
>> My replace function already implements the former; adapting it to also
>> take care of the latter is left as an exercise to the reader.

>
> Your replace function so far converts a syntactically correct source
> into syntactically incorrect one:
> /foobar//foobar
> comes to
> /foobar
> which is "unterminated regular expression literal"


If you had paid attention, you would have known that I am aware of the
RegExp issue.

> P.S. It is a bit of fun to watch people making a robust parser
> algorithm for an algorithmically unparseable matter.


It is not algorithmically unparseable. Otherwise there would be no script
engine that accepts RegExp initializers, would there? The context in which
`/' is not recognized as the start of a RegExp initializer is grammatically
well-defined, and if you had cared to read the Specification you would have
known.
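Many tokenizers take a shortcut instead of the grammar-driven rule: they look only at the previous token. The sketch below implements that shortcut as a hypothetical helper. `slashStartsRegExp` is not from this thread or from any library, and it is not the Specification's rule. Run against the examples in this thread, it shows where the shortcut works and where it fails (`while(truth) /b+c/g`). Those failures are why a correct decision needs the grammatical context rather than a single token.

```javascript
// Hypothetical heuristic, NOT the Specification's rule. prevToken is the
// previous significant (non-whitespace, non-comment) token, or null.
var KEYWORDS_BEFORE_EXPRESSION = [
  "return", "typeof", "instanceof", "in", "new", "delete",
  "void", "throw", "case", "do", "else"
];

function slashStartsRegExp(prevToken) {
  if (prevToken === null) return true;             // start of input
  if (/^[A-Za-z_$][\w$]*$/.test(prevToken)) {
    // Identifiers can end an expression ("a / b"); keywords such as
    // "return" cannot ("return /x/").
    return KEYWORDS_BEFORE_EXPRESSION.indexOf(prevToken) !== -1;
  }
  if (/^\d/.test(prevToken)) return false;         // number: 84 /b/g
  // After ")", "]" or "}" the heuristic guesses division; any other
  // punctuator ("(", ",", "=", ";", ...) is taken to precede a RegExp.
  return [")", "]", "}"].indexOf(prevToken) === -1;
}

console.log(slashStartsRegExp("84"));     // false: division
console.log(slashStartsRegExp(";"));      // true: RegExp literal
console.log(slashStartsRegExp("return")); // true
// Known failure: in "while(truth) /b+c/g" the previous token is ")",
// yet a RegExp follows; deciding that needs parser state, not one token.
console.log(slashStartsRegExp(")"));      // false (wrong after while/if/for)
```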

> But keep going, I have more...


You would.


PointedEars
--
Use any version of Microsoft Frontpage to create your site.
(This won't prevent people from viewing your source, but no one
will want to steal it.)
-- from <http://www.vortex-webdesign.com/help/hidesource.htm> (404-comp.)
 
 
Lasse Reichstein Nielsen
 
      11-08-2009
Csaba Gabor <(E-Mail Removed)> writes:

[correct description of how the regexps work]

>> and giving \\' priority over [^'], will match "\\'" as a non-string-ender,
>> but will also ignore "\\\\'". It's necessary to know whether there is an
>> even number of backslashes before the quote in order to know whether it's
>> escaped or not. The RegExp below is the simplest one I have found to do that.

>
> Lasse, to me, the RegExp below looks identical to the first
> one above. So in the absence of me seeing it, here is a
> regular expression that recognizes single quoted strings.
> It will match from a single quote to the next single quote
> not preceded by an odd number of backslashes.
>
> var re = /'(?:\\.|[^\\'])*'/g


My mistake. The "RegExp below" that I was referring to was one that I
had written in a double-quoted message, but I managed to remove that
quote before posting.

It was indeed equivalent to the one you wrote here (I think it had the
alternatives in the opposite order, but that's not important since they
are mutually exclusive).
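Lasse's claim that the order of the alternatives does not matter can be checked directly. `\\.` only applies at a backslash and `[^\\']` never matches one, so at most one alternative can apply at any position. Here is a small Node.js sketch comparing both orders on strings taken from this thread. The variable names are illustrative.

```javascript
var escFirst   = /'(?:\\.|[^\\'])*'/g;  // escape alternative first
var plainFirst = /'(?:[^\\']|\\.)*'/g;  // plain-character alternative first

// Test strings from this thread (JavaScript source form).
var samples = [
  "'foo \\\\' '",
  "'it\\'s' + 'x'",
  "abc'def\\'ghi'jkl\\\\'mno\\\\'pqr"
];

// Compare the full list of matches produced by each order.
var same = samples.map(function (s) {
  return JSON.stringify(s.match(escFirst)) === JSON.stringify(s.match(plainFirst));
});
console.log(same); // [ true, true, true ]
```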

>> var b=2,g=1;
>> var a = 84
>> /b/g; // <- it's division

>
> This is highly interesting, where the interpretation of that
> final line also depends on what comes before it. For example:
>
> var b=2,g=1;
> var a = 84;
> /b/g; // <- it's a regular expression
>
> or
>
> whole(truth) /b+c/g; // division
> vs.
> while(truth) /b+c/g; // RegExp
>
> I wonder about other examples of (non embedded) code being
> interpreted differently depending on what precedes it.


There are a few:
An object literal, {foo: 42}, is also a valid statement block
containing a labeled expression statement. In an expression context
it can only be the object literal; in a statement context it can only
be the statement block. Since expressions can also be statements
(ExpressionStatement), there is a rule saying that an
ExpressionStatement cannot begin with "{" (or "function").
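The object-literal/block ambiguity can be observed with `eval`, which parses its argument as a program (statement context) and returns the completion value. Below is a sketch that assumes a current engine such as Node.js.

```javascript
// Statement context: "{" opens a block, "foo:" is a label and "42" an
// expression statement, so eval returns the completion value 42.
var asBlock = eval("{foo: 42}");
console.log(asBlock);                     // 42

// Parentheses force expression context: now it is an object literal.
var asObject = eval("({foo: 42})");
console.log(typeof asObject, asObject.foo); // object 42

// A second property only fits an object literal; read as a block it is
// "foo: 42, bar: 1", which is a syntax error.
var threw = false;
try {
  eval("{foo: 42, bar: 1}");
} catch (e) {
  threw = e instanceof SyntaxError;
}
console.log(threw);                       // true
```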


> Also, while your example of [^] works on my FF1.5, it does
> not compile on my IE 6. I.e., adding
> var re=/[^]/;
> results in an error message from IE.


Tsk, tsk.

/L
--
Lasse Reichstein Holst Nielsen
'Javascript frameworks is a disruptive technology'

 
 

Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Remove only TRAILING whitespace | Bob Smyph | Ruby | 4 | 10-14-2008 06:56 PM
Remove trailing spaces in ecilpse | pd | Java | 3 | 12-07-2007 12:32 PM
Remove trailing space in Open Office spreadsheets? | Evan Platt | Computer Support | 1 | 08-28-2006 09:05 PM
remove trailing whitespace from string | Donald Canton | C++ | 5 | 02-09-2004 04:39 PM
RegExp for remove all trailing CrLf's? | McKirahan | Javascript | 4 | 01-30-2004 05:23 AM


