Velocity Reviews

Ronald Fischer 06-25-2004 08:20 AM

What's wrong with this regexp????
 
I have a server-side JavaScript function returning a string.
I would like to test whether or not the string contains the following pattern:

- an equal sign,
- followed by one or more characters which are neither an ampersand nor an
equal sign,
- followed by another equal sign.

That is: A return value of that function of
X=ABCY=DEF should match, but
X=ABC&Y=DEF should not match

This is what I came up with:

if(((/=[^&=]+=/).test(get_query_string())) != null)
{
// matches
}
else
{
// does not match
}

The problem is that the function matches too much. For example, if
get_query_string() returns "LANG=EN", it matches too, although the
string contains only a single equal sign!

Any idea of what could be wrong here?

Ronald

Grant Wagner 06-25-2004 03:28 PM

Re: What's wrong with this regexp????
 
Ronald Fischer wrote:

> I have a server-side JavaScript function returning a string.
> I would like to test whether or not the string contains the following pattern:
>
> - an equal sign,
> - followed by one or more characters which are neither an ampersand nor an
> equal sign,
> - followed by another equal sign.
>
> That is: A return value of that function of
> X=ABCY=DEF should match, but
> X=ABC&Y=DEF should not match
>
> This is what I came up with:
>
> if(((/=[^&=]+=/).test(get_query_string())) != null)
> {
> // matches
> }
> else
> {
> // does not match
> }
>
> The problem is that the function matches too much. For example, if
> get_query_string() returns "LANG=EN", it matches too, although the
> string contains only a single equal sign!
>
> Any idea of what could be wrong here?
>
> Ronald


I don't know yet whether there's anything wrong with the regex; I haven't gotten
that far. It matches everything because RegExp.test() returns a boolean (two
possible values, true or false). It can _never_ return null, so the comparison with
null is always true and the "else" code block is _never_ executed, even when test()
returns false. You also don't need so many parentheses.

Change: if(((/=[^&=]+=/).test(get_query_string())) != null)

to: if (/=[^&=]+=/.test(get_query_string()))
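
To illustrate, here is a minimal sketch that compares the two conditions on the "LANG=EN" example from the original post. The variable names are made up for the illustration.

```javascript
// test() returns true or false; neither value is loosely equal to null,
// so comparing the result with `!= null` is always true.
var re = /=[^&=]+=/;

var brokenMatch = re.test("LANG=EN") != null; // original condition
var fixedMatch = re.test("LANG=EN");          // corrected condition

console.log(brokenMatch); // true, even though the regex did not match
console.log(fixedMatch);  // false
```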

....

Now I've had a chance to look at the regex, and it seems right given the criteria
you've specified.
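
As a quick check, this sketch runs the pattern against the three sample strings from the original post. It assumes nothing beyond a standard regexp engine.

```javascript
// Run the unchanged pattern against the strings mentioned in the thread.
var re = /=[^&=]+=/;
var samples = ["X=ABCY=DEF", "X=ABC&Y=DEF", "LANG=EN"];
var results = [];

for (var i = 0; i < samples.length; i++) {
  results.push(re.test(samples[i]));
}

console.log(results.join(", ")); // true, false, false
```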

--
| Grant Wagner <gwagner@agricoreunited.com>

* Client-side Javascript and Netscape 4 DOM Reference available at:
*
http://devedge.netscape.com/library/...ce/frames.html

* Internet Explorer DOM Reference available at:
*
http://msdn.microsoft.com/workshop/a...ence_entry.asp

* Netscape 6/7 DOM Reference available at:
* http://www.mozilla.org/docs/dom/domref/
* Tips for upgrading JavaScript for Netscape 7 / Mozilla
* http://www.mozilla.org/docs/web-deve...upgrade_2.html



Thomas 'PointedEars' Lahn 06-25-2004 05:46 PM

Re: What's wrong with this regexp????
 
Ronald Fischer wrote:
> I have a server-side JavaScript function returning a string.
> I would like to test whether or not the string contains the
> following pattern:
>
> - an equal sign,
> - followed by one or more characters which are neither an
> ampersand nor an equal sign,
> - followed by another equal sign.
>
> That is: A return value of that function of
> X=ABCY=DEF should match, but
> X=ABC&Y=DEF should not match
>
> This is what I came up with:
>
> if(((/=[^&=]+=/).test(get_query_string())) != null)


For the sake of legibility, omit some parentheses, then read the
documentation of the test() method. It returns a *boolean* value
(`true' or `false'), which is never equal to `null', so your
condition is always true. You are looking for

if (/=[^&=]+=/.test(get_query_string()))

However, there are better ways to parse the query part of a URI.
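
One possible approach, sketched under assumptions: split the query on "&" and then on the first "=" of each segment. The helper name parseQuery is hypothetical and not part of any library mentioned in this thread, and the sketch does no percent-decoding.

```javascript
// Hypothetical helper: split a query string into name/value pairs
// instead of probing it with a single regular expression.
function parseQuery(query) {
  var pairs = {};
  var parts = query.split("&");
  for (var i = 0; i < parts.length; i++) {
    if (parts[i] === "") continue; // skip empty segments
    var eq = parts[i].indexOf("=");
    var name = eq < 0 ? parts[i] : parts[i].substring(0, eq);
    var value = eq < 0 ? "" : parts[i].substring(eq + 1);
    pairs[name] = value;
  }
  return pairs;
}

var wellFormed = parseQuery("X=ABC&Y=DEF");
var malformed = parseQuery("X=ABCY=DEF");

console.log(wellFormed.X, wellFormed.Y); // ABC DEF
console.log(malformed.X);                // ABCY=DEF
```

A value that still contains "=" then signals a missing "&", which is the condition the original regexp was meant to detect.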

> [...]
> The problem is that the function matches too much.


No, it does not.


PointedEars

Dr John Stockton 06-25-2004 09:44 PM

Re: What's wrong with this regexp????
 
JRS: In article <40DC447C.9BB10B30@agricoreunited.com>, seen in
news:comp.lang.javascript, Grant Wagner <gwagner@agricoreunited.com>
posted at Fri, 25 Jun 2004 15:28:18 :
>Ronald Fischer wrote:
>
>> I have a server-side JavaScript function returning a string.
>> I would like to test whether or not the string contains the following
>> pattern:


Does "contains" mean "consists of only" or "has somewhere in itself" ?
If the former, change the RegExp from /=[^&=]+=/ to /^=[^&=]+=$/

But apparently not.
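
A small sketch of the difference between the two readings, using the sample string from the original post plus one made-up string ("=ABCY=") that consists of nothing but the pattern:

```javascript
var contains = /=[^&=]+=/;     // pattern may occur anywhere in the string
var consistsOf = /^=[^&=]+=$/; // the whole string must be the pattern

console.log(contains.test("X=ABCY=DEF"));   // true
console.log(consistsOf.test("X=ABCY=DEF")); // false
console.log(consistsOf.test("=ABCY="));     // true
```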

>> That is: A return value of that function of
>> X=ABCY=DEF should match, but
>> X=ABC&Y=DEF should not match
>>
>> This is what I came up with:
>>
>> if(((/=[^&=]+=/).test(get_query_string())) != null)

>. ...


Better to write just

OK = /=[^&=]+=/.test("test string")

for initial test, and

OK = /=[^&=]+=/.test(get_query_string())
if (OK) { ...

for actual use; it seems clearer.

>Now I've had a chance to look at the regex, and it seems right given the
>criteria you've specified.


OK by <URL:http://www.merlyn.demon.co.uk/js-quick.htm>
OK by <URL:http://www.merlyn.demon.co.uk/js-valid.htm#RT>

--
John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v4.00 IE 4
<URL:http://jibbering.com/faq/> JL / RC : FAQ for news:comp.lang.javascript
<URL:http://www.merlyn.demon.co.uk/js-index.htm> jscr maths, dates, sources.
<URL:http://www.merlyn.demon.co.uk/> TP/BP/Delphi/jscr/&c, FAQ items, links.

Ronald Fischer 07-07-2004 08:27 AM

Re: What's wrong with this regexp????
 
Grant Wagner <gwagner@agricoreunited.com> wrote in message news:<40DC447C.9BB10B30@agricoreunited.com>...
> Ronald Fischer wrote:
> > I would like to test whether or not the string contains the following pattern:
> >
> > - an equal sign,
> > - followed by one or more characters which are neither an ampersand nor an
> > equal sign,
> > - followed by another equal sign.
> >
> > That is: A return value of that function of
> > X=ABCY=DEF should match, but
> > X=ABC&Y=DEF should not match
> >
> > This is what I came up with:
> >
> > if(((/=[^&=]+=/).test(get_query_string())) != null)
> > {
> > // matches
> > }
> > else
> > {
> > // does not match
> > }

> The reason it's matching everything is because RegExp.test() returns a boolean (two
> possible values, true or false). It can _never_ return null, so the "else" code
> block is _never_ executed, even when test() returns false.


OK, got that.

> Now I've had a chance to look at the regex, and it seems right given the criteria
> you've specified.


Interestingly, it seems to be NEARLY right. The problem is that we need
to catch strings where some of the characters are not in the 7-Bit ASCII
character set. One example which occurs in our case is the character
with code 0xA4 (represented on our system as the so-called "international
currency symbol"). It turns out that this character does NOT match the
pattern [^&=]. Obviously, the JavaScript regexp pattern engine bails out
for those characters (maybe because of the settings of the current locale).

I wonder whether there is a portable way to catch such cases too with
a regexp.... I think that, as a temporary solution, I will have to
loop through the string first and replace every occurrence of the
offending character 0xA4 with something more harmless (fortunately, this
"loss of information" does not have any impact in my case, but it can't
be regarded as a general solution).
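
A rough sketch of that workaround, assuming the character really arrives in the string with code 0xA4. The helper name replaceCurrencySign and the "?" replacement are made up for the illustration. The loop avoids the regexp engine entirely.

```javascript
// Hypothetical helper: replace every character with code 0xA4
// without relying on the regexp engine.
function replaceCurrencySign(s, replacement) {
  var out = "";
  for (var i = 0; i < s.length; i++) {
    out += (s.charCodeAt(i) === 0xA4) ? replacement : s.charAt(i);
  }
  return out;
}

var cleaned = replaceCurrencySign("X=A\xA4CY=DEF", "?");

console.log(cleaned);                  // X=A?CY=DEF
console.log(/=[^&=]+=/.test(cleaned)); // true
```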

Ronald

Thomas 'PointedEars' Lahn 07-07-2004 05:09 PM

Re: What's wrong with this regexp????
 
Ronald Fischer wrote:
> [...] The problem is that we need
> to catch strings where some of the characters are not in the 7-Bit ASCII
> character set. One example which occurs in our case is the character
> with code 0xA4 (represented on our system as the so-called "international
> currency symbol"). It turns out that this character does NOT match the
> pattern [^&=].


It matches here. alert(/[^&=]/.test("\xA4")) yields `true' in
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8a2) Gecko/20040630
Firefox/0.8.0+.

> Obviously, the JavaScript regexp pattern engine bails out
> for those characters (maybe because of the settings of the
> current locale).


Possibly.

J(ava)Script strings are Unicode strings (more exactly: UTF-16 strings,
as [W3C] DOMStrings are), but only from JavaScript version 1.3 on and
AIUI from JScript version 5.5 on. The Unicode character \u00A4 is the
same as \xA4 in ISO-8859-1 (Latin-1) because Unicode shares code points
\xA0 (\u00A0) to \xFF (\u00FF) with that encoding. However, the two
characters should differ if your locale is not UTF-xx and not
ISO-8859-1. For example, \xA4 should equal \u20AC (the Euro sign) in
ISO-8859-15 (Latin-9).

Interestingly, I have LC_ALL=de_DE@euro here, yet \xA4 and \u20AC differ
in my UA which is said to interpret JavaScript 1.5. In that language,
AFAIS in contrast to ECMAScript 3, it is specified that \xA4 means code
point 0xA4 in ISO-8859-1 which is not equal to \u20AC (so my Mozilla is
correct here, however the implementation is IMHO not standards
compliant in this regard as it is not locale-aware). According to the
JScript Reference, \xhh refers to "ASCII characters" there which would
mean only \x00 to \x7F to be valid escape sequences. That would remove
the locale dependency but I am afraid that they meant "Extended ASCII
characters" rather than "US-ASCII characters", which would re-introduce it.
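
The following sketch shows what an ECMAScript-conforming engine does with these escapes. It says nothing about the server-side engine in the original post, which may behave differently.

```javascript
// In a conforming engine, \xA4 denotes the code unit 0xA4,
// independent of the locale.
var currency = "\xA4";

console.log(currency === "\u00A4");  // true
console.log(currency.charCodeAt(0)); // 164
console.log(currency === "\u20AC");  // false
console.log(/[^&=]/.test(currency)); // true
```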

> I wonder whether there is a portable way to catch such cases too
> with a regexp....


You can use alternation to include characters you require to be matched:

/=([^&=]|\xA4)+=/.test(...)

Use character classes if there is more than one character, e.g.:

/=([^&=]|[\xA0-\xFF])+=/.test(...)
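
A short sketch comparing the three patterns on a made-up sample string that contains \xA4. In a conforming engine the alternation is redundant, because [^&=] already matches \xA4; it only makes a difference on an engine that mishandles such characters.

```javascript
var plain = /=[^&=]+=/;
var single = /=([^&=]|\xA4)+=/;
var range = /=([^&=]|[\xA0-\xFF])+=/;

var sample = "X=A\xA4CY=DEF"; // hypothetical input with the currency sign

console.log(plain.test(sample));  // true in a conforming engine
console.log(single.test(sample)); // true
console.log(range.test(sample));  // true

// The added alternatives do not admit "&" or "=" between the equal signs.
console.log(range.test("X=ABC&Y=DEF")); // false
```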


PointedEars

