Strange result with Regexp

 
 
howa
Guest
Posts: n/a
 
      04-02-2008
E.g.

var s = "12345d";
document.write("s="+s+ ", " + s.replace(/[0-9]*/g,'x'));


It shows:

s=12345d, xxdx

while I expect

xd

Any suggestions?

Thanks.
 
 
 
 
 
Lasse Reichstein Nielsen
Guest
Posts: n/a
 
      04-02-2008
howa writes:

> var s = "12345d";
> document.write("s="+s+ ", " + s.replace(/[0-9]*/g,'x'));


> It shows:
>
> s=12345d, xxdx


I would have expected xdx, but your result is equally valid.
The regular expression /[0-9]*/ matches *zero* or more digits.
Change it to /[0-9]+/.
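
A minimal sketch of the difference, for anyone who wants to try it; the variable names are only illustrative and the input is the one from the original post:

```javascript
// Compare the original pattern with the suggested one on the same input.
const input = "12345d";

// '*' also matches the empty string, so empty matches get replaced too.
const withStar = input.replace(/[0-9]*/g, "x");

// '+' requires at least one digit, so only the run "12345" is replaced.
const withPlus = input.replace(/[0-9]+/g, "x");

console.log(withStar); // "xxdx"
console.log(withPlus); // "xd"
```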

/L
--
Lasse Reichstein Nielsen
DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
'Faith without judgement merely degrades the spirit divine.'
 
 
 
 
 
pr
Guest
Posts: n/a
 
      04-02-2008
howa wrote:
> E.g.
>
> var s = "12345d";
> document.write("s="+s+ ", " + s.replace(/[0-9]*/g,'x'));
>
>
> It shows:
>
> s=12345d, xxdx
>
> while I expect
>
> xd
>
> Any suggestions?


As Lasse says, '*' matches zero or more. In theory, globally replacing a
zero-length string should be an infinite task. In practice
(fortunately), the regular expression engine avoids consecutive
zero-length matches. Therefore you have one 5-digit match and two
0-digit matches, one each side of the 'd'.

These examples look even odder:

"d".replace(/[0-9]*/g, "x") // xdx
"dddd".replace(/[0-9]*/g, "x") // xdxdxdxdx

To preserve your sanity try to consider '*' as a last resort. And
only use that 'g' flag if you mean it.
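
One way to make the three matches visible is to pass a function as the replacement, which receives each match and its offset. This is only a sketch; `seen` and `result` are names made up for the illustration:

```javascript
// Record every match the global replace visits, then replace it with "x".
const seen = [];
const result = "12345d".replace(/[0-9]*/g, (match, offset) => {
  seen.push({ match, offset });
  return "x";
});

console.log(seen);
// [ { match: "12345", offset: 0 },
//   { match: "", offset: 5 },
//   { match: "", offset: 6 } ]
console.log(result); // "xxdx"
```

The empty matches sit immediately before the 'd' (offset 5) and at the end of the string (offset 6).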
 
 
Lee
Guest
Posts: n/a
 
      04-02-2008
howa said:
>
>E.g.
>
>var s = "12345d";
>document.write("s="+s+ ", " + s.replace(/[0-9]*/g,'x'));
>
>
>It shows:
>
>s=12345d, xxdx
>
>while I expect
>
>xd
>
>Any suggestions?


You don't really want to be specifying "zero or more",
or even "one or more". Simply replace *each individual*
digit with an "x", allowing the "g" flag to do the work:

replace(/[0-9]/g,"x")
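
A short sketch of what this produces. Note that it yields one "x" per digit, which is not the "xd" the original post asked for; which one is wanted depends on the goal. The variable names are illustrative only:

```javascript
const s = "12345d";

// One replacement per digit: five digits become five "x" characters.
const perDigit = s.replace(/[0-9]/g, "x");

// One replacement per run of digits: the whole run becomes a single "x".
const perRun = s.replace(/[0-9]+/g, "x");

console.log(perDigit); // "xxxxxd"
console.log(perRun);   // "xd"
```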


--

 
 
Alexey Kulentsov
Guest
Posts: n/a
 
      04-02-2008
howa wrote:
> E.g.
>
> var s = "12345d";
> document.write("s="+s+ ", " + s.replace(/[0-9]*/g,'x'));
>
>
> It shows:
>
> s=12345d, xxdx
>
> while I expect
>
> xd
>
> Any suggestions?


Remove 'g' modifier from regexp and you will get your xd
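
A quick sketch of this, with one caveat that is worth checking: without 'g' only the first match is replaced, and with '*' the first match can be empty when the string does not start with a digit. `replaceFirst` is a hypothetical helper name used only here:

```javascript
// Replace only the first match of "zero or more digits".
function replaceFirst(s) {
  return s.replace(/[0-9]*/, "x");
}

const digitsFirst = replaceFirst("12345d"); // first match is "12345"
const letterFirst = replaceFirst("d12345"); // first match is "" at offset 0

console.log(digitsFirst); // "xd"
console.log(letterFirst); // "xd12345"
```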
 
 
Thomas 'PointedEars' Lahn
Guest
Posts: n/a
 
      04-03-2008
pr wrote:
> howa wrote:
>> var s = "12345d";
>> document.write("s="+s+ ", " + s.replace(/[0-9]*/g,'x'));
>>
>> It shows:
>>
>> s=12345d, xxdx
>>
>> while I expect
>>
>> xd

>
> [...] In theory, globally replacing a zero-length string should be an
> infinite task. In practice (fortunately), the regular expression engine
> avoids consecutive zero-length matches. Therefore you have one 5-digit
> match and two 0-digit matches, one each side of the 'd'.


Not at all. In theory, there is an ε (epsilon) production; please read
about Regular Grammars:

http://en.wikipedia.org/wiki/Formal_...gular_grammars

In practice, Regular Expressions match *non-overlapping* occurrences of the
pattern in the string which means that even with global matching no position
is visited twice by the matcher; please read ECMA-262 Ed. 3 Final, section
15.5.4.11:

http://www.ecmascript.org/docs.php

Here is what happens, in a nutshell (I used `^' to indicate the next
possible match, and `ε' for the empty word/string to be matched):

0. Input string: "12345d"
Regular Expression: /[0-9]*/g --> lastIndex=0
Replacement string: "x"

1. Find matches for the Regular Expression.

position 0 1 2 3 4 5
ε1ε2ε3ε4ε5εdε
^ ^ ^ ^ ^
(/[0-9]*/, lastIndex=0) --> ("12345", index=0, lastIndex=5)

Greedy matching, so the longest match wins.
The global flag is set, continue.

2. Find more matches for the Regular Expression.

position 0 1 2 3 4 5
ε1ε2ε3ε4ε5εdε
^
(/[0-9]*/, lastIndex=5) --> (ε, index=5, lastIndex=5)

The longest and only possible match that remains is the empty string;
next possible match after position 4.
The global flag is set, continue.

3. Find more matches for the Regular Expression.

position 0 1 2 3 4 5 6
ε1ε2ε3ε4ε5εdε
^
(/[0-9]*/, lastIndex=6) --> (ε, index=6, lastIndex=6)

The previous match was empty, so lastIndex was incremented from 5 to 6
before this step. The longest and only possible match that remains is
the empty string; next possible match after position 5.
The global flag is set, continue.

4. Find more matches for the Regular Expression.

position 0 1 2 3 4 5 6
ε1ε2ε3ε4ε5εdε
^
(/[0-9]*/, lastIndex=7) --> (null, lastIndex=0)

The previous match was empty again, so lastIndex was incremented from 6
to 7, which is past the end of the string; no further matches possible.

5. Found matches:

("12345", index=0, lastIndex=5),
(ε, index=5, lastIndex=5),
(ε, index=6, lastIndex=6)

6. Replace all matches with the replacement string each.

position 0 1 2 3 4 5 6
ε1ε2ε3ε4ε5εdε

Result: x xdx

7. Result: "xxdx"

You can confirm this by evaluating the return value of
"12345d".match(/[0-9]*/g) -- as defined in the Specification -- which is
["12345", "", ""], where the "" matches can be understood as those
literally matching ε, the empty word/string.
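
To see the indices alongside the matched text, something like the following should work. It is a sketch that assumes an engine with String.prototype.matchAll (ES2020), which postdates this thread; `found` and `plain` are illustrative names:

```javascript
// Collect each match together with the index where it was found.
const found = Array.from("12345d".matchAll(/[0-9]*/g), (m) => ({
  text: m[0],
  index: m.index,
}));

// The plain match() call returns only the matched strings.
const plain = "12345d".match(/[0-9]*/g);

console.log(found);
// [ { text: "12345", index: 0 },
//   { text: "", index: 5 },
//   { text: "", index: 6 } ]
console.log(plain); // [ "12345", "", "" ]
```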


HTH

PointedEars
--
Anyone who slaps a 'this page is best viewed with Browser X' label on
a Web page appears to be yearning for the bad old days, before the Web,
when you had very little chance of reading a document written on another
computer, another word processor, or another network. -- Tim Berners-Lee
 
 
pr
Guest
Posts: n/a
 
      04-04-2008
Thomas 'PointedEars' Lahn wrote:
> pr wrote:
>> [...] In theory, globally replacing a zero-length string should be an
>> infinite task. In practice (fortunately), the regular expression engine
>> avoids consecutive zero-length matches. Therefore you have one 5-digit
>> match and two 0-digit matches, one each side of the 'd'.

>
> Not at all. In theory, there is an ε (epsilon) production; please read
> about Regular Grammars:
>
> http://en.wikipedia.org/wiki/Formal_...gular_grammars


I didn't know about those.

>
> In practice, Regular Expressions match *non-overlapping* occurrences of the
> pattern in the string which means that even with global matching no position
> is visited twice by the matcher; please read ECMA-262 Ed. 3 Final, section
> 15.5.4.11:


Are you going to tell me that zero-length strings can overlap? Is that
another mathematics thing?

>
> http://www.ecmascript.org/docs.php
>


15.5.4.10:

| If regexp.global is true: Set the regexp.lastIndex property to 0 and
| invoke RegExp.prototype.exec repeatedly until there is no match. If
| there is a match with an empty string (in other words, if the value
| of regexp.lastIndex is left unchanged), increment regexp.lastIndex
| by 1.

and 15.10.2.5

| Step 1 of the RepeatMatcher's closure d states that, once the
| minimum number of repetitions has been satisfied, any more
| expansions of Atom that match the empty string are not considered
| for further repetitions. This prevents the regular expression engine
| from falling into an infinite loop on patterns such
| as:
|
| /(a*)*/.exec("b")
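
A small sketch of that example, just to show that the call returns instead of looping. The capture group is printed rather than assumed here; `m` is an illustrative name:

```javascript
// The inner a* can match the empty string, and the outer * could repeat
// that empty match forever if the engine did not reject it.
const m = /(a*)*/.exec("b");

console.log(m !== null);   // true: the call terminates with a match
console.log(m[0] === "");  // true: the overall match is the empty string
console.log(m.index);      // 0
console.log(m[1]);         // value of capture group 1, printed for inspection
```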

> Here is what happens, in a nutshell (I used `^' to indicate the next
> possible match, and `ε' for the empty word/string to be matched):
>

[...]

Your explanation is more detailed but I don't think it says anything
mine didn't. Seems one of us misread.
>
> You can confirm this when evaluating the return value of
> "12345d".match(/[0-9]*/g) -- as defined in the Specification -- which is
> ["12345", "", ""] whereas the matches "" can be understood as those
> literally matching ε, the empty word/string.


Exactly; 'one 5-digit match and two 0-digit matches', since the
expression matched zero or more digits. Or, to put it another way:

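(function () {
  var s = "12345d";
  var re = /[0-9]*/g, results;
  // Step through the matches by hand; confirm() shows each one and lets
  // the reader cancel the loop.
  while ((results = re.exec(s)) &&
         confirm(["'" + results[0] + "'", results.index,
                  re.lastIndex].join(" | ") + "\n")) {
    // An empty match leaves lastIndex unchanged, so advance it manually
    // to avoid matching at the same position forever.
    if (results[0].length == 0) {
      re.lastIndex++;
    }
  }
})();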
 
 
Thomas 'PointedEars' Lahn
Guest
Posts: n/a
 
      04-04-2008
pr wrote:
> Thomas 'PointedEars' Lahn wrote:
>> pr wrote:
>>> [...] In theory, globally replacing a zero-length string should be an
>>> infinite task. In practice (fortunately), the regular expression engine
>>> avoids consecutive zero-length matches. Therefore you have one 5-digit
>>> match and two 0-digit matches, one each side of the 'd'.

>> Not at all. [...]
>> In practice, Regular Expressions match *non-overlapping* occurrences of the
>> pattern in the string which means that even with global matching no position
>> is visited twice by the matcher; please read ECMA-262 Ed. 3 Final, section
>> 15.5.4.11:

>
> Are you going to tell me that zero-length strings can overlap? Is that
> another mathematics thing?


I was talking about patterns in the string, not about strings. IOW,

(ab|abc)

matches only "ab" in "abcd", not also "abc", because these two patterns in
the string overlap. This is accomplished quite simply by continuing to
match at the endIndex of the previous match, and not at its index. That is
the reason why one observes the result "xxdx".
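
A short sketch of non-overlapping global matching; the second input is an extra illustration that is not from the thread, and the variable names are made up:

```javascript
// Only "ab" is reported; matching resumes after it, at index 2.
const alternation = "abcd".match(/(ab|abc)/g);

// "aaaa" contains "aa" at indices 0, 1 and 2, but a global match reports
// only the non-overlapping occurrences at 0 and 2.
const pairs = "aaaa".match(/aa/g);

console.log(alternation); // [ "ab" ]
console.log(pairs);       // [ "aa", "aa" ]
```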

>> http://www.ecmascript.org/docs.php

>
> 15.5.4.10:
>
> | If regexp.global is true: Set the regexp.lastIndex property to 0 and
> | invoke RegExp.prototype.exec repeatedly until there is no match. If
> | there is a match with an empty string (in other words, if the value
> | of regexp.lastIndex is left unchanged), increment regexp.lastIndex
> | by 1.
>
> and 15.10.2.5
>
> | Step 1 of the RepeatMatcher's closure d states that, once the
> | minimum number of repetitions has been satisfied, any more
> | expansions of Atom that match the empty string are not considered
> | for further repetitions. This prevents the regular expression engine
> | from falling into an infinite loop on patterns such
> | as:
> |
> | /(a*)*/.exec("b")


What you said is quite different from that. It has nothing to do with
"consecutive zero-length matches". As I have shown, there are consecutive
zero-length matches that are considered.

In plain English, the above paragraph merely says that once the matcher has
tried to match the empty word (length=0), it stops and continues at the
position of the next occurrence of the pattern in the string, as I have shown.

>> Here is what happens, in a nutshell (I used `^' to indicate the next
>> possible match, and `ε' for the empty word/string to be matched):

> [...]
>
> Your explanation is more detailed but I don't think it says anything
> mine didn't.


Yes, it does.

> Seems one of us misread.


Yes, you did.


PointedEars
--
var bugRiddenCrashPronePieceOfJunk = (
navigator.userAgent.indexOf('MSIE 5') != -1
&& navigator.userAgent.indexOf('Mac') != -1
) // Plone, register_function.js:16
 