Thomas 'PointedEars' Lahn <> writes:
> Lasse Reichstein Nielsen wrote:
> But his method consumes much more memory and computing time than
> mine, no matter if the strings are structured or not. IOW: Compared
> to my method, his is highly inefficient in *every* case.
Have you tested it? It consumes much less computing time, and the
memory is constant. I think you underestimate the complexity of
interpreting a regular expression (or more likey: running the finite
state automaton it has been compiled into) on a string.
I created a test that made an array of 100000 random umbers in the
range 80000-99999. Then it tested both methods against that table, using
result1[i] = !!table[data[i]];
and
result2[i] = re.test(data[i]);
(with a base check of
result0[i] = data[i],false;
to find the overhead of the other parts of the code not used in the
actual test)
The entire test is included below.
The results were (in milliseconds):
base table regexp
IE 6: 1212 1733 23373
Opera 7.23: 601 681 2944
Moz FB 0.7 471 540 2304
So, your method is, by far, less efficient than a table lookup, and
even in IE, which (IIRC) uses a linear lookup for object properties.
The only case where the table lookup loses is in size. It can be made
better by building the table dynamically:
var numbers = [101,102,103,105,106,107,108,109,110,111,116,117,11 8,120,
121,130,140,150,160,190,199,401,402,403,405,406,40 7,408,409,410,412];
var table = {};
for (var i in numbers) {table[93000+numbers[i]]=true;}
Still larger than a regular expression, but not significantly.
>> Regexps take more work to make, and are harder to read.
>
> Not generally, no.
Definitly yes.
I am very familiar with regular expressions, but I still have to think
to read and understand one. The table is obvious. And while the table
might take more space (that is a relevant parameter), it is easier to
write. Given the list of numbers, it won't take long in Emacs to turn
it into a table.
>> And *much* harder to extend with new numbers, if it becomes necessary
>
> No, see below.
To extend the regular expression, you have to either find the place in it
that requires changing, or rebuild it from scratch. In a (sorted) table,
you just have to find the correct place and add the line (or add it anywhere
if you don't sort the table).
It might not be a big difference, but it is definitly there.
Regular expressions requires thought. The table can be automated.
>> (It probably does, but comparing RegExps to string is ExpSpace complete
> ^^^^^^^^^^^^^^^^
> Define that.
It's a complexity class.
The *genereal* problem of, given a regular expression and another
efficient description of a language (where language := set of strings
not necessarily finite), decide whether the regular expression
recognizes exactly the strings of the language, can (worst case)
require memory space that is exponential in the size of the regular
expression. I.e., it's bloody slow.
As a comparison, factorizing (large) numbers only requires polynomial
space and exponential time, and it's considered too inefficient to
use in practice.
> Not at all. It is primarily a matter of structured building of the
> RegExp, finding similarities first.
It takes thought and familiarity with regular expressions. You can do
it. I can do it. Many other people here can too, but there are lots of
people writing Javascript for web pages that considers regular expressions
black magic, and just uses what they are given. If one of them is going
to maintain the page with your regular expression, he'll be back here
to ask for help in changing it when the numbers change.
> You will not tell me that the above was hard work, will you?
Hard, no. Work, yes. Building the table was *no* work at all.
> Because the RegExp was built *this* way, it is easy as well to find out
> the strings it will match,
This regular expression is also special in that it only recognizes a
finite number of strings. That makes it easier to handle than ones
with "*" or "+" in them. So, the general hardness of the problem
doesn't necessarily apply to this case.
> We have only five-digit numbers with few linear exceptions here, one
> should manage it to see that the above RegExp matches without writing
> the matches down, especially if one has built the RegExp by themselves
> as described above.
Yes. It's (fairly) easy.
> So new numbers are not be a problem at all. If in doubt, one can simply
> add another alternative: If 93429 should be forbidden, too, the RegExp
> can be simply changed to
>
> /^(93(1(0[1-35-9]|1[016-8]|2[01]|[3-69]0|99)|4(0[1-35-9]|1[02])|93429)$/
> ^ ^^^^^^^
>
> which, of course, could (later) be optimized to
>
> /^93(1(0[1-35-9]|1[016-8]|2[01]|[3-69]0|99)|4(0[1-35-9]|1[02]|29))$/
Yes. It's a relatively simple case.
But regualr expressions are not as obvious to a lot of other people.
The test:
---
//<script>
function test(){
var table = {
93101 : true, 93102 : true, 93103 : true, 93105 : true, 93106 : true,
93107 : true, 93108 : true, 93109 : true, 93110 : true, 93111 : true,
93116 : true, 93117 : true, 93118 : true, 93120 : true, 93121 : true,
93130 : true, 93140 : true, 93150 : true, 93160 : true, 93190 : true,
93199 : true, 93401 : true, 93402 : true, 93403 : true, 93405 : true,
93406 : true, 93407 : true, 93408 : true, 93409 : true, 93410 : true,
93412 : true
};
var re = /^93(1(0[1-35-9]|1[016-8]|2[01]|[3-69]0|99)|40[1-35-9]|41[02])$/;
var testsize = 100000;
var testdata = [];
for (var i = 0;i<testsize;i++) {
testdata[i]=Math.floor(Math.random()*20000+80000);
}
var result0 = new Array(testsize);
var d1 = new Date();
for(var i =0;i<testsize;i++) {
result0[i] = testdata[i],false;
}
var d2 = new Date();
var timebase = d2-d1;
var result1 = new Array(testsize);
var d1 = new Date();
for(var i =0;i<testsize;i++) {
result1[i] = !!table[testdata[i]];
}
var d2 = new Date();
var timetable = d2-d1;
var result2 = new Array(testsize);
var d1 = new Date();
for(var i =0;i<testsize;i++) {
result2[i] = re.test(testdata[i]);
}
var d2 = new Date();
var timere = d2-d1;
alert([timebase,timetable,timere]);
}
test();
//</script>
---
/L
--
Lasse Reichstein Nielsen -
DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
'Faith without judgement merely degrades the spirit divine.'