Velocity Reviews > Search in sorting array

# Search in sorting array

Asen Bozhilov
Guest
Posts: n/a

 03-25-2010
I've to search in sorting array. Possible input data for search is:

[0, 999]

function EGN(egn) {
this.value = egn.split('');

this._year = +egn.slice(0, 2);
this._month = +egn.slice(2, 4);
this._date = +egn.slice(4, 6);
this._order = +egn.slice(6, 9);
}

EGN.prototype = {
constructor : EGN,
regionIds : [-1, 43, 93, 139, 169, 183, 217, 233, 281, 301, 319,
341, 377, 395, 435, 501, 527, 555, 575, 601, 623, 721, 751, 789, 821,
843, 871, 903, 925, 999],

getRegion : function() {
var start = 0,
end = this.regionIds.length - 1,
middle,
lower_b,
upper_b;

while(start <= end)
{
middle = Math.floor((end + start) / 2);
lower_b = this.regionIds[middle];
upper_b = this.regionIds[middle + 1];
if (this._order > upper_b)
{
start = middle + 1;
}
else if (this._order < lower_b)
{
end = middle - 1;
}
else {
return middle;
}
}
return -1;
}
};

var my = new EGN('8509306247');

The string parameter of `EGN' constructor is identifier in my country.
Each person has identifier like that. Ten digit of identifier give

1, 2 => Are year of born
3, 4 => Are month of born. If month > 20 the person is born before
1900 year. If menth > 40 the person is born after 2000 year.
5, 6 => Are date of born.
7, 8, 9 => Are order and region, in which person been born. If is an
even number, the person is man. Otherwise is woman. Each region has
range for people in that region. I've to get lower boundary of region,
because I need to print region name of the person and order for
region.
10 => Is only for controlling valid identifier, and calculate on other
9. Algorithm is specified by administration in my country.

At the moment I use binary search. `regionIds' contains ranges of
every region. I need to get lower boundary of region.

If there are any optimization of my code, I'll be glad to see. Any
suggestions and corrections are welcome.
Thanks.

Jorge
Guest
Posts: n/a

 03-25-2010
On Mar 25, 8:36*am, Asen Bozhilov <(E-Mail Removed)> wrote:
>
> If there are any optimization of my code, I'll be glad to see. Any
> suggestions and corrections are welcome.

middle = Math.floor((end + start) / 2);
--> middle= ((end + start) / 2) | 0;
--
Jorge.

Richard Cornford
Guest
Posts: n/a

 03-25-2010
On Mar 25, 10:06 am, Jorge wrote:
<snip>
> middle = Math.floor((end + start) / 2);
> --> middle= ((end + start) / 2) | 0;

Then why not the simpler:-

middle= ((end + start) >>> 1);

-?

Richard.

Jorge
Guest
Posts: n/a

 03-25-2010
On Mar 25, 12:27*pm, Richard Cornford <(E-Mail Removed)>
wrote:
> On Mar 25, 10:06 am, Jorge wrote:
> <snip>
>
> > middle = Math.floor((end + start) / 2);
> > --> middle= ((end + start) / 2) | 0;

>
> Then why not the simpler:-
>
> middle= ((end + start) >>> 1);

Touché
--
Jorge.

Asen Bozhilov
Guest
Posts: n/a

 03-25-2010
Richard Cornford wrote:

> Then why not the simpler:-
>
> middle= ((end + start) >>> 1);

Yes, in my case it's possible to use that. Thanks. But that approach
is harmful for biggest arrays because `>>>` convert operands
`ToUint32`, which mean can truncate bits and return unexpected
result.

var len = Math.pow(2, 32) - 1;
window.alert((len + 1) >>> 1); //0
window.alert(Math.floor((len + 1) / 2)); //2147483648

Antony Scriven
Guest
Posts: n/a

 03-25-2010
On Mar 25, 7:36am, Asen Bozhilov wrote:

> I've to search in sorting array [regionIds]. Possible
> input data for search is:
>
> [0, 999]
>
> EGN.prototype = {
> constructor : EGN,
> regionIds : [-1, 43, 93, 139, 169, 183, 217, 233, 281, 301, 319,
> 341, 377, 395, 435, 501, 527, 555, 575, 601, 623, 721, 751, 789,

821,
> 843, 871, 903, 925, 999],
>
> [...]
>
> };
>
> [...]
>
> At the moment I use binary search. `regionIds' contains
> ranges of every region. I need to get lower boundary of
> region.
>
> If there are any optimization of my code, I'll be glad to
> see. Any suggestions and corrections are welcome.
> Thanks.

It depends what you mean by optimization. If you mean code
simplicity then I'd just do a simple linear scan of the
array. It would only be three or four lines of code.

If you mean speed, I'd also try a linear scan and do some
profiling. Assuming an even distribution and given the fact
that regionIds is not a large array, I wouldn't be surprised
if a linear scan were of a comparable speed to the binary
search on average. If some inputs occur more frequently than
others you could order the regionIds array appropriately.
Then I'd expect a linear scan to be significantly faster on
average.

Another option is to populate regionIds with an entry for
every possible input. Then you'd just have a simple lookup
operation. The trade-off is maintainability, though you
could make generating the array part of a build script. --Antony

Richard Cornford
Guest
Posts: n/a

 03-25-2010
On Mar 25, 12:37 pm, Asen Bozhilov wrote:
> Richard Cornford wrote:
>> Then why not the simpler:-

>
>> middle= ((end + start) >>> 1);

>
> Yes, in my case it's possible to use that. Thanks. But that
> approach is harmful for biggest arrays because `>>>` convert
> operands `ToUint32`, which mean can truncate bits and return
> unexpected result.
>
> var len = Math.pow(2, 32) - 1;
> window.alert((len + 1) >>> 1); //0
> window.alert(Math.floor((len + 1) / 2)); //2147483648

Bigger arrays are fine, by definition. You are working on array
indexes, which are constrained by: a property name P is an 'array
index' if and only if ToString(ToUint32(P)) is equal to P (ECMA 262
3rd Ed. Section 15.4). Thus if you initially set - start - to zero and
- end - to (array.length - 1), and then all values subsequently
assigned to - start - and - end - are integers within that range, as
they should be, then the ToUint32 operation implied by the >>>
operation is safe.

(The ToInt32 operation implied by - x|0 - is not so theoretically
safe, but should be considered in relation to how big, in memory
consumption terms, an array with two to the power of thirty one
elements would be (particularly on an OS with a 32 bit memory map).)

Richard.

Asen Bozhilov
Guest
Posts: n/a

 03-25-2010
Richard Cornford wrote:

> Bigger arrays are fine, by definition. You are working on array
> indexes, which are constrained by: a property name P is an 'array
> index' if and only if ToString(ToUint32(P)) is equal to P (ECMA 262
> 3rd Ed. Section 15.4).

I agree with you and documentation in that point. But in binary search
`>>>` it can give unexpected result when you computing `middle' of the
part in which you looking. If you searching in interval:

(arr.length / 2, arr.length)

How can you compute next middle value, for array which length is equal
to (2 ^ 32) - 1?

var arr = new Array(Math.pow(2, 32) - 1),
end = arr.length - 1,
start = end >>> 1;

window.alert((end + start) >>> 1); //1073741822
window.alert(Math.floor((end + start) / 2)); //3221225470

Jorge
Guest
Posts: n/a

 03-25-2010
On Mar 25, 2:39*pm, Asen Bozhilov <(E-Mail Removed)> wrote:
> Richard Cornford wrote:
> > Bigger arrays are fine, by definition. You are working on array
> > indexes, which are constrained by: a property name P is an 'array
> > index' if and only if ToString(ToUint32(P)) is equal to P (ECMA 262
> > 3rd Ed. Section 15.4).

>
> I agree with you and documentation in that point. But in binary search
> `>>>` it can give unexpected result when you computing `middle' of the
> part in which you looking. If you searching in interval:
>
> (arr.length / 2, arr.length)
>
> How can you compute next middle value, for array which length is equal
> to (2 ^ 32) - 1?
>
> var arr = new Array(Math.pow(2, 32) - 1),
> * * end = arr.length - 1,
> * * start = end >>> 1;
>
> window.alert((end + start) >>> 1); //1073741822
> window.alert(Math.floor((end + start) / 2)); //3221225470

((end + start) / 2) >>> 0
--> 3221225470

--
Jorge.

Jorge
Guest
Posts: n/a

 03-25-2010
On Mar 25, 2:39*pm, Asen Bozhilov <(E-Mail Removed)> wrote:
> Richard Cornford wrote:
> > Bigger arrays are fine, by definition. You are working on array
> > indexes, which are constrained by: a property name P is an 'array
> > index' if and only if ToString(ToUint32(P)) is equal to P (ECMA 262
> > 3rd Ed. Section 15.4).

>
> I agree with you and documentation in that point. But in binary search
> `>>>` it can give unexpected result when you computing `middle' of the
> part in which you looking. If you searching in interval:
>
> (arr.length / 2, arr.length)
>
> How can you compute next middle value, for array which length is equal
> to (2 ^ 32) - 1?
>
> var arr = new Array(Math.pow(2, 32) - 1),
> * * end = arr.length - 1,
> * * start = end >>> 1;
>
> window.alert((end + start) >>> 1); //1073741822
> window.alert(Math.floor((end + start) / 2)); //3221225470

start= end= parseInt("ffffffff", 16);
--> 4294967295
((start + end) / 2) >>> 0
--> 4294967295 (toUint32, not truncated, 0xffffffff)

OTOH:

(start + end) >>> 1
--> 2147483647 (toUint32, truncated, 0x7fffffff)

((start + end) / 2) | 0
--> -1 (toInt32, not truncated, 0xffffffff)

--
Jorge.