Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Perl > Perl Misc > What is the best way to pull out a range of values?

Reply
Thread Tools

What is the best way to pull out a range of values?

 
 
Alf McLaughlin
Guest
Posts: n/a
 
      03-16-2006
Hello all-
It seems I am running across the following problem a lot recently.
Let's say I have a large collection of values and they are in an array
(perhaps there are 10,000 of them). I would like to pull out all
values between $x and $y without having to loop through the entire
array to figure out where the values I'm interested in are locating.
Is there a simple way to do this in Perl? For example, in SQL I would
write something like this:

SELECT position FROM my_table WHERE position BETWEEN 14000 and 1479;

Perhaps the answer involves storing the data in a different structure
(I can picture this working very quickly if the values I'm interested
in were keys in a hash, but I still don't know how to do it).

Thanks,
Alf

 
Reply With Quote
 
 
 
 
Paul Lalli
Guest
Posts: n/a
 
      03-16-2006
Alf McLaughlin wrote:
> Hello all-
> It seems I am running across the following problem a lot recently.
> Let's say I have a large collection of values and they are in an array
> (perhaps there are 10,000 of them). I would like to pull out all
> values between $x and $y without having to loop through the entire
> array to figure out where the values I'm interested in are locating.
> Is there a simple way to do this in Perl?


I see two possible solutions, other than the explicit loop you said you
want to avoid:

1) sort them all, and then find the ones you want:
my @sorted = sort { $a <=> $b } @values;
my @selected;
my $i = 0;
$i++ while $selected[$i] < $x;
push $selected[$i] until $selected[$i++] > $y;


2) Use grep, which really is the same as your original solution, just
hiding the loop:
my @selected = grep { $_ >= $x and $_ <= $y } @values;

I'm not at all convinced the first method is in any way "better" than
the linear search (you can do some Benchmarks if you're curious.).

> Perhaps the answer involves storing the data in a different structure
> (I can picture this working very quickly if the values I'm interested
> in were keys in a hash, but I still don't know how to do it).


I can't imagine how you would build such a hash without doing a linear
search through the values to begin with, which negates any possible
benefit the hash look up would have.

Paul Lalli

 
Reply With Quote
 
 
 
 
DJ Stunks
Guest
Posts: n/a
 
      03-16-2006
Alf McLaughlin wrote:
> Hello all-
> It seems I am running across the following problem a lot recently.
> Let's say I have a large collection of values and they are in an array
> (perhaps there are 10,000 of them). I would like to pull out all
> values between $x and $y without having to loop through the entire
> array to figure out where the values I'm interested in are locating.
> Is there a simple way to do this in Perl? For example, in SQL I would
> write something like this:
>
> SELECT position FROM my_table WHERE position BETWEEN 14000 and 1479;


I'm sure a computer scientist out there will have a faster way, but the
simplest and easiest comprehensible way is:

my @selected_nums = grep { $x <= $_ and $_ < $y } @ten_thousand_nums

sorting would be necessary if you performed some kind of
chop-search-ish routine to identify the indexes associated with the
start and end for @select_nums, then just slice them out of
@ten_thousand_nums, but like I said - I'll leave that for the computer
scientists... the time to sort the big array would also have to be
taken into account then since grep could operate on unsorted data.

> Perhaps the answer involves storing the data in a different structure
> (I can picture this working very quickly if the values I'm interested
> in were keys in a hash, but I still don't know how to do it).


I don't see how this would be any faster in a hash, but I'm all ears...

-jp

 
Reply With Quote
 
xhoster@gmail.com
Guest
Posts: n/a
 
      03-16-2006
"Alf McLaughlin" <(E-Mail Removed)> wrote:
> Hello all-
> It seems I am running across the following problem a lot recently.
> Let's say I have a large collection of values and they are in an array
> (perhaps there are 10,000 of them). I would like to pull out all
> values between $x and $y without having to loop through the entire
> array to figure out where the values I'm interested in are locating.


Why do you want to avoid looping through the entire array? Because it is
too slow? Or because it takes too much code to implement?

> Is there a simple way to do this in Perl? For example, in SQL I would
> write something like this:
>
> SELECT position FROM my_table WHERE position BETWEEN 14000 and 1479;


How do you know the DBMS isn't just looping through the whole table behind
the scenes? Or is that the point, that you want it to happen behind
the scenes?

my @result = grep $_>=14000 && $_<=14790, @my_table;

or maybe:

sub between(\@$$) { grep $_>=$_[1] && $_<=$_[2], @{$_[0]} };

>
> Perhaps the answer involves storing the data in a different structure
> (I can picture this working very quickly if the values I'm interested
> in were keys in a hash, but I still don't know how to do it).


I don't see how you could make it happen very fast (i.e. substantially
faster than the ordinary array scan, which is already quite fast for an
array as small as 10_000) just by simply using a regular hash.

If the array is sorted, you could do a binary search for the first element
in the range, then linear search from their to the last.

You could store the numbers in range-bins a hash (or an array) and then
limit your inspection to only the necessary bins.

Both of these require special consideration at the time you put things
into the list, as well as when you search it.

Xho

--
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service $9.95/Month 30GB
 
Reply With Quote
 
Uri Guttman
Guest
Posts: n/a
 
      03-16-2006
>>>>> "DS" == DJ Stunks <(E-Mail Removed)> writes:

DS> Alf McLaughlin wrote:
>> Hello all-
>> It seems I am running across the following problem a lot recently.
>> Let's say I have a large collection of values and they are in an array
>> (perhaps there are 10,000 of them). I would like to pull out all
>> values between $x and $y without having to loop through the entire
>> array to figure out where the values I'm interested in are locating.
>> Is there a simple way to do this in Perl? For example, in SQL I would
>> write something like this:
>>
>> SELECT position FROM my_table WHERE position BETWEEN 14000 and 1479;


DS> I'm sure a computer scientist out there will have a faster way, but the
DS> simplest and easiest comprehensible way is:

DS> my @selected_nums = grep { $x <= $_ and $_ < $y } @ten_thousand_nums

DS> sorting would be necessary if you performed some kind of
DS> chop-search-ish routine to identify the indexes associated with the
DS> start and end for @select_nums, then just slice them out of
DS> @ten_thousand_nums, but like I said - I'll leave that for the computer
DS> scientists... the time to sort the big array would also have to be
DS> taken into account then since grep could operate on unsorted data.

>> Perhaps the answer involves storing the data in a different structure
>> (I can picture this working very quickly if the values I'm interested
>> in were keys in a hash, but I still don't know how to do it).


DS> I don't see how this would be any faster in a hash, but I'm all ears...

a hash would have the same problem as an array, you can't do ordered
sequential access from any given value. the proper way to do this is
with a B tree or similar structure. then you have fairly quick access
O(log N) to any element and very quick access O(1) to the next elements
in sorted order. this is also famously known as ISAM (indexed sequential
access method) and was a core feature in cobol and pl/1.

i wouldn't be surprised if there was some btree stuff on cpan. on disk
you can get it from most any decent db as it is a common type of index.

uri

--
Uri Guttman ------ http://www.velocityreviews.com/forums/(E-Mail Removed) -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
 
Reply With Quote
 
Alf McLaughlin
Guest
Posts: n/a
 
      03-17-2006
>How do you know the DBMS isn't just looping through the whole table behind
>the scenes? Or is that the point, that you want it to happen behind
>the scenes?


I think maybe the DBMS is using a BTREE once it is properly indexed. I
think this is going on because if I first create the index:

CREATE INDEX position ON my_table(position);

Then I can see the number of rows needed for the search go down
considerably:

EXPLAIN SELECT position FROM my_table WHERE position BETWEEN $value1
and $value2;

So, yes, an unindexed database search isn't any faster than looping
through all the values.

The following module seems to work:

http://search.cpan.org/~rrwo/Tie-Ran...e/RangeHash.pm

I'm sure there are BTREE modules; that is a great idea.

 
Reply With Quote
 
xhoster@gmail.com
Guest
Posts: n/a
 
      03-17-2006
"Alf McLaughlin" <(E-Mail Removed)> wrote:
> >How do you know the DBMS isn't just looping through the whole table
> >behind the scenes? Or is that the point, that you want it to happen
> >behind the scenes?

>
> I think maybe the DBMS is using a BTREE once it is properly indexed. I
> think this is going on because if I first create the index:

....
>
> So, yes, an unindexed database search isn't any faster than looping
> through all the values.


So from the above, I gather it is not the neat syntax of SQL but rather
the performance offered by the index you are interested in?

Just using DBI and SQL might be a good way to go.

>
> The following module seems to work:
>
> http://search.cpan.org/~rrwo/Tie-Ran...e/RangeHash.pm


I can't seem to make that module work for what you want it to do. Or at
least not correctly.

use strict;
use warnings;
use Tie::RangeHash;

my $t=new Tie::RangeHash(Type => Tie::RangeHash::TYPE_NUMBER);
my @array;

for (1..10_000) {
my $x = int rand(2e7);
$t->add("$x,$x",$x);
push @array,$x;
};

print join "\t", "fetch", (sort {$a<=>$b}
$t->fetch_overlap("50000,60000"))[1..10];
print join "\t", "array", sort {$a<=>$b}
grep $_>=50_000 && $_<=60_000, @array;

__END__

fetch 3202 5936 10416 16533 17858 25605 26118 29266 31885 35022
array 50701 50948 51314 51644 56512 56999 58544


> I'm sure there are BTREE modules; that is a great idea.


I've played around with Tree::BPTree. You can get cursors into the tree,
but the cursors always start at the beginning or end. As far as I can
tell, you can't ask for a cursor which starts at any given point, which
seems to stultify the whole point of using BTrees. I started trying to add
this feature, but I decided the whole thing was slow enough that I might
as well just use hashes and/or arrays, or a real db with DBI (or a binning
method with DBM:eep for one recent job.)

Xho

--
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service $9.95/Month 30GB
 
Reply With Quote
 
Alf McLaughlin
Guest
Posts: n/a
 
      03-23-2006
>I can't seem to make that module work for what you want it to do. Or at
>least not correctly.


I think I got Tie::RangeHash to do what I want. It seems to work; let
me know if you notice anything wrong with this. I haven't clocked it
yet, so this may still be inefficient.

#!/usr/bin/perl

BEGIN: {
use Tie::RangeHash;
}

MAIN: {
my $val1 = 1;
my $val2 = 100;
my $val3 = 1000;
my $val4 = 10000;
my $val5 = 40000;
my $val6 = 50000;
my $range1 = "$val1,$val2";
my $range2 = "$val3,$val4";
my $range3 = "$val5,$val6";
tie %hash, 'Tie::RangeHash', {Type => Tie::RangeHash::TYPE_NUMBER};
%hash = ($range1 => 'Element1',
$range2 => 'Element2',
$range3 => 'Element3');
my $result = $hash{$ARGV[0]};
unless($result) { $result = 'Sorry, No Results';}
print "$result\n";
}

 
Reply With Quote
 
xhoster@gmail.com
Guest
Posts: n/a
 
      03-23-2006
"Alf McLaughlin" <(E-Mail Removed)> wrote:
> >I can't seem to make that module work for what you want it to do. Or at
> >least not correctly.

>
> I think I got Tie::RangeHash to do what I want. It seems to work; let
> me know if you notice anything wrong with this. I haven't clocked it
> yet, so this may still be inefficient.


There are only two concerns I have about it (other than efficiency).
One is that it doesn't do what you originally said you wanted. It takes
a single-point query, and returns a single "label" for the range that that
query falls in. Your original specification was that it take a range query
(a two-point query) and return a list of all the points in that range. So
originally you wanted a database of points probed with a range query, but
this is a database of ranges probed with a point query. These are quite
different things. So I don't know if your original description wasn't
accurate and you new realize that this is what you really wanted to do
(hey, it happens), or if it was originally accurate but you lost sight of
that goal (that happens, too).

My other concern is that, when I'm using a module which seems to give
obviously wrong answers under some use-cases, I am always a bit worried
about whether that same underlying problem could also cause less obvious
wrong answers in other use-cases. So if I were going to use this module
for any real work, I'd either trace done the cause of the fetch_overlap bug
to make sure it doesn't also effect the FETCH method, or I'd do extensive
tests of the FETCH method by generating large amounts of random data to
query, and verify that it gives the right answer.

Cheers,

Xho


>
> #!/usr/bin/perl
>
> BEGIN: {
> use Tie::RangeHash;
> }
>
> MAIN: {
> my $val1 = 1;
> my $val2 = 100;
> my $val3 = 1000;
> my $val4 = 10000;
> my $val5 = 40000;
> my $val6 = 50000;
> my $range1 = "$val1,$val2";
> my $range2 = "$val3,$val4";
> my $range3 = "$val5,$val6";
> tie %hash, 'Tie::RangeHash', {Type => Tie::RangeHash::TYPE_NUMBER};
> %hash = ($range1 => 'Element1',
> $range2 => 'Element2',
> $range3 => 'Element3');
> my $result = $hash{$ARGV[0]};
> unless($result) { $result = 'Sorry, No Results';}
> print "$result\n";
> }


--
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service $9.95/Month 30GB
 
Reply With Quote
 
Alf McLaughlin
Guest
Posts: n/a
 
      03-23-2006
>There are only two concerns I have about it (other than efficiency).
>One is that it doesn't do what you originally said you wanted.


Hi Xho-
The label can also be a data structure that holds the values of the
low and high ranges. The code below demonstrates this.

>My other concern is that, when I'm using a module which seems to give
>obviously wrong answers under some use-cases, I am always a bit worried


Me too! In addition to evaluating speed, I will compare it to my
subroutine that doesn't use the module to make sure both versions of
the program are giving me the same results. Best, ALF


The revised code:

#!/usr/bin/perl

BEGIN: {
use Tie::RangeHash;
}

MAIN: {
my ($val1, $val2, $val3, $val4, $val5, $val6) = (100, 1000, 10000,
40000, 50000, 100000);
my $range1 = "$val1,$val2";
my $range2 = "$val3,$val4";
my $range3 = "$val5,$val6";
tie %hash, 'Tie::RangeHash', {Type => Tie::RangeHash::TYPE_NUMBER};
%hash = ($range1 => [($val1, $val2)],
$range2 => [($val3, $val4)],
$range3 => [($val5, $val6)]);
my ($low_range, $high_range) = ($hash{$ARGV[0]}->[0],
$hash{$ARGV[0]}->[1]);
my $result;
unless($low_range && $high_range) {
$result = 'Sorry, No Results';
} else {
$result = "low: $low_range, high: $high_range";
}
print "$result\n";
}

 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
What is the fastest way to pull data from an Oracle table? Gummy ASP .Net 2 02-08-2007 01:21 PM
range() is not the best way to check range? Summercoolness@gmail.com Python 46 07-25-2006 08:10 PM
weak pull up and pull down krithiga81@yahoo.com VHDL 2 06-28-2006 02:18 PM
About to pull my hair out!! jpwilliamson@gmail.com Wireless Networking 1 09-03-2005 12:09 PM
Need to pull out referance details =?Utf-8?B?U2FjaGk=?= ASP .Net 1 09-02-2005 02:47 AM



Advertisments