Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Perl > Perl Misc > search interval

Reply
Thread Tools

search interval

 
 
Ying Hu
Guest
Posts: n/a
 
      12-23-2003
Hi,
There are two data sets,
data set 1:
A1 123 125
A2 129 200
A3 400 420
....
....
data set 2:
B1 126
B2 130
B3 202
....
....

My question is how to get B2 and A2 { 130 (B2) is in [from]129 [to]200
(A2)} .

thanks
Ying

 
Reply With Quote
 
 
 
 
Gunnar Hjalmarsson
Guest
Posts: n/a
 
      12-23-2003
Ying Hu wrote:
> There are two data sets,
> data set 1:
> A1 123 125
> A2 129 200
> A3 400 420
> ...
> ...
> data set 2:
> B1 126
> B2 130
> B3 202
> ...
> ...
>
> My question is how to get B2 and A2 { 130 (B2) is in [from]129
> [to]200 (A2)} .


I would suggest that you write a script that does what you want. Perl
is probably a suitable programming language, btw.

If you would encounter difficulties, that you after having made
serious attempts to resolve them yourself can't find the solution to,
please post your program here. If you do so, don't forget to comply
with the posting guidelines:
http://mail.augustmail.com/~tadmc/cl...uidelines.html

Good luck!

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

 
Reply With Quote
 
 
 
 
Ragnar Hafstađ
Guest
Posts: n/a
 
      12-23-2003
"Ying Hu" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> Hi,
> There are two data sets,
> data set 1:
> A1 123 125
> A2 129 200
> A3 400 420
> ...
> ...
> data set 2:
> B1 126
> B2 130
> B3 202
> ...
> ...
>
> My question is how to get B2 and A2 { 130 (B2) is in [from]129 [to]200
> (A2)} .


SELECT b.key,a.key from tableb,tablea where b.val between a.minval and
a.maxval

seriously, we need a bit of context here to fathom what your real question
is.

if your data sets are small, simple iteration might be called for
otherwise, more details could help.
for instance what ranges for A and B values are we talking about here?
what are the sizes of the datasets?
are the A intervals non-overlapping? are the B values unique?

what have you tried so far, and how did that fail?

gnari



 
Reply With Quote
 
Matt Garrish
Guest
Posts: n/a
 
      12-24-2003

"Ying Hu" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> Hi,
> There are two data sets,
> data set 1:
> A1 123 125
> A2 129 200
> A3 400 420
> ...
> ...
> data set 2:
> B1 126
> B2 130
> B3 202
> ...
> ...
>
> My question is how to get B2 and A2 { 130 (B2) is in [from]129 [to]200
> (A2)} .
>


Feels like homework, you don't even seem to have tried, AND you didn't even
make it very clear what you're trying to do (are the identifier values of
any use whatsoever???), but since it's the holidays here's something to get
you started: (and please don't post back asking for modifications to the
code if it doesn't do what you want!)

use strict;
use warnings;

my %test;
my %range;

while (my $line = <DATA>) {

next if $line =~ /^\s*$/;

my @vals = split(/\s+/, $line);

if (scalar(@vals) == 2) {

$test{$vals[0]} = [@vals];

}

elsif (scalar(@vals) == 3) {

$range{$vals[0]} = [@vals];

}

else {

die "Unknown pattern found in data line $.\n";

}

}

# because life is all about pleasing Uri...

my @print;

foreach my $key (sort keys %test) {

my $chk = isbetween($test{$key}[1], \%range);

push @print, $chk if $chk;

}

print @print;



sub isbetween {

my ($v, $href) = @_;

foreach my $r (sort keys %$href) {

# just for clarity I've assigned more comprehensible variable names

my $min = $$href{$r}[1];
my $max = $$href{$r}[2];

if (($v > $min) && ($v < $max)) {
return "The value $v is between $min and $max\n";
}

}

}


__DATA__
A1 123 125
A2 129 200
A3 400 420

B1 126
B2 130
B3 202



 
Reply With Quote
 
Sara
Guest
Posts: n/a
 
      12-24-2003
Ying Hu <(E-Mail Removed)> wrote in message news:<(E-Mail Removed)>...
> Hi,
> There are two data sets,
> data set 1:
> A1 123 125
> A2 129 200
> A3 400 420
> ...
> ...
> data set 2:
> B1 126
> B2 130
> B3 202
> ...
> ...
>
> My question is how to get B2 and A2 { 130 (B2) is in [from]129 [to]200
> (A2)} .
>
> thanks
> Ying


TRY:

http://groups.google.com/groups?hl=e...=comp.lang.apl

Its better suited to this kind of application.

Cheers!
 
Reply With Quote
 
Ying Hu
Guest
Posts: n/a
 
      12-24-2003
My perl script is as the following, but I do know that is not good one.
The two data sets are in text files and very huge.

$infile = "data_sets1.txt";
open IN, $infile or die "$infile $!\n";
while (<IN>){
chomp;
@F = split;
$A_from{$F[0]} = $F[1];
$A_to{$F[0] = $F[2];
}
$infile = "data_sets2.txt";
open IN, $infile or die "$infile $!\n";
while (<IN>){
chomp;
@F = split;
$B_point{$F[0]} = $F[1];
}

foreach $b (sort {$B_point{a}<=>$B_point{$b} keys %B_point){
foreach $a (sort {$A_from{$a} <=> $A_from{$b} keys %A_from){
if ($B_point{$b} >= $A_from{$a} and $B_point{$b} <= $A_to){
print "A:$a B:$b\n";
}
}
}


Ying Hu wrote:

> Hi,
> There are two data sets,
> data set 1:
> A1 123 125
> A2 129 200
> A3 400 420
> ...
> ...
> data set 2:
> B1 126
> B2 130
> B3 202
> ...
> ...
>
> My question is how to get B2 and A2 { 130 (B2) is in [from]129 [to]200
> (A2)} .
>
> thanks
> Ying


 
Reply With Quote
 
Ragnar Hafstađ
Guest
Posts: n/a
 
      12-25-2003
"Ying Hu" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> My perl script is as the following, but I do know that is not good one.
> The two data sets are in text files and very huge.
>

a bit more info might still help to optimize
how big datasets?
are the A-intervals overlapping?
what are the value ranges? (for example integers 0-1000)

>
> foreach $b (sort {$B_point{a}<=>$B_point{$b} keys %B_point){

you are sorting the dataset B , but nothing in the following requires it
unless the output needs to be sorted. in that case it might be more
efficient to sort
the output if it is a much smaller set than B

> foreach $a (sort {$A_from{$a} <=> $A_from{$b} keys %A_from){

here you sort unnecessarily A for EACH element of B (bad if B is large)
better to sort it before (if you want to sort it at all)
> if ($B_point{$b} >= $A_from{$a} and $B_point{$b} <= $A_to){
> print "A:$a B:$b\n";
> }
> }
> }


try to move the sort to before the loops and see if you have improvements.

@Aset=sort {$A_from{$a} <=> $A_from{$b} keys %A_from;
@Bset=sort {$B_point{$a}<=>$B_point{$b} keys %B_point;
foreach $b (@Bset) {
foreach $a (@Aset) {
...

if this improvement is not enough, you might make the innerloop slightly
more complex and make use of the fact that (if A and B are sorted)
for any given $Bset[x] >=$Aset[y], you do not need to check $Aset[0..y-1]
when processing $Bset[x+1]. in other words, for each element of Bset,
you can start the inner loop at the first @Aset element that passed the
$B_point{$b} >= $A_from{$a} test in the previous B

also if A is sorted, you can exit from the inner loop with last as soon as
$A_from{$a}>$B_point{$b},
as all following intervals will just be more distant from the B point

another problem is memory usage. if set B is huge and much larger than set A
you might want/need to only read A into memory, and process B line by line.
if both are too huge to read into memory, you might need to use another
method altogether

good luck
gnari





 
Reply With Quote
 
Gunnar Hjalmarsson
Guest
Posts: n/a
 
      12-25-2003
Ying Hu wrote:
> My perl script is as the following, but I do know that is not good
> one.


I agree. It's not good at all.

1) It does not compile.

2) You are not using strictures (the script you posted contains errors
which are easily debugged by using strict).

3) It's not a good idea to use the special variables $a and $b outside
the sort() function, especially in a program that uses sort().

It's good that you posted code this time, but you have obviously not
studied the posting guidelines as suggested in my previous post in
this thread (or you decided to ignore them). Please give it a new try,
complying with the guidelines. It's inconsiderate to expect the people
who you ask for help to start possible efforts to assist you with
correcting typos.

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

 
Reply With Quote
 
Anno Siegel
Guest
Posts: n/a
 
      12-29-2003
Ragnar Hafstađ <(E-Mail Removed)> wrote in comp.lang.perl.misc:
> "Ying Hu" <(E-Mail Removed)> wrote in message
> news:(E-Mail Removed)...
> > My perl script is as the following, but I do know that is not good one.
> > The two data sets are in text files and very huge.
> >

> a bit more info might still help to optimize
> how big datasets?
> are the A-intervals overlapping?
> what are the value ranges? (for example integers 0-1000)


If we assume the ranges are non-overlapping and delimited by smallish
integers, the ranges can be found in constant time without sorting and
searching. First build an array @rangemap that holds, for index $i,
the name of the range $i belongs to, or undef if there is no interval
for $i. Then each interval can be found with a single array lookup:


my @rangemap;

while ( <DATA> ) {
last if /^$/;
my ( $range, $from, $to) = split;
@rangemap[ $from .. $to] = ( $range) x ( $to - $from + 1);
}

while ( <DATA> ) {
my ( $item, $val) = split;
defined and print "$item and $_\n" for $rangemap[ $val];
}

__DATA__
A1 123 125
A2 129 200
A3 400 420

B1 126
B2 130
B3 202

Anno
 
Reply With Quote
 
Ragnar Hafstađ
Guest
Posts: n/a
 
      12-29-2003
"Anno Siegel" <(E-Mail Removed)-berlin.de> wrote in message
news:bspe60$9gl$(E-Mail Removed)-Berlin.DE...
> Ragnar Hafstađ <(E-Mail Removed)> wrote in comp.lang.perl.misc:
> > "Ying Hu" <(E-Mail Removed)> wrote in message
> > news:(E-Mail Removed)...
> > > My perl script is as the following, but I do know that is not good

one.
> > > The two data sets are in text files and very huge.
> > >

> > a bit more info might still help to optimize
> > how big datasets?
> > are the A-intervals overlapping?
> > what are the value ranges? (for example integers 0-1000)

>
> If we assume the ranges are non-overlapping and delimited by smallish
> integers, the ranges can be found in constant time without sorting and
> searching. First build an array @rangemap that holds, for index $i,
> the name of the range $i belongs to, or undef if there is no interval
> for $i. Then each interval can be found with a single array lookup:
>


this possibility, was one of the things i had in mind when I asked my
questions to the OP.

Usually I wait until the premises are met, before I code complete
optimisations based on them

....snipped nice implmentation...

I hope the OP does not use this on a data set like
A1 0 1000000000
A2 1000000001 2000000000
....


gnari



 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Thunderbird's mailnews.mark_message_read.delay.interval setting Larry__Weiss Firefox 1 01-27-2005 10:50 PM
Informix Interval values into a DataGrid Dave Varley via .NET 247 ASP .Net 0 09-01-2004 10:52 AM
search within a search within a search - looking for better way...my script times out Abby Lee ASP General 5 08-02-2004 04:01 PM
IS-IS CSNP multicast interval Igor Mamuzić Cisco 2 07-05-2004 08:30 AM



Advertisments