Velocity Reviews > Perl > smarter zipcode search algorithm

# smarter zipcode search algorithm

premgrps@gmail.com
Guest
Posts: n/a

 07-05-2006
Hi,
I have a database with two tables
a) A table of 2 million records with city, zip and associated
information (say XYZ) and
b) zipcode latitude, longitude table having >40,000 records/zip codes

PROBLEM:
I need to find the the XYZs within the the range of a certain zipcode.
This zipcode and radial range in miles is entered by the user (web
interface).

The brute force way is to calculate the distance between the user
zipcode and all the zipcodes in the database. Once the zipcode_range
subroutine gives back the zipcodes within a certain radius, I need to
find all the XYZs from the table #1.

Another approach is to find the zipcodes with a square region (min/max
of the user zipcode latitude/longitude position).

Both the approaches are consuming too much time. Especially if the

My questions:
1. Is there any other smart way to do the above task.
2. I am working on a 2.4ghz/512MB RAM machine. Any suggestions how to
increase the performance. Right now each select command to the
2Million record table takes about a minute.

Thanks.

Dr.Ruud
Guest
Posts: n/a

 07-05-2006
http://www.velocityreviews.com/forums/(E-Mail Removed) schreef:

> I have a database with two tables
> a) A table of 2 million records with city, zip and associated
> information (say XYZ) and
> b) zipcode latitude, longitude table having >40,000 records/zip codes
>
> PROBLEM:
> I need to find the the XYZs within the the range of a certain zipcode.
> This zipcode and radial range in miles is entered by the user (web
> interface).
>
> The brute force way is to calculate the distance between the user
> zipcode and all the zipcodes in the database. Once the zipcode_range
> subroutine gives back the zipcodes within a certain radius, I need to
> find all the XYZs from the table #1.
>
> Another approach is to find the zipcodes with a square region (min/max
> of the user zipcode latitude/longitude position).
>
> Both the approaches are consuming too much time. Especially if the
>
> My questions:
> 1. Is there any other smart way to do the above task.
> 2. I am working on a 2.4ghz/512MB RAM machine. Any suggestions how to
> increase the performance. Right now each select command to the
> 2Million record table takes about a minute.

This is not a Perl question.

Which RDBMS?

1. Make sure there are indexes on the lattitude and longitude columns of
the zipcode table. I assume that zipcode is the PK.

2. Pre-calculate the (min_lattitude, max_lattitude, min_longitude,
max_longitude) from the user input.

3. Fill a temporary zipcode table (again with the same indexes) with a
selection from the full zipcode table, with only the candidate zipcodes.
Be sure to have the zipcode column left from the comparison:
WHERE (zipcode.lattitude >= min_lattitude AND zipcode.lattitude <=
max_lattitude AND zipcode.longtitude >= min_longtitude AND
zipcode.longtitude <= max_longtitude).

4. If necessary, refine that result (the temporary zipcode table) by
deleting the records that aren't in the radial distance.

That temporary zipcode table can also be a query, on which you run the

For the city table, have an index on the zipcode, and join that with the
(refined) result.

--
Affijn, Ruud

"Gewoon is een tijger."

Guest
Posts: n/a

 07-06-2006
<(E-Mail Removed)> wrote in comp.lang.perl.misc:
> Hi,
> I have a database with two tables
> a) A table of 2 million records with city, zip and associated
> information (say XYZ) and
> b) zipcode latitude, longitude table having >40,000 records/zip codes
>
> PROBLEM:
> I need to find the the XYZs within the the range of a certain zipcode.
> This zipcode and radial range in miles is entered by the user (web
> interface).

[...]

Your question has nothing to do with Perl (nor with zip codes, btw.).
It is a problem in computational geometry, and that is where I would
start looking for algorithms.

Anno

gmillerd@gmail.com
Guest
Posts: n/a

 07-06-2006
i wrote this many years ago for a singles website i made ... you may
find that just making a sql table with zipcode_to,zipcode_from,distance
is where you want to be. making a table for three digits and for five
digit zipcodes is a plus as well.

also you can do the lookups now with google map api i just noticed, way
to get banned with your 29000 zipcode entries

good luck

use strict;
use Math::Trig;

##
## racine wisconsin
##
my \$a_lat = 42.13404;
my \$a_long = -87.934097;

##
## wheeling illinois
##
my \$b_lat = 42.716112;
my \$b_long = -87.823329;

my \$b_lat = 41.37372;
my \$b_long = -93.376719;

################################################## ####################

print
gc(
\$a_lat,
\$a_long,
\$b_lat,
\$b_long),
"\n";

################################################## ####################

sub gc
{
my (\$a_lat, \$a_long, \$b_lat, \$b_long) = @_;

##
##

##
## meters in a mile
##
my \$meterspermile = 1609.344;

##
##
my \$earth_radius = 6378.14 * 1000;

my \$d = acos(sin(\$a_lat) * sin(\$b_lat) + cos(\$a_lat) * cos(\$b_lat)
* cos(\$a_long - \$b_long));

\$d = (\$earth_radius * \$d) / \$meterspermile;

return(\$d);
}

http://www.velocityreviews.com/forums/(E-Mail Removed)-berlin.de wrote:
> <(E-Mail Removed)> wrote in comp.lang.perl.misc:
> > Hi,
> > I have a database with two tables
> > a) A table of 2 million records with city, zip and associated
> > information (say XYZ) and
> > b) zipcode latitude, longitude table having >40,000 records/zip codes
> >
> > PROBLEM:
> > I need to find the the XYZs within the the range of a certain zipcode.
> > This zipcode and radial range in miles is entered by the user (web
> > interface).

>
> [...]
>
> Your question has nothing to do with Perl (nor with zip codes, btw.).
> It is a problem in computational geometry, and that is where I would
> start looking for algorithms.
>
> Anno

gmillerd@gmail.com
Guest
Posts: n/a

 07-06-2006
i wrote this many years ago for a singles website i made ... you may
find that just making a sql table with zipcode_to,zipcode_from,distance
is where you want to be. making a table for three digits and for five
digit zipcodes is a plus as well.

also you can do the lookups now with google map api i just noticed, way
to get banned with your 29000 zipcode entries

good luck

use strict;
use Math::Trig;

##
## racine wisconsin
##
my \$a_lat = 42.13404;
my \$a_long = -87.934097;

##
## wheeling illinois
##
my \$b_lat = 42.716112;
my \$b_long = -87.823329;

my \$b_lat = 41.37372;
my \$b_long = -93.376719;

################################################## ####################

print
gc(
\$a_lat,
\$a_long,
\$b_lat,
\$b_long),
"\n";

################################################## ####################

sub gc
{
my (\$a_lat, \$a_long, \$b_lat, \$b_long) = @_;

##
##

##
## meters in a mile
##
my \$meterspermile = 1609.344;

##
##
my \$earth_radius = 6378.14 * 1000;

my \$d = acos(sin(\$a_lat) * sin(\$b_lat) + cos(\$a_lat) * cos(\$b_lat)
* cos(\$a_long - \$b_long));

\$d = (\$earth_radius * \$d) / \$meterspermile;

return(\$d);
}

(E-Mail Removed)-berlin.de wrote:
> <(E-Mail Removed)> wrote in comp.lang.perl.misc:
> > Hi,
> > I have a database with two tables
> > a) A table of 2 million records with city, zip and associated
> > information (say XYZ) and
> > b) zipcode latitude, longitude table having >40,000 records/zip codes
> >
> > PROBLEM:
> > I need to find the the XYZs within the the range of a certain zipcode.
> > This zipcode and radial range in miles is entered by the user (web
> > interface).

>
> [...]
>
> Your question has nothing to do with Perl (nor with zip codes, btw.).
> It is a problem in computational geometry, and that is where I would
> start looking for algorithms.
>
> Anno

xhoster@gmail.com
Guest
Posts: n/a

 07-06-2006
(E-Mail Removed) wrote:
> Hi,
> I have a database with two tables
> a) A table of 2 million records with city, zip and associated
> information (say XYZ) and
> b) zipcode latitude, longitude table having >40,000 records/zip codes

40,001 and 10**89 are both greater than 40,000. I assuming you mean ~40,
000 rather than >40,000, otherwise it is meaningless.

Is that ~40,000 records where each record is a zipcode?
Or is it ~40,000 records per zip code?

> PROBLEM:
> I need to find the the XYZs within the the range of a certain zipcode.
> This zipcode and radial range in miles is entered by the user (web
> interface).

Are you assuming zip codes to be points?

> The brute force way is to calculate the distance between the user
> zipcode and all the zipcodes in the database. Once the zipcode_range
> subroutine gives back the zipcodes within a certain radius, I need to
> find all the XYZs from the table #1.

OK. You could probably even build a table giving the distances between
every pair of zip codes, or at least every pair within 300 miles or so
of each other.

> Another approach is to find the zipcodes with a square region (min/max
> of the user zipcode latitude/longitude position).

Yep, that is another approach. You could construct a square around the
circle, use a data structure to get everything in the square, then brute
force just those things in the square to see if they are also within the
circle. (neglecting the curvature of the earth.)

> Both the approaches are consuming too much time. Especially if the

Starts increasing? You make it sound like an organic creature. Isn't it
a simple number provided by the end user?

Anyway, what is taking a long time? Getting a list of zip codes within
X miles of the reference zip code? Turning that list into a list of all
the qualifying XYZ?

>
> My questions:
> 1. Is there any other smart way to do the above task.

We only have the vaguest idea of how you are doing it now. You can't make
tuning decisions based on vague notions.

> 2. I am working on a 2.4ghz/512MB RAM machine. Any suggestions how to
> increase the performance. Right now each select command to the
> 2Million record table takes about a minute.

What database system are you using? What query are you issuing to it?
What indices exist? What does this have to do with Perl?

If you are using Perl as your database system, have you looked at Tree::R?

Xho

--
Usenet Newsgroup Service \$9.95/Month 30GB

PGPS
Guest
Posts: n/a

 07-11-2006
Thanks for all your answers. I finally got the problem figured out
myself with a subquery. I'm new to databases though.

Thanks.

(E-Mail Removed) wrote:
> (E-Mail Removed) wrote:
> > Hi,
> > I have a database with two tables
> > a) A table of 2 million records with city, zip and associated
> > information (say XYZ) and
> > b) zipcode latitude, longitude table having >40,000 records/zip codes

>
> 40,001 and 10**89 are both greater than 40,000. I assuming you mean ~40,
> 000 rather than >40,000, otherwise it is meaningless.
>
> Is that ~40,000 records where each record is a zipcode?
> Or is it ~40,000 records per zip code?
>
>
> > PROBLEM:
> > I need to find the the XYZs within the the range of a certain zipcode.
> > This zipcode and radial range in miles is entered by the user (web
> > interface).

>
> Are you assuming zip codes to be points?
>
>
> > The brute force way is to calculate the distance between the user
> > zipcode and all the zipcodes in the database. Once the zipcode_range
> > subroutine gives back the zipcodes within a certain radius, I need to
> > find all the XYZs from the table #1.

>
> OK. You could probably even build a table giving the distances between
> every pair of zip codes, or at least every pair within 300 miles or so
> of each other.
>
> > Another approach is to find the zipcodes with a square region (min/max
> > of the user zipcode latitude/longitude position).

>
> Yep, that is another approach. You could construct a square around the
> circle, use a data structure to get everything in the square, then brute
> force just those things in the square to see if they are also within the
> circle. (neglecting the curvature of the earth.)
>
> > Both the approaches are consuming too much time. Especially if the
> > radial distance starts increasing.

>
> Starts increasing? You make it sound like an organic creature. Isn't it
> a simple number provided by the end user?
>
> Anyway, what is taking a long time? Getting a list of zip codes within
> X miles of the reference zip code? Turning that list into a list of all
> the qualifying XYZ?
>
> >
> > My questions:
> > 1. Is there any other smart way to do the above task.

>
> We only have the vaguest idea of how you are doing it now. You can't make
> tuning decisions based on vague notions.
>
> > 2. I am working on a 2.4ghz/512MB RAM machine. Any suggestions how to
> > increase the performance. Right now each select command to the
> > 2Million record table takes about a minute.

>
> What database system are you using? What query are you issuing to it?
> What indices exist? What does this have to do with Perl?
>
> If you are using Perl as your database system, have you looked at Tree::R?
>
> Xho
>
> --
> Usenet Newsgroup Service \$9.95/Month 30GB