Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Perl > Perl Misc > logic, algorithms, and (Perl) data structures

Reply
Thread Tools

logic, algorithms, and (Perl) data structures

 
 
ccc31807
Guest
Posts: n/a
 
      03-26-2009
I've been given an assignment to revive a project that was apparently
abandoned by the original developers, and I need some help thinking
through the logic.

The DB has Person table with the name, address, etc., of individual
people. This should have been called the Actors table, but it's too
late now. The names are identifiers, and I know names are not unique,
individuals having several names (J. Smith, Jim Smith, and James Smith
are all the same person) and names being shared by individuals (three
people all named Jim Smith) but we are dealing with this as I indicate
below.

The DB also has an Accounts table for discrete events. Should have
been called the Events table, but never mind. This list events by ID,
which is a unique identifier. The idea is that an Even (or Account)
has a primary Actor (Person) with other actors associated with the
event in one way or another. There are often multiple accounts
concerning the same person with different ID numbers. For example,
medical events may record a broken arm, sprained ankle, and cuts and
abrasions for the same person in three different records.

Accounts are pushed in CSV files and inserted into the database in an
automated manner. Each row contains the data defining the account and
the primary actor. These rows are not currently split, and all the
data gets inserted into the Account table. The secondary actors are
inserted into the Person table.

I would like to split the account records, insert the account specific
data into the Account table and the person specific data into the
Person table. This means associating each account with the primary
person, which means checking the import names against the Person
table. (The fact that we can't uniquely identify the people means that
the accounts must be manually worked, and the client has a staff that
already does this. This is a data problem, not a software problem,
that I am not responsible for.)

The Person table isn't large now, but we anticipate that it will grow
to several million records. Each import file, the accounts, has from
20 to 50 records. There can be as many as 50 per day.

Each day, I read the Person table into a hash that has elements that
look like this: %person --> $person{$last}{$first}{$middle}{$suffix}

I split each record into a series of scalars, including a hash that
looks like this: %account_person --> $account_person{$last}{$first}
($middle}($suffix}.

If there isn't a match, the account is read into Account and the
person is read into Person. If there is a match, the record is flagged
for manual handling.

I've been thinking about Schwartzian Transforms and Guttman-Rosler
Transforms, and am wondering of this is the best way to handle this.
I'm not concerned with real time processing since this is handled off-
line, but I'd like to make it as efficient as I can.

Thanks, CC.
 
Reply With Quote
 
 
 
 
smallpond
Guest
Posts: n/a
 
      03-26-2009
On Mar 26, 9:39 am, ccc31807 <carte...@gmail.com> wrote:

>
> I've been thinking about Schwartzian Transforms and Guttman-Rosler
> Transforms, and am wondering of this is the best way to handle this.
> I'm not concerned with real time processing since this is handled off-
> line, but I'd like to make it as efficient as I can.
>
> Thanks, CC.



My advice:
1) define the operations you want to do
2) lay out tables to simplify the operations
3) write the SQL needed to perform the operations
4) write the perl glue
5) test, test, test, test, test
....
97) worry about the efficiency of the perl code
 
Reply With Quote
 
 
 
 
ccc31807
Guest
Posts: n/a
 
      03-26-2009
On Mar 26, 11:00*am, smallpond <smallp...@juno.com> wrote:
> My advice:
> 1) define the operations you want to do
> 2) lay out tables to simplify the operations
> 3) write the SQL needed to perform the operations
> 4) write the perl glue
> 5) test, test, test, test, test
> ...
> 97) worry about the efficiency of the perl code


Absolutely great advice. Two points:

3.5) write the user interface. Users must interact with the system,
and I find that writing the interface before I write the code (Perl
glue) gives me both the input and the output that I need to write the
code for.

More importantly, people can be active in different capacities in
different events, and I'm drowning in complexity. I'd like to be able
to have ONE (1) table with all my people data. The prior developers
have about ten tables representing different capacities, with the
problem that if a person's address and email changes, you've got to
chase that person over ten (worst case) different tables. My way out
is to match the incoming names against the existing names, which won't
tag duplicates but will at least tag uniques. I'll let the client sort
out the duplicates, and he understands that this isn't a software
problem.

Truthfully, this is the first application that I have been involved
with that has to track people in this way, and I'm having a problem
dealing with it. Thanks for your help.

CC

 
Reply With Quote
 
Uri Guttman
Guest
Posts: n/a
 
      03-26-2009
>>>>> "c" == ccc31807 <> writes:

c> I've been thinking about Schwartzian Transforms and Guttman-Rosler
c> Transforms, and am wondering of this is the best way to handle this.
c> I'm not concerned with real time processing since this is handled off-
c> line, but I'd like to make it as efficient as I can.

as smallpond has said, worry about efficiency after you get it working
correctly. but in any case if you do need easier and faster sorting, use
Sort::Maker which has both. and you can select the sort style you want
and even print out the generated code for ejimication.

uri

--
Uri Guttman ------ -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Free Perl Training --- http://perlhunter.com/college.html ---------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
 
Reply With Quote
 
smallpond
Guest
Posts: n/a
 
      03-26-2009
On Mar 26, 11:20 am, ccc31807 <carte...@gmail.com> wrote:
> On Mar 26, 11:00 am, smallpond <smallp...@juno.com> wrote:
>
> > My advice:
> > 1) define the operations you want to do
> > 2) lay out tables to simplify the operations
> > 3) write the SQL needed to perform the operations
> > 4) write the perl glue
> > 5) test, test, test, test, test
> > ...
> > 97) worry about the efficiency of the perl code

>
> Absolutely great advice. Two points:
>
> 3.5) write the user interface. Users must interact with the system,
> and I find that writing the interface before I write the code (Perl
> glue) gives me both the input and the output that I need to write the
> code for.
>
> More importantly, people can be active in different capacities in
> different events, and I'm drowning in complexity. I'd like to be able
> to have ONE (1) table with all my people data. The prior developers
> have about ten tables representing different capacities, with the
> problem that if a person's address and email changes, you've got to
> chase that person over ten (worst case) different tables. My way out
> is to match the incoming names against the existing names, which won't
> tag duplicates but will at least tag uniques. I'll let the client sort
> out the duplicates, and he understands that this isn't a software
> problem.
>
> Truthfully, this is the first application that I have been involved
> with that has to track people in this way, and I'm having a problem
> dealing with it. Thanks for your help.
>
> CC


My point is that the database design will determine 99% of
the system stability and efficiency. 10 tables is ok as long
as they are linked properly - i.e. not by selecting on non-unique
strings, but by using unique keys. Solve that problem first.

You are correct that if you have multiple tables indexed by the same
thing (person) then they should be combined. Especially in any
environment where you have to worry about multiple processes
doing updates, like a website.
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Spacing and timing for comparing algorithms and data-structures Alec Taylor Python 0 03-02-2012 04:55 AM
Data structures and signals and stuff. Murph VHDL 9 12-07-2006 06:07 PM
Data Structures and Algorithms in Java kiamkhai@oum.edu.my Java 3 01-14-2006 08:28 PM
structures, structures and more structures (questions about nestedstructures) Alfonso Morra C Programming 11 09-24-2005 07:42 PM
Type Casting IPv4 and IPv6 structures to Generic Structures tweak C Programming 14 06-11-2004 02:43 PM



Advertisments
 



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57