Velocity Reviews


ccc31807 10-12-2010 04:19 PM

suitable key for a hash
 
I have a data file to process that consists of about 25K rows and
about 30 columns. This file contains no column with unique values,
that is, every column contains duplicate values. I am placing the data
in a hash to process it (so I can access the data values by name
rather than position), and the only 'key' I can come up with is the $.
variable for the input line numbers.

Surely someone must have dealt with this problem before. Is there a
better solution?

The processing requires dumping the data into discrete categories,
e.g., level, state, person's name, status, for the purpose of
generating reports, e.g., by level, by state, by name, by status, and
not having a unique key isn't an issue.

CC.

RedGrittyBrick 10-12-2010 05:03 PM

Re: suitable key for a hash
 
On 12/10/2010 17:19, ccc31807 wrote:
> I have a data file to process that consists of about 25K rows and
> about 30 columns. This file contains no column with unique values,
> that is, every column contains duplicate values. I am placing the data
> in a hash to process it (so I can access the data values by name
> rather than position), and the only 'key' I can come up with is the $.
> variable for the input line numbers.
>
> Surely someone must have dealt with this problem before. Is there a
> better solution?


A better solution than
... $name{$index} ...
must surely be
... $name[$index] ...

I don't see any point using hashes if the key value is an integer in the
range 1..25000 with no gaps.


> The processing requires dumping the data into discrete categories,
> e.g., level, state, person's name, status, for the purpose of
> generating reports, e.g., by level, by state, by name, by status, and
> not having a unique key isn't an issue.


An SSCCE would help.

--
RGB

Jim Gibson 10-12-2010 09:00 PM

Re: suitable key for a hash
 
In article
<9a9c9ce1-b08f-4781-a055-8af8cca793ae@28g2000yqm.googlegroups.com>,
ccc31807 <cartercc@gmail.com> wrote:

> I have a data file to process that consists of about 25K rows and
> about 30 columns. This file contains no column with unique values,
> that is, every column contains duplicate values. I am placing the data
> in a hash to process it (so I can access the data values by name
> rather than position), and the only 'key' I can come up with is the $.
> variable for the input line numbers.
>
> Surely someone must have dealt with this problem before. Is there a
> better solution?


If you have records with duplicate keys and you want to store the data
in a hash for rapid lookup, use array references as hash values
(untested):

while (<>) {
    my ( $name, @rest ) = split;
    # Each name maps to a list of rows, so duplicate names are kept.
    push @{ $data{$name} }, \@rest;
}

>
> The processing requires dumping the data into discrete categories,
> e.g., level, state, person's name, status, for the purpose of
> generating reports, e.g., by level, by state, by name, by status, and
> not having a unique key isn't an issue.


Store the data in an array and create indices for key fields (untested):

while (<>) {
    my @fields = split;
    push @data, \@fields;
    # $#data is the index of the row just added.
    push @{ $field1_index{ $fields[0] } }, $#data;
    push @{ $field2_index{ $fields[1] } }, $#data;
    ...
}
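
A minimal sketch of how such indices might be used once built. The sample rows, the column order (level, state, name), and the helper names_in_state are assumptions made for illustration and do not come from the original data file.

```perl
use strict;
use warnings;

# Hypothetical sample rows: level, state, name (whitespace-separated).
my @lines = (
    "senior GA alice",
    "junior GA bob",
    "senior NY carol",
);

my ( @data, %level_index, %state_index );

for my $line (@lines) {
    my @fields = split ' ', $line;
    push @data, \@fields;
    # $#data is the position of the row just stored.
    push @{ $level_index{ $fields[0] } }, $#data;
    push @{ $state_index{ $fields[1] } }, $#data;
}

# Return the names (third column) of all rows with the given state.
sub names_in_state {
    my ($state) = @_;
    return map { $data[$_][2] } @{ $state_index{$state} || [] };
}

print join( ', ', names_in_state('GA') ), "\n";    # prints "alice, bob"
```

Each index stores row positions rather than copies of the rows, so every row lives once in @data however many indices refer to it.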

--
Jim Gibson

Xho Jingleheimerschmidt 10-13-2010 01:00 AM

Re: suitable key for a hash
 
ccc31807 wrote:
> I have a data file to process that consists of about 25K rows and
> about 30 columns. This file contains no column with unique values,
> that is, every column contains duplicate values.



Jointly, or just severally?


> I am placing the data
> in a hash to process it (so I can access the data values by name
> rather than position),


If you wish to access it by name, then you must know what the name is.

> and the only 'key' I can come up with is the $.
> variable for the input line numbers.


Why not just an array, in that case?

>
> Surely someone must have dealt with this problem before. Is there a
> better solution?
>
> The processing requires dumping the data into discrete categories,
> e.g., level, state, person's name, status, for the purpose of
> generating reports, e.g., by level, by state, by name, by status, and
> not having a unique key isn't an issue.


Ok, so just stick it directly into those structures.

Xho

Justin C 10-13-2010 09:25 AM

Re: suitable key for a hash
 
On 2010-10-12, ccc31807 <cartercc@gmail.com> wrote:
> I have a data file to process that consists of about 25K rows and
> about 30 columns. This file contains no column with unique values,
> that is, every column contains duplicate values. I am placing the data
> in a hash to process it (so I can access the data values by name
> rather than position), and the only 'key' I can come up with is the $.
> variable for the input line numbers.
>
> Surely someone must have dealt with this problem before. Is there a
> better solution?
>
> The processing requires dumping the data into discrete categories,
> e.g., level, state, person's name, status, for the purpose of
> generating reports, e.g., by level, by state, by name, by status, and
> not having a unique key isn't an issue.


Instead of sticking it into a hash so that you can go over all of it
again, why not process (or part process) it into the relevant discrete
categories as part of the import?
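
One way this could look, as a rough sketch: each row is filed under every category while it is read, so no second pass over the data is needed. The column names, the sample lines, and the add_row helper are assumptions for illustration only.

```perl
use strict;
use warnings;

# Assumed column names; the real file has about 30 columns.
my @columns = qw(level state name status);

my %report;    # $report{category}{value} = [ row hashrefs ]

# File one whitespace-separated line under every category.
sub add_row {
    my ($line) = @_;
    my %row;
    @row{@columns} = split ' ', $line;
    push @{ $report{$_}{ $row{$_} } }, \%row for @columns;
    return;
}

add_row($_) for "senior GA alice active", "junior GA bob closed";

# Report by state: every row whose state is GA.
for my $row ( @{ $report{state}{GA} } ) {
    print "$row->{name} ($row->{status})\n";
}
```

The same row reference is shared by all buckets, so this costs one hash per row plus a few references.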

Justin.
--
Justin C, by the sea.

ccc31807 10-13-2010 01:37 PM

Re: suitable key for a hash
 
Thanks for your reply, and for all the others.

I decided to continue to use $. as the hash key. As it turns out, the
key isn't relevant to my application, as I'm not using the key to look
up the hash values. I'm just iterating through the hash, collecting
certain values, so the key is totally superfluous -- the only reason I
need a key is because of the nature of the hash.

I don't want to use an array because I'm creating a number of
different reports, and it's simply a lot easier to use values like:

$data{$key}{firstname}, $data{$key}{lastname}

than it is to use values like

$data[13456][2], $data[23543][3]

On Oct 12, 1:03 pm, RedGrittyBrick <RedGrittyBr...@spamweary.invalid>
wrote:

> An SSCCE would help.


I'm sorry, but I don't know this. What is an SSCCE?

CC

Dr.Ruud 10-13-2010 01:51 PM

Re: suitable key for a hash
 
On 2010-10-13 15:37, ccc31807 wrote:

> I decided to continue to use $. as the hash key.


If it smells like an array index ...


> As it turns out, the
> key isn't relevant to my application, as I'm not using the key to look
> up the hash values. I'm just iterating through the hash, collecting
> certain values, so the key is totally superfluous -- the only reason I
> need a key is because of the nature of the hash.
>
> I don't want to use an array because I'm creating a number of
> different reports, and it's simply a lot easier to use values like:
>
> $data{$key}{firstname}, $data{$key}{lastname}
>
> than it is to use values like
>
> $data[13456][2], $data[23543][3]


That is not the proper comparison.

$data[ $row ]{ firstname }

$data[ $row ][ FIRSTNAME ]

(assumes a numeric constant FIRSTNAME)
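
A short sketch of the constant-based form, assuming the column positions shown here (they are invented for the example, as are the sample rows and the full_name helper):

```perl
use strict;
use warnings;

# Hypothetical column positions; adjust to the real file layout.
use constant {
    FIRSTNAME => 0,
    LASTNAME  => 1,
    STATE     => 2,
};

my @data = (
    [ 'Alice', 'Smith', 'GA' ],
    [ 'Bob',   'Jones', 'NY' ],
);

# Build "first last" for one row, using names instead of bare numbers.
sub full_name {
    my ($row) = @_;
    return join ' ', $data[$row][FIRSTNAME], $data[$row][LASTNAME];
}

print full_name(1), "\n";    # prints "Bob Jones"
```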


> What is an SSCCE?


JFGI

--
Ruud

Jürgen Exner 10-13-2010 03:08 PM

Re: suitable key for a hash
 
ccc31807 <cartercc@gmail.com> wrote:
>I don't want to use an array because I'm creating a number of
>different reports, and it's simply a lot easier to use values like:
>
>$data{$key}{firstname}, $data{$key}{lastname}
>
>than it is to use values like
>
>$data[13456][2], $data[23543][3]


And why not use values like

$data[$key]{firstname}, $data[$key]{lastname}
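
For illustration, an array of hashes can be filled with a hash slice so that each row keeps its named fields while the rows stay in a plain array. The column names, sample lines, and add_line helper below are assumed, not taken from the original file.

```perl
use strict;
use warnings;

my @columns = qw(firstname lastname state);    # assumed column names
my @data;

# Parse one line into a hash keyed by column name and append it.
sub add_line {
    my ($line) = @_;
    my %row;
    @row{@columns} = split ' ', $line;
    push @data, \%row;
    return;
}

add_line($_) for 'Alice Smith GA', 'Bob Jones NY';

for my $key ( 0 .. $#data ) {
    print "$key: $data[$key]{firstname} $data[$key]{lastname}\n";
}
```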

jue

ccc31807 10-13-2010 06:03 PM

Re: suitable key for a hash
 
On Oct 13, 11:08 am, Jürgen Exner <jurge...@hotmail.com> wrote:
> And why not use values like
>
>         $data[$key]{firstname}, $data[$key]{lastname}


Because I wasn't completely truthful about my processing. I have to
break the data apart on various values, some of which are unique keys,
e.g., identification numbers for individual people. The data includes
clients and counselors, and (obviously) clients can have multiple
counselors and counselors can have multiple clients. Other values are
one of a kind, such as a person's address, regardless of the number of
times the particular person appears in the data. I have to cross
reference these values by unique keys, and I use five hashes to sort
out the data.
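
A sketch of that kind of cross-referencing, using three hashes rather than the five mentioned above. Every name here (the hashes, the add_pairing helper, the IDs and addresses) is hypothetical; the thread does not show the actual code or data.

```perl
use strict;
use warnings;

# All names below are hypothetical; the thread does not show the real ones.
my ( %counselors_of, %clients_of, %address_of );

# Record one row linking a client to a counselor.
sub add_pairing {
    my ( $client_id, $counselor_id, $client_address ) = @_;
    $counselors_of{$client_id}{$counselor_id} = 1;    # inner hash used as a set
    $clients_of{$counselor_id}{$client_id}    = 1;
    $address_of{$client_id} = $client_address;        # one value per person
    return;
}

add_pairing( 'c1', 'k1', '1 Main St' );
add_pairing( 'c1', 'k2', '1 Main St' );
add_pairing( 'c2', 'k1', '9 Oak Ave' );

print join( ', ', sort keys %{ $clients_of{k1} } ), "\n";    # prints "c1, c2"
```

Using inner hashes as sets means a pairing that appears on several rows is stored only once.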

I see now that I could use an array for the handful of data elements
for each row that are unique.

Thanks, CC.

