Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Perl > Perl Misc > Detecting duplicate keys in a hash I am "requiring"

Reply
Thread Tools

Detecting duplicate keys in a hash I am "requiring"

 
 
Gupit
Guest
Posts: n/a
 
      09-10-2003
Hi,
I have a configuration file which is in a perl hash (of hashes and
arrays) format.

$hash = {
key1 => { key2 => val2,
key3 => val3,
},
key4 => [ vala,valb,],
key1 => {key5 => val5,
key6 => val6
),
};

In my script I do
use vars ($hash);
if (eval "require config_file") {
print "Required config_file";
} else {
print "Couldn't require config_file\n";
}

Now if you notice in the config_file, $hash has two duplicate keys
i.e. key1 has been specified twice. When using require, perl simply
keeps the last value it encountered for key1.

I would like to detect these duplicates. Is there a stricter mode in
perl that will print warnings when perl encounters duplicate keys in
hashes. I could turn it on for the eval require bit.

Is there any other way to detect these duplicates? Unfortunately the
config_file is really huge and uses hash of hashes of hashes and
arrays and so on and writing a script to parse the file would be
painful.

Thanks in advance
G
 
Reply With Quote
 
 
 
 
Tassilo v. Parseval
Guest
Posts: n/a
 
      09-10-2003
Also sprach Gupit:

> I have a configuration file which is in a perl hash (of hashes and
> arrays) format.
>
> $hash = {
> key1 => { key2 => val2,
> key3 => val3,
> },
> key4 => [ vala,valb,],
> key1 => {key5 => val5,
> key6 => val6
> ),
> };
>
> In my script I do
> use vars ($hash);
> if (eval "require config_file") {
> print "Required config_file";
> } else {
> print "Couldn't require config_file\n";
> }
>
> Now if you notice in the config_file, $hash has two duplicate keys
> i.e. key1 has been specified twice. When using require, perl simply
> keeps the last value it encountered for key1.
>
> I would like to detect these duplicates. Is there a stricter mode in
> perl that will print warnings when perl encounters duplicate keys in
> hashes. I could turn it on for the eval require bit.


No, there's no such mode I know of. You can implement that however, but
you'd need to change your logic a little. First of all, you need a tied
hash that will do the warning:

package Tie::StrictHash;
require Tie::Hash;
@Tie::StrictHash::@ISA = qw(Tie::StdHash);
use Carp;
sub STORE {
my ($hash, $key, $val) = @_;
carp "Duplicate key '$key' detected" if exists $hash->{ $key };
# store it nonetheless, overwriting the previous key/value pair
$hash->SUPER::STORE($key, $val);
}

package main;
use vars qw(%hash);
tie %hash => 'Tie::StrictHash';
require 'config_file';

For the above, you have to turn the toplevel hash-ref into a real hash
to make it work.

> Is there any other way to detect these duplicates? Unfortunately the
> config_file is really huge and uses hash of hashes of hashes and
> arrays and so on and writing a script to parse the file would be
> painful.


I'd say any attempt to parse it yourself is extremely likely to fail.
The above solution is more realistic, however it's only a partial one.
Tie::StrictHash cannot detect duplicate keys in nested hashes, only in
the top-level one. I tried a recursive tie/STORE for each sub-hash but
then I realized that those attempts are futile since the subhash has
already been processed by perl before STORE gets to see it.

It's an interesting problem. But I don't see how it could be solved.

Tassilo
--
$_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus}) !JAPH!qq(rehtona{tsuJbus#;
$_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexi ixesixeseg;y~\n~~dddd;eval
 
Reply With Quote
 
 
 
 
Anno Siegel
Guest
Posts: n/a
 
      09-10-2003
Gupit <> wrote in comp.lang.perl.misc:
> Hi,
> I have a configuration file which is in a perl hash (of hashes and
> arrays) format.
>
> $hash = {
> key1 => { key2 => val2,
> key3 => val3,
> },
> key4 => [ vala,valb,],
> key1 => {key5 => val5,
> key6 => val6
> ),
> };
>
> In my script I do
> use vars ($hash);
> if (eval "require config_file") {
> print "Required config_file";
> } else {
> print "Couldn't require config_file\n";
> }
>
> Now if you notice in the config_file, $hash has two duplicate keys
> i.e. key1 has been specified twice.


There are no duplicate keys in a hash, hash keys are unique by
construction. Every entry for the same key overwrites the previous
one.

> When using require, perl simply
> keeps the last value it encountered for key1.


Not only with require -- the hash doesn't ever contain the duplicate.

> I would like to detect these duplicates. Is there a stricter mode in
> perl that will print warnings when perl encounters duplicate keys in
> hashes. I could turn it on for the eval require bit.


Then specify you configuration structure as a list(ref), not a hash.
Replace the outermost pair of "{}" with "[]" (and change the name from
$hash to something descriptive). Then read "perldoc -q duplicate"
to see how to detect duplicates in the list you are importing.

> Is there any other way to detect these duplicates? Unfortunately the
> config_file is really huge and uses hash of hashes of hashes and
> arrays and so on and writing a script to parse the file would be
> painful.


Whatever the structure of the hash, the top level structure is one
of key/value pairs. That isn't so hard to parse.

Anno
 
Reply With Quote
 
Gupit
Guest
Posts: n/a
 
      09-10-2003
Unfortunately duplicate keys may have been specified in the lower
level hash structure as well. Replacing only the top level hash
structure wont help there.

e.g. see key7 in the following example

$hash = {
key1 => { key2 => val2,
key3 => val3,
},
key4 => [ vala,valb,],
key1 => {key5 => val5,
key6 => {
key7 => val7,
key8 => val8,
key7 => val9,
}
),
};

(I dont call this structure $hash in real life, I have simplified the
actual data structure here for ease of understanding and to retain the
focus of my question )

regards,
G


(Anno Siegel) wrote in message news:<bjmqhk$2ce$>...
> Gupit <> wrote in comp.lang.perl.misc:
> > Hi,
> > I have a configuration file which is in a perl hash (of hashes and
> > arrays) format.
> >
> > $hash = {
> > key1 => { key2 => val2,
> > key3 => val3,
> > },
> > key4 => [ vala,valb,],
> > key1 => {key5 => val5,
> > key6 => val6
> > ),
> > };
> >
> > In my script I do
> > use vars ($hash);
> > if (eval "require config_file") {
> > print "Required config_file";
> > } else {

> print "Couldn't require config_file\n";
> > }
> >
> > Now if you notice in the config_file, $hash has two duplicate keys
> > i.e. key1 has been specified twice.

>
> There are no duplicate keys in a hash, hash keys are unique by
> construction. Every entry for the same key overwrites the previous
> one.
>
> > When using require, perl simply
> > keeps the last value it encountered for key1.

>
> Not only with require -- the hash doesn't ever contain the duplicate.
>
> > I would like to detect these duplicates. Is there a stricter mode in
> > perl that will print warnings when perl encounters duplicate keys in
> > hashes. I could turn it on for the eval require bit.

>
> Then specify you configuration structure as a list(ref), not a hash.
> Replace the outermost pair of "{}" with "[]" (and change the name from
> $hash to something descriptive). Then read "perldoc -q duplicate"
> to see how to detect duplicates in the list you are importing.
>
> > Is there any other way to detect these duplicates? Unfortunately the
> > config_file is really huge and uses hash of hashes of hashes and
> > arrays and so on and writing a script to parse the file would be
> > painful.

>
> Whatever the structure of the hash, the top level structure is one
> of key/value pairs. That isn't so hard to parse.
>
> Anno

 
Reply With Quote
 
Anno Siegel
Guest
Posts: n/a
 
      09-10-2003
Gupit <> wrote in comp.lang.perl.misc:

Please don't top-post. To restore context, I have to recapitulate.

My suggestion, WRT the data structure below, was to use a list (of pairs)
instead of the top-level hash to allow duplicate keys. The list can
trivially be converted to the hash, but can also be checked for duplicates.

> Unfortunately duplicate keys may have been specified in the lower
> level hash structure as well. Replacing only the top level hash
> structure wont help there.
>
> e.g. see key7 in the following example
>
> $hash = {
> key1 => { key2 => val2,
> key3 => val3,
> },
> key4 => [ vala,valb,],
> key1 => {key5 => val5,
> key6 => {
> key7 => val7,
> key8 => val8,
> key7 => val9,
> }
> ),
> };
>
> (I dont call this structure $hash in real life, I have simplified the
> actual data structure here for ease of understanding and to retain the
> focus of my question )


Well, if the problem re-appears at a lower level, re-apply the solution.
In other words, re-design the data structure so that every hash that may
have a key specified more than once is replaced with a list of pairs.

Mind you, I don't know if this is a good solution in your case -- I know
too little about what use you will make of the structure. It's just
the first thing that came to mind.

[snip TOFU]

Anno
 
Reply With Quote
 
Tassilo v. Parseval
Guest
Posts: n/a
 
      09-11-2003
Also sprach Anno Siegel:

> Gupit <> wrote in comp.lang.perl.misc:


>> Unfortunately duplicate keys may have been specified in the lower
>> level hash structure as well. Replacing only the top level hash
>> structure wont help there.
>>
>> e.g. see key7 in the following example
>>
>> $hash = {
>> key1 => { key2 => val2,
>> key3 => val3,
>> },
>> key4 => [ vala,valb,],
>> key1 => {key5 => val5,
>> key6 => {
>> key7 => val7,
>> key8 => val8,
>> key7 => val9,
>> }
>> ),
>> };
>>
>> (I dont call this structure $hash in real life, I have simplified the
>> actual data structure here for ease of understanding and to retain the
>> focus of my question )

>
> Well, if the problem re-appears at a lower level, re-apply the solution.
> In other words, re-design the data structure so that every hash that may
> have a key specified more than once is replaced with a list of pairs.


If we look at $hash->{key4}, regarding everything as a list likely has
its problems, too. The OP might be able to solve this with a
dispatch-table for particular entries. If he knows that there is always
a list associated with 'key4', then this is feasible. Here is a
spontaneous solution (might not be complete since it doesn't handle
structures within array-refs, only within hash-refs). And no, I do not
intend to win a beauty-award for it:

use Data:umper;
use strict;
use vars qw($list);
eval "require 'config_file'";

my %dispatch;
# these hash-keys all have another hash-ref as value
@dispatch{ qw/key1 key2 key3 key5 key6 key6/ } =
(sub {
my ($key, $val, $hash) = @_;
die "Duplicate key '$key'\n" if exists $hash->{ $key };
if (@$val > 1) {
for (my $i = 0; $i <= $#$val; $i += 2) {
die "Duplicate key '$val->[$i]'\n"
if exists $hash->{ $key }->{ $val->[$i] };
if (! ref $val->[$i+1]) {
$hash->{ $key }->{ $val->[$i] } = $val->[$i+1];
} else {
$dispatch{$val->[$i]}->($val->[$i], $val->[$i+1], $hash);
}
}
} else {
$hash->{ $key } = $val;
}
}) x 6;
# takes an array-ref as value
$dispatch{ key4 } =
sub {
my ($key, $val, $hash) = @_;
if (exists $hash->{ $key }) {
die "Duplicate key '$key'\n";
}
$hash->{ $key } = $val;
};

my %hash;
for (my $i = 0; $i <= $#$list; $i += 2) {
$dispatch{$list->[$i]}->($list->[$i], $list->[$i+1], \%hash);
}
print Dumper \%hash;

For a config-file without duplicate values like this:

$list = [
key1 => [ key2 => 'val2',
key3 => 'val3',
key4 => 'val1',
],
key4 => [ qw/vala valb/ ],
key5 => [ key5 => 'val5',
key6 => 'val6'
],
];

it will correctly produce:

$VAR1 = {
'key5' => {
'key5' => 'val5',
'key6' => 'val6'
},
'key4' => [
'vala',
'valb'
],
'key1' => {
'key2' => 'val2',
'key4' => 'val1',
'key3' => 'val3'
}
};

and die when duplicate keys within a sub-structure are detected.

Tassilo
--
$_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus}) !JAPH!qq(rehtona{tsuJbus#;
$_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexi ixesixeseg;y~\n~~dddd;eval
 
Reply With Quote
 
Anno Siegel
Guest
Posts: n/a
 
      09-11-2003
Tassilo v. Parseval <> wrote in comp.lang.perl.misc:
> Also sprach Anno Siegel:
>
> > Gupit <> wrote in comp.lang.perl.misc:

>
> >> Unfortunately duplicate keys may have been specified in the lower
> >> level hash structure as well. Replacing only the top level hash
> >> structure wont help there.
> >>
> >> e.g. see key7 in the following example
> >>
> >> $hash = {
> >> key1 => { key2 => val2,
> >> key3 => val3,
> >> },
> >> key4 => [ vala,valb,],
> >> key1 => {key5 => val5,
> >> key6 => {
> >> key7 => val7,
> >> key8 => val8,
> >> key7 => val9,
> >> }
> >> ),
> >> };
> >>
> >> (I dont call this structure $hash in real life, I have simplified the
> >> actual data structure here for ease of understanding and to retain the
> >> focus of my question )

> >
> > Well, if the problem re-appears at a lower level, re-apply the solution.
> > In other words, re-design the data structure so that every hash that may
> > have a key specified more than once is replaced with a list of pairs.

>
> If we look at $hash->{key4}, regarding everything as a list likely has
> its problems, too.


Yes, the "real list" among the hashes had me worried to, but I
decided to gloss over it.

> The OP might be able to solve this with a
> dispatch-table for particular entries. If he knows that there is always
> a list associated with 'key4', then this is feasible.


We need a way to tell an ordinary list structure from one that is meant
to represent a hash with multiple keys. A dispatch table, like you
suggest, can do that. Another way would be to point to the "real"
listrefs though another reference. When traversing the structure,
whenever we meet a scalar ref, we know it contains a "real" listref.
A plain listref represents a multi-hash. (Or the other way around.)

Similarly, one could bless one kind of listrefs into a class to make
them distinguishable.

> Here is a
> spontaneous solution (might not be complete since it doesn't handle
> structures within array-refs, only within hash-refs). And no, I do not
> intend to win a beauty-award for it:


[code and results snipped]

> ...and die when duplicate keys within a sub-structure are detected.


I am beginning to have doubts about that. I get the impression the
OP *wants* the duplicate keys.

Anno
 
Reply With Quote
 
Tassilo v. Parseval
Guest
Posts: n/a
 
      09-11-2003
Also sprach Anno Siegel:

> Tassilo v. Parseval <> wrote in comp.lang.perl.misc:
>> Also sprach Anno Siegel:


>> > Well, if the problem re-appears at a lower level, re-apply the solution.
>> > In other words, re-design the data structure so that every hash that may
>> > have a key specified more than once is replaced with a list of pairs.

>>
>> If we look at $hash->{key4}, regarding everything as a list likely has
>> its problems, too.

>
> Yes, the "real list" among the hashes had me worried to, but I
> decided to gloss over it.
>
>> The OP might be able to solve this with a
>> dispatch-table for particular entries. If he knows that there is always
>> a list associated with 'key4', then this is feasible.

>
> We need a way to tell an ordinary list structure from one that is meant
> to represent a hash with multiple keys. A dispatch table, like you
> suggest, can do that. Another way would be to point to the "real"
> listrefs though another reference. When traversing the structure,
> whenever we meet a scalar ref, we know it contains a "real" listref.
> A plain listref represents a multi-hash. (Or the other way around.)
>
> Similarly, one could bless one kind of listrefs into a class to make
> them distinguishable.


Gee. We are beginning to make up our own little meta-language now.
I think when things start to get that sophisticated, a complete new
approach is worth a look. Maybe the OP should use a real config-file
format. Either one already supported in the CPAN or his own one that is
easy to parse.

But still, the problem presented here again makes me want a tie_deeply()
mechanism for perl. There are situations (like this one), where tying
the nested structures is impossible because it happens too late.

>> ...and die when duplicate keys within a sub-structure are detected.

>
> I am beginning to have doubts about that. I get the impression the
> OP *wants* the duplicate keys.


Well, first step would be s/die/warn/g. I'm too lazy now to modify
my routines accordingly, but it shouldn't be too hard to turn the value
of an already existing key/value pair into an array-ref and push the
second value onto it.

Tassilo
--
$_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus}) !JAPH!qq(rehtona{tsuJbus#;
$_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexi ixesixeseg;y~\n~~dddd;eval
 
Reply With Quote
 
Anno Siegel
Guest
Posts: n/a
 
      09-11-2003
Tassilo v. Parseval <> wrote in comp.lang.perl.misc:
> Also sprach Anno Siegel:
>
> > Tassilo v. Parseval <> wrote in

> comp.lang.perl.misc:
> >> Also sprach Anno Siegel:

>
> >> > Well, if the problem re-appears at a lower level, re-apply the solution.
> >> > In other words, re-design the data structure so that every hash that may
> >> > have a key specified more than once is replaced with a list of pairs.
> >>
> >> If we look at $hash->{key4}, regarding everything as a list likely has
> >> its problems, too.

> >
> > Yes, the "real list" among the hashes had me worried to, but I
> > decided to gloss over it.
> >
> >> The OP might be able to solve this with a
> >> dispatch-table for particular entries. If he knows that there is always
> >> a list associated with 'key4', then this is feasible.

> >
> > We need a way to tell an ordinary list structure from one that is meant
> > to represent a hash with multiple keys. A dispatch table, like you
> > suggest, can do that. Another way would be to point to the "real"
> > listrefs though another reference. When traversing the structure,
> > whenever we meet a scalar ref, we know it contains a "real" listref.
> > A plain listref represents a multi-hash. (Or the other way around.)
> >
> > Similarly, one could bless one kind of listrefs into a class to make
> > them distinguishable.

>
> Gee. We are beginning to make up our own little meta-language now.
> I think when things start to get that sophisticated, a complete new
> approach is worth a look. Maybe the OP should use a real config-file
> format. Either one already supported in the CPAN or his own one that is
> easy to parse.


Indeed. There are very useful config modules out there. However, I'm
afraid we lost the OP a few plies ago

> But still, the problem presented here again makes me want a tie_deeply()
> mechanism for perl. There are situations (like this one), where tying
> the nested structures is impossible because it happens too late.


I'm not sure what deep-tying would be (something like autovivification
to another tied object?).

> >> ...and die when duplicate keys within a sub-structure are detected.

> >
> > I am beginning to have doubts about that. I get the impression the
> > OP *wants* the duplicate keys.

>
> Well, first step would be s/die/warn/g. I'm too lazy now to modify
> my routines accordingly, but it shouldn't be too hard to turn the value
> of an already existing key/value pair into an array-ref and push the
> second value onto it.


I wouldn't invest too much in it either, for the moment.

However, it occurred to me that this problem may have didactical uses
for the introduction of Perl objects:`

Use the problem to show that it's desirable to make similar references
distinguishable, and introduce bless/ref for the purpose.

Show a solution that uses bless() to mark those lists that represent
multi-key hashes, but doesn't use method calls yet. There'll be
a few "if ( ref( ...) eq '...' ) {" in the code.

Then introduce method calls and show how they simplify the solution.
End up with an implementation of a class "Multi_key_hash" that solves
the original problem.

That way, objects can be introduced with good motivation for every
step.

Anno
 
Reply With Quote
 
Tassilo v. Parseval
Guest
Posts: n/a
 
      09-11-2003
Also sprach Anno Siegel:

> Tassilo v. Parseval <> wrote in comp.lang.perl.misc:


>> Gee. We are beginning to make up our own little meta-language now.
>> I think when things start to get that sophisticated, a complete new
>> approach is worth a look. Maybe the OP should use a real config-file
>> format. Either one already supported in the CPAN or his own one that is
>> easy to parse.

>
> Indeed. There are very useful config modules out there. However, I'm
> afraid we lost the OP a few plies ago


In my direct follow-up to him I warned him that a solution is
non-trivial.

>> But still, the problem presented here again makes me want a tie_deeply()
>> mechanism for perl. There are situations (like this one), where tying
>> the nested structures is impossible because it happens too late.

>
> I'm not sure what deep-tying would be (something like autovivification
> to another tied object?).


Yes, though I wouldn't call it autovivification. I didn't yet think how
the interface should look like, but this

%tied_hash = ( key1 => 'val1',
key2 => { subkey1 => 1,
subkey2 => 2 },
);

should not result in

(tied %tied_hash)->STORE('key1', 'val1');
STORE('key2', { subkey1 => 1, subkey2 => 2 });

but rather in

(tied %tied_hash)->STORE('key1', 'val1');
->STORE('key2', 'subkey1', 1);
->STORE('key2', 'subkey2', 2);

or so. And the above should happen recursively. It needs a proper
calling convention and all, then it would be very nice to have. It would
make a module such as MLDBM much more natural in use.

The idea however is not new. It's just tough finding volunteers to
implement it.

> I wouldn't invest too much in it either, for the moment.
>
> However, it occurred to me that this problem may have didactical uses
> for the introduction of Perl objects:`
>
> Use the problem to show that it's desirable to make similar references
> distinguishable, and introduce bless/ref for the purpose.
>
> Show a solution that uses bless() to mark those lists that represent
> multi-key hashes, but doesn't use method calls yet. There'll be
> a few "if ( ref( ...) eq '...' ) {" in the code.
>
> Then introduce method calls and show how they simplify the solution.
> End up with an implementation of a class "Multi_key_hash" that solves
> the original problem.
>
> That way, objects can be introduced with good motivation for every
> step.


Yes, it would be a more than welcome variation of the schemes that seem
to be preferred in those OO-tutorials, such as deriving pigs from
animals, letting them oink etc.

Tassilo
--
$_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus}) !JAPH!qq(rehtona{tsuJbus#;
$_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexi ixesixeseg;y~\n~~dddd;eval
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
hash of hash of hash of hash in c++ rp C++ 1 11-10-2011 04:45 PM
Hash#hash and detecting changes? Space Ship Traveller Ruby 3 03-21-2010 03:25 PM
hash.keys and hash.values Mage Ruby 14 08-15-2006 08:44 PM
Hash#values and Hash#keys order Alex Fenton Ruby 1 04-15-2006 05:45 AM
No Keys, nor other hash methods on multidimensional hash Xeno Campanoli Ruby 16 08-25-2005 07:04 AM



Advertisments