Tony Curtis <tony_curtis32@_SPAMTRAP_yahoo.com> wrote:
> >> On 16 Feb 2004 13:44:10 -0800,
> >> said:
>
> > The below simple code works at removing dups from a 20k
> > record file. Looking for somebody to explain how/why.
>
> It's not even close, I'm afraid.
Well, it solves the problem asked. Yes, it has problems, but...
> You'll probably want to chomp() the lines too, since the
> trailing newline sequence is usually part of the file
> representation, not part of the data content per se.
In this case it isn't necessary: the lines are being compared for
uniqueness, so the line with the $/ on the end is just as good as
without. Think before you say things like this.
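
To illustrate the point about chomp(), here is a minimal sketch. The
count_unique() helper and the sample lines are made up for this example
and are not part of the posted script. It counts distinct lines with and
without chomping, and also shows the one case I can think of where the
two differ: a final line that has no trailing newline.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the posted script): counts the distinct
# lines, chomping each one first if $chomp is true.
sub count_unique {
    my ($chomp, @lines) = @_;    # @lines is a copy of the caller's data
    my %seen;
    for my $line (@lines) {
        chomp $line if $chomp;   # strips a trailing $/ if there is one
        $seen{$line} = 1;
    }
    return scalar keys %seen;
}

my @lines = ("a\n", "b\n", "a\n");
print count_unique(0, @lines), "\n";        # 2
print count_unique(1, @lines), "\n";        # 2

# "a\n" and "a" are different hash keys, so an unterminated last line
# is counted separately unless the lines are chomped.
print count_unique(0, "a\n", "a"), "\n";    # 2
print count_unique(1, "a\n", "a"), "\n";    # 1
```

So as long as every line ends in $/, leaving the newline on changes
nothing for this purpose.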
> > foreach $key (@lines){
> > $lines{$key} = 1;
> > }
> > @lines = keys(%lines);
> > print @lines;
>
> > I understand I am adding a key = 1 to every line (is it to
> > every line?), but when we recreate @lines what exactly is
>
> "Adding" is a misleading word here, implying that the value of
> the line is being changed. "Associating" would be closer.
Indeed. The important point, though, is that each key can only go into
the hash once.
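
A small sketch of why that works, wrapping the posted loop in a
hypothetical unique_lines() sub with made-up sample data: assigning to
a key that already exists just overwrites its value, so keys() can only
return each distinct line once (in no particular order).

```perl
use strict;
use warnings;

# Hypothetical helper (not from the posted script): the same loop as in
# the original post, wrapped in a sub so it can be tried on sample data.
sub unique_lines {
    my @lines = @_;
    my %lines;
    foreach my $key (@lines) {
        $lines{$key} = 1;    # a repeated line overwrites its own entry
    }
    return keys %lines;      # each distinct line appears exactly once
}

my @unique = unique_lines("a\n", "b\n", "a\n", "c\n", "b\n");
print scalar(@unique), "\n";    # 3
print sort @unique;             # sorted only to make the order predictable
```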
> > keys(%lines) doing/saying? I see that %lines contains
> > 1+unique records in the file).
>
> Using a hash is the right choice here, but see
>
> perldoc -q duplicate
>
> Essentially you want to, for each line, output the line only
> if you haven't seen that same line before (i.e. it's not the
> key of a hash).
Yes, another WTDI would be to print the lines as you go along: this is
more parsimonious, and outputs the lines in the original order.
    while (<F>) {
        print unless $lines{$_};
        $lines{$_} = 1;
    }
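
The same idea can be written with the grep idiom that perldoc -q
duplicate describes. This is only a sketch: unique_in_order() and the
sample lines are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: keeps the first occurrence of each line and
# preserves the original order.
sub unique_in_order {
    my %seen;
    # $seen{$_}++ is false (0) the first time a line is met and true
    # afterwards, so grep passes each line through only once.
    return grep { !$seen{$_}++ } @_;
}

# Prints "b", "a" and "c", one per line, in that order.
print unique_in_order("b\n", "a\n", "b\n", "c\n", "a\n");
```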
This doesn't mean that the script as given is wrong, however.
Ben
--
$.=1;*g=sub{print@_};sub r($$\$){my($w,$x,$y)=@_;for(keys%$x){/main/&&next;*p=$
$x{$_};/(\w)::$/&&(r($w.$1,$x.$_,$y),next);$y eq\$p&&&g("$w$_")}};sub t{for(@_)
{$f&&($_||&g(" "));$f=1;r"","::",$_;$_&&&g(chr(0012))}};t #
$J::u::s::t, $a::n::o::t::h::e::r, $P::e::r::l, $h::a::c::k::e::r, $.